Survivorship Bias-Free Data
What is Survivorship Bias?
As defined on Wikipedia, "survivorship bias or survival bias is the logical error of concentrating on the people or things that made it past some selection process—and overlooking those that did not—typically because of their lack of visibility. This can lead to false conclusions in several different ways."
When we run a back test on equities, we often say that we want to focus our test on the members of a popular index like the S&P500, FTSE100 or ASX200. We collect together the securities that make up that index and do our test on those over the last ten years. The trouble is that members of the S&P500 today are not the same as the members ten years ago. Lehman Bros ring a bell? How can we run a test without including “LEH” in the list?
Similarly, a number of current names in the S&P500 were not in the index ten years ago. For example, Netflix was added to the S&P500 in 2010, so we should not consider any signals in Netflix from before it joined the index.
Both these conditions - ignoring companies that are no longer in the index and including those that had not yet “made it” - lead to survivorship bias, which skews test results positively.
How do I use this data in my tests?
In the test setup window (whether for a Back Test, Signal Test, or Trade Test), if the S&P500 is selected from our Symbol List as the universe under Codes to Scan, a Membership option appears, allowing you to select either Current or Historical membership.
If you select Historical then every previous member of the index is included in the test (since 2000 there have been around 890 members of the S&P500 index), regardless of whether it was in the index on the date of a given signal. We can go one step further and include only those companies that were in the index at the time of the signal.
The IsMember() Filter
In our database we have added the dates when stocks were members of the S&P500 index, and we can use them to filter our tests with a new scripting function called IsMember(). For any given day, this gives a value of 1 or 0 depending on whether the company was a member of the index on that day. When the value is 1 (i.e. true) the signal is included in the test; otherwise it is ignored.
You can see an example of this by opening a chart of First Solar from Nasdaq (FSLR) and adding a Show View tool using IsMember() as the formula. As it's a US stock it will default to the S&P500 index, and it will display the period when FSLR was in the index (from October 2009 until it was removed in March 2017).
You will notice in the chart below that FSLR is still trading, because it was replaced for market capitalisation reasons. In the second example, Whole Foods (WFM), the membership ends abruptly with the data, as the stock was delisted when it was taken over by Amazon in August 2017. It was replaced by IQVIA Holdings (if you add the IsMember Show View to a chart of IQV it will begin when WFM left).
When adding it to a test you can keep your existing formulas and add the IsMember() function as a second criterion by clicking the +:
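The 1-or-0 membership flag described above can be sketched outside the platform in a few lines of Python. This is only an illustration of the idea, not how IsMember() is implemented: the `MEMBERSHIP` table, the tickers `AAA` and `BBB`, and every date in it are made up.

```python
from datetime import date

# Hypothetical membership spells: ticker -> list of (joined, left) dates.
# left=None means the stock is still in the index. All values are made up.
MEMBERSHIP = {
    "AAA": [(date(2009, 10, 1), date(2017, 3, 1))],  # joined, later removed
    "BBB": [(date(2017, 3, 1), None)],               # joined, still a member
}

def is_member(ticker, day):
    """Return 1 if `ticker` was in the index on `day`, else 0."""
    for joined, left in MEMBERSHIP.get(ticker, []):
        if joined <= day and (left is None or day < left):
            return 1
    return 0

def universe_on(day):
    """Tickers that were index members on `day` (the point-in-time universe)."""
    return sorted(t for t in MEMBERSHIP if is_member(t, day))
```

With this made-up table, `universe_on(date(2012, 1, 3))` contains only `AAA`, while `universe_on(date(2018, 1, 3))` contains only `BBB`, which mirrors one stock replacing another in the index.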
Current vs Historical Results
The following is a simple signal test on the current members, run over the last ten years, that enters a stock when the 50-period moving average crosses above the 200-period moving average. Here is the script:
MA(BARS=50) CrossesAbove MA(BARS=200)
Remember that in this chart the blue shaded plot is our equity from the test. Obviously not a lot of alpha, but it shows a moderate return over the index (red line). The issue is that we have only used the current 505 stocks in our 10-year test. We need to set this up to include all the companies that were ever in the index by using the historical dataset, and then only take the trade if the IsMember() filter is true, either by adding a separate criterion to the setup as above or by adding it to the end of the formula:
MA(BARS=50) CrossesAbove MA(BARS=200) AND IsMember()
Suddenly this does not look so good anymore. Our idea did not “beat” the market at all. Anyone who has ever tried trading a MA crossover like this knows that it’s a great strategy in theory, but the results are really hard to replicate.
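For readers who want to see the logic of the combined condition spelled out, here is a rough Python sketch of a moving average crossover that is only accepted on bars where the stock was an index member. It is not the platform's script engine: the function names are hypothetical, the definition of "crosses above" (at or below on the previous bar, above on the current bar) is an assumption, and the prices in the usage lines are made up.

```python
def sma(values, n):
    """Simple moving average; None until n values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - n:i + 1]) / n)
    return out

def crosses_above(fast, slow):
    """True on bars where fast moves from at-or-below slow to above it."""
    flags = [False] * len(fast)
    for i in range(1, len(fast)):
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue  # not enough history yet for one of the averages
        flags[i] = fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]
    return flags

def filtered_signals(closes, member_flags, fast_n=50, slow_n=200):
    """Crossover signals kept only on bars where the stock was an index member."""
    cross = crosses_above(sma(closes, fast_n), sma(closes, slow_n))
    return [c and bool(m) for c, m in zip(cross, member_flags)]

# Usage with made-up prices and short windows so the example stays small.
closes = [5, 4, 3, 2, 3, 4, 5, 6]
always_member = filtered_signals(closes, [1] * 8, fast_n=2, slow_n=4)
not_member_on_cross = filtered_signals(closes, [1, 1, 1, 1, 1, 0, 1, 1], fast_n=2, slow_n=4)
```

In this toy series the only crossover falls on the sixth bar. It survives when the membership flag is 1 on that bar and is dropped when the flag is 0, which is the effect of appending the membership condition to the formula.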
The main point of this is to highlight how important survivorship bias is, and to ensure that you don’t ignore it in testing. If you have ever been frustrated by your inability to repeat test results in real life, then this will help you see why that has happened.
A simple rule of thumb, for when you don’t have access to correct survivorship bias-free data, is to subtract around 3% per annum from your results. That will give you a better idea of what you can expect. Just don’t plan your trading strategy by only looking at the survivors. Make sure you properly consider the securities that didn’t make it.
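To see what a haircut of around 3% per annum does over a ten-year test, the arithmetic can be sketched as below. The 10% reported return is a made-up input, and the function names are hypothetical. The 3% figure is the rule of thumb from the text, not a measured value.

```python
def haircut_annual_return(reported, haircut=0.03):
    """Subtract a flat survivorship haircut from an annualised return."""
    return reported - haircut

def growth_of_one(annual_return, years):
    """Value of 1 unit compounded at `annual_return` for `years` years."""
    return (1 + annual_return) ** years

reported = 0.10                                  # made-up back test result, per annum
adjusted = haircut_annual_return(reported)       # roughly 0.07
biased_growth = growth_of_one(reported, 10)      # roughly 2.59
adjusted_growth = growth_of_one(adjusted, 10)    # roughly 1.97
```

Compounded over ten years, the gap between growing to about 2.59 times and about 1.97 times the starting capital shows why a seemingly small annual adjustment matters.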
Last updated Thu, Jan 10 2019 6:10am