5 – M2L2 05 Handling Outliers In Raw Data V3

So, what do you do when you find outliers in raw price data? Well, at least at large institutions, traders usually have a team of people to clean data for them, but they may run into situations where they need results fast. The easiest and quickest way to determine whether an extreme value is real, or to fill in a missing data point, is to cross-check with another data source. If it’s a one-off point, you can manually remove it or replace it with data from a secondary source.

You might think of replacing these data with some mean of the surrounding data. This is not done often because it risks incorporating information from the future into that day’s data. Inaccuracy due to the use of information that would not have been known or available during the period being analyzed is called lookahead bias. Think about it. If you were trying to fill in that missing datum on the day for which it is missing by averaging data from the surrounding days, you wouldn’t be able to, because you wouldn’t yet know what the following day’s price would be. Lookahead bias is a bias because using unknowable data from the future will consistently make your results look better.

The main problem with using future data in signal research is that any strategy based on that research would be impossible to execute. One common mistake is to use the closing price of the current day, or of a future date, to calculate the trading signal for that same day. But of course, a strategy that says to make a trade during the day based on the closing price for that same day would be impossible to implement. It may not seem to make much of a difference when you are using the data for some kind of historical analysis, but to be safe, it makes sense during signal research to simply substitute a missing closing price with the previous closing price.

However, keep in mind that it is recommended to keep the missing data during backtesting because they may represent a real non-tradable event. For example, stock A may not have traded for a day, and thus its closing price would be missing. In your backtest, your data should reflect this, and your strategy should not attempt to place trades on stock A during that day.
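
To make the difference concrete, here is a minimal sketch in plain Python. The price list and the function names (forward_fill, neighbor_mean_fill, tradable_days) are made up for illustration and are not from any particular library. It contrasts carrying the previous close forward, which only uses past information, with averaging the neighboring days, which peeks at the next day's close. It also shows one way you might flag non-tradable days for a backtest instead of filling them.

```python
def forward_fill(prices):
    """Replace each missing close (None) with the most recent known close.

    Only past values are used, so no future information leaks in.
    Leading gaps stay None because there is no earlier price to carry forward.
    """
    filled = []
    last_known = None
    for price in prices:
        if price is not None:
            last_known = price
        filled.append(last_known)
    return filled


def neighbor_mean_fill(prices):
    """Fill a gap with the mean of the previous and next close.

    Shown only for contrast: the next day's close is not known on the
    day of the gap, so this introduces lookahead bias.
    """
    filled = list(prices)
    for i, price in enumerate(prices):
        if price is None and 0 < i < len(prices) - 1:
            prev_p, next_p = prices[i - 1], prices[i + 1]
            if prev_p is not None and next_p is not None:
                filled[i] = (prev_p + next_p) / 2
    return filled


def tradable_days(prices):
    """For backtesting: mark days with a real close as tradable, gaps as not."""
    return [price is not None for price in prices]


# Hypothetical closing prices for stock A; None marks a day with no trade.
closes = [10.0, 10.5, None, 12.0, 11.5]

signal_input = forward_fill(closes)        # [10.0, 10.5, 10.5, 12.0, 11.5]
biased_input = neighbor_mean_fill(closes)  # [10.0, 10.5, 11.25, 12.0, 11.5]
can_trade = tradable_days(closes)          # [True, True, False, True, True]
```

Notice that the averaged value of 11.25 on the third day depends on the 12.0 close from the fourth day, which nobody could have known on the third day. The forward-filled series is what you would feed into signal research, while in a backtest you would leave the original gap in place and use something like the can_trade flags so the strategy skips stock A on that day.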