4 – M2L2 04 Spotting Outliers In Raw Data V3

Finding and handling outliers in raw price data and in signal returns are slightly different problems. In this video, we’re going to talk about spotting outliers in raw data. Things to look for in raw data include large changes in stock prices and volumes, missing dates, missing prices, and missing volumes.

One basic approach to finding extreme values in raw data is to screen the data for them. Doing this by brute force, for example by looking at every row, is very inefficient, but may be necessary if time is tight. Plots can be helpful, but not by much if the dataset contains hundreds of stocks.

One way to screen for outliers is to create rule-based searching and filtering methods. For example, you might set up a filter to catch instances when prices change by more than some value that seems reasonable given your signal and the scale of typical price movements. Percent change thresholds should not be relied upon too heavily, as they are likely to yield many false positives, that is, data points that are extreme yet reflect legitimate price movements. Nonetheless, using such thresholds is one way to screen data quickly. If a price change is accompanied by a large change in volume, it’s less likely to be a data error, so you can use volume information to improve the accuracy of your filter.

The challenges of this task will always be the need to process large amounts of data, minimize false positives, and decide how to deal with data values that are missing.
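To make the rule-based screen concrete, here is a minimal sketch in Python with pandas. It is only an illustration, not a method taken from the video: the function name `screen_raw_data`, the 20% price threshold, the 3x volume ratio, the 5-row volume window, and the sample tickers are all assumptions chosen for the example, and the expected calendar is assumed to be plain weekdays with no exchange holidays. The data is assumed to be in wide form, with one row per date and one column per ticker.

```python
import numpy as np
import pandas as pd


def _locate(mask):
    """Return (date, ticker) pairs for every True cell in a boolean frame."""
    rows, cols = np.where(mask.to_numpy())
    return [(mask.index[r], mask.columns[c]) for r, c in zip(rows, cols)]


def screen_raw_data(prices, volumes, price_threshold=0.20,
                    volume_ratio=3.0, volume_window=5):
    """Hypothetical screen; all thresholds are illustrative assumptions."""
    # Row-over-row percent change for each ticker (gaps stay NaN).
    returns = prices / prices.shift(1) - 1
    big_move = returns.abs() > price_threshold

    # Compare each volume with the median of the preceding rows.
    typical = volumes.shift(1).rolling(volume_window, min_periods=1).median()
    volume_spike = (volumes / typical) >= volume_ratio

    # A large price move WITHOUT a matching volume spike is the more
    # suspicious case; a confirmed move is treated as more likely legitimate.
    suspect = big_move & ~volume_spike

    # Assumed calendar: every weekday between the first and last date.
    expected = pd.bdate_range(prices.index.min(), prices.index.max())

    return {
        "suspect_moves": _locate(suspect),
        "missing_dates": expected.difference(prices.index),
        "missing_prices": _locate(prices.isna()),
        "missing_volumes": _locate(volumes.isna()),
    }


# Made-up sample data for two hypothetical tickers.
dates = pd.bdate_range("2024-01-01", periods=8)
prices = pd.DataFrame(
    {"AAA": [100, 101, 102, 103, 104, 160, 161, 162],
     "BBB": [50, 50.5, 51, 51.5, 52, 70, 71, 72]},
    index=dates, dtype=float)
volumes = pd.DataFrame(
    {"AAA": [1000, 1100, 900, 1000, 1050, 1000, 1000, 1100],
     "BBB": [2000, 2100, 1900, 2000, 2050, 9000, 2500, 2200]},
    index=dates, dtype=float)

prices.loc[dates[6], "BBB"] = np.nan    # simulate a missing price
volumes.loc[dates[7], "AAA"] = np.nan   # simulate a missing volume
prices = prices.drop(index=dates[3])    # simulate a missing date
volumes = volumes.drop(index=dates[3])

report = screen_raw_data(prices, volumes)
for name, found in report.items():
    print(name, list(found))
```

In this sample, both tickers jump on the same day, but only the BBB jump comes with a large volume increase, so only the AAA jump is flagged as suspect. Anything the screen flags is a candidate for a closer look rather than a confirmed error, and the thresholds would need tuning to the signal and to the scale of typical price movements.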