After you calculate returns from a trading signal, you may suspect the presence of outliers when you examine the distribution of your signal returns. Let me explain what I mean by that. Imagine that we’ve written some code to generate a trading signal and we’ve calculated our monthly returns from a couple of years of trading based on this signal. What would we expect the distribution of returns to look like? Let’s figure out what we would expect to see in a few limiting scenarios. What if we picked our buy and sell times at random? In that case, we wouldn’t expect to make any money, because we’d be equally likely to buy or sell when the market is going up or down. We’d expect our returns to have a roughly normal distribution with mean zero. But what if we designed a trading strategy that performed well? Ideally, if a trading signal looks like it will perform well, the distribution of its returns should look like a slightly positively skewed normal distribution. For the signal to make money, returns must be positive on average, so the distribution’s mean should sit above zero. However, sometimes the return distribution can look a little too good, or just plain weird. This should arouse your skepticism. Extremely skewed shapes or bumps at either tail of the histogram spell trouble.

One tool you can use to compare your distribution of returns to another distribution, such as the normal distribution, is the QQ plot. A QQ plot is a plot of the quantiles of the first data set against the quantiles of the second data set.

What are quantiles? Well, if we split a data set into four equally sized groups, the dividing lines are at the 25th, 50th, and 75th percentiles; these cut points are usually called quartiles. The 50th percentile is the median, the value below which 50% of the data fall. Quartiles divide the data set into four groups, but with quantiles, the data set can be divided into any number of equally sized groups.
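To make quartiles and quantiles concrete, here is a minimal sketch using Python’s standard-library `statistics.quantiles`. The return values are made up for illustration, and `quantile_cut_points` is a hypothetical helper name. The `"inclusive"` method (linear interpolation that treats the data as the whole population) is one of several conventions, so other tools may give slightly different cut points.

```python
import statistics


def quantile_cut_points(values, n_groups):
    """Return the n_groups - 1 cut points that split `values` into n_groups equally sized groups.

    Uses the "inclusive" method, which interpolates linearly between sorted observations.
    """
    return statistics.quantiles(values, n=n_groups, method="inclusive")


# Hypothetical monthly returns (as fractions), invented for illustration only.
monthly_returns = [0.012, -0.004, 0.021, 0.007, -0.011, 0.015,
                   0.003, 0.009, -0.002, 0.018, 0.006, 0.011]

quartiles = quantile_cut_points(monthly_returns, 4)   # 25th, 50th, 75th percentiles
deciles = quantile_cut_points(monthly_returns, 10)    # 10th, 20th, ..., 90th percentiles
median = statistics.median(monthly_returns)

print("quartiles:", quartiles)
print("deciles:", deciles)
# The middle quartile (50th percentile) is the median.
print("middle quartile equals median:", abs(quartiles[1] - median) < 1e-12)
```

Changing `n_groups` is all it takes to move from quartiles to any other number of equally sized groups.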
For example, you could split the data into ten groups, whose cut points are usually called deciles. The word quantile is usually used to denote the cut points, but it’s sometimes used to refer to the groups themselves.

Let’s get back to the goal of comparing your distribution of returns to the normal distribution. If you wanted to use QQ plots, you would first choose the set of quantiles you want to use, then plot the nth quantile of your distribution against the nth quantile of the normal distribution, and continue for all the values of n. If you’re comparing your distribution to the normal distribution and your distribution is approximately normal, the points in the QQ plot should fall along a straight line. If the distribution has fatter tails than the normal distribution, the QQ plot will reveal deviations from a straight line at the extremities of the graph. Distributions with skew will also have QQ plots that curve away from a straight line.

A good quant should try to understand the root cause of outliers in returns. The first step in dealing with a situation like this is to find out where and when the outlying data points occurred: for which stock or stocks, and on which dates. The next step is to ask why. Depending on the nature of the extreme data, the source might be obvious or less so. Was it a data error? Or was it a legitimate movement due to a real event? If an extreme datum looks like it could be real, you can check the news for the stock on that day. Was there an announcement? You’ll want to think of every possible explanation other than a problem with your market data vendor. A good strategy is to cross-check the data against another market data source.
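The QQ construction and the first step of the outlier hunt (finding which stock and which date) can be sketched together in Python. This is a minimal sketch using only the standard library, under several assumptions: `qq_points` and `flag_outliers` are hypothetical helper names, the returns are synthetic, and comparing against a normal distribution with the sample’s own mean and standard deviation (so the reference line is y = x) is an assumed convention. The modified z-score based on the median and median absolute deviation (MAD) is a swapped-in robust screening rule, not something the text prescribes.

```python
import random
import statistics
from datetime import date, timedelta
from statistics import NormalDist


def qq_points(returns, n_quantiles=20):
    """Pair the k-th sample quantile of `returns` with the k-th quantile of a normal
    distribution that has the same mean and standard deviation, for k = 1..n_quantiles-1.

    Roughly normal returns give (normal, sample) pairs near the line y = x;
    fat tails make the pairs bend away from that line at both ends.
    """
    reference = NormalDist(statistics.fmean(returns), statistics.stdev(returns))
    sample_q = statistics.quantiles(returns, n=n_quantiles, method="inclusive")
    normal_q = [reference.inv_cdf(k / n_quantiles) for k in range(1, n_quantiles)]
    return list(zip(normal_q, sample_q))


def flag_outliers(returns_by_ticker, threshold=3.5):
    """Return (ticker, date, return, score) rows whose modified z-score,
    0.6745 * (r - median) / MAD, exceeds `threshold` in absolute value, most extreme first.

    Assumed choice: median/MAD, unlike mean/stdev, is not inflated by the outliers it hunts.
    """
    flagged = []
    for ticker, series in returns_by_ticker.items():
        values = list(series.values())
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad == 0:
            continue  # degenerate (e.g. constant) series: skip rather than divide by zero
        for day, r in series.items():
            score = 0.6745 * (r - med) / mad
            if abs(score) > threshold:
                flagged.append((ticker, day, r, score))
    return sorted(flagged, key=lambda row: abs(row[3]), reverse=True)


# Synthetic daily returns for two made-up tickers; "BBB" gets one injected spike
# standing in for either a bad data print or a genuine news-driven move.
rng = random.Random(42)
start = date(2021, 1, 4)
days = [start + timedelta(days=i) for i in range(250)]
returns_by_ticker = {
    ticker: {d: rng.gauss(0.0, 0.01) for d in days} for ticker in ("AAA", "BBB")
}
spike_day = days[120]
returns_by_ticker["BBB"][spike_day] = 0.35

all_returns = [r for series in returns_by_ticker.values() for r in series.values()]
for normal_q, sample_q in qq_points(all_returns)[-3:]:
    print(f"normal {normal_q:+.4f}  sample {sample_q:+.4f}")

# Where and when: which ticker, which date.
for ticker, day, r, score in flag_outliers(returns_by_ticker)[:3]:
    print(ticker, day.isoformat(), f"{r:+.3f}", f"score={score:.1f}")
```

To draw the plot itself, the pairs can be passed to any scatter-plot routine, such as matplotlib’s `plt.scatter`. The flagged (ticker, date) rows are where the "why" questions begin: check the news for that stock on that day, and compare the same observations against a second data source.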