# 1 – M2L3 01 Intro V4

This lesson focuses on two main concepts. 1. Checking if your data fits a normal distribution. 2. Checking if your data is stationary over time and what to do in both cases when they’re not. To understand regression which we will build upon later in time series analysis when using regression, we choose one or more independent variables to help us predict a dependent variable. Independent variables may include market indices, employment numbers, weather forecasts or consumer spending reports. Dependent variables include stock price returns, but may also include electricity consumption or the amount of corn harvested. You can also choose stocks returns as both your independent and dependent variables in a regression. At first glance it looks like you’re using the returns of one stock to predict the returns of another stock. In fact the regression is used to see how two assets or two groups of assets move in relation to each other. This is what we do in statistical arbitrage. Statistical arbitrage is a trading technique that involves simultaneously buying and selling two related assets based on the analysis of how these two assets move in relation to one another. It’s important to keep in mind the signal-to-noise ratio in our data. The signal is the meaningful part of our input data that helps us predict our dependent variable. The noise is the random part of our data that does not help us make better predictions. With financial data, the signal is relatively low and there is a lot of noise. In other words, the signal-to-noise ratio tends to be low. When the signal-to-noise ratio is low, predictive models tend to overfit the data that it’s trained on. This means that when the model is used to make forecasts in real life, they often pay too much attention to aspects of the data that are not actually useful in prediction. This results in predictions that are not accurate enough to be useful. Moreover the relationship between the independent variables and the stock returns can change over time. In other words, their relationships are not stationary in practice. This means that your models become stale after some time and need to be periodically retrained with more recent data. This also means that some independent variables that produced useful predictive signals in the past may not necessarily do so in the future and vice-versa. This is partly why a certain trading strategies fade after a while and become useful again in the future.