6 – M1L4 08 Missing Values V5

Up until now, you’ve been treating stock prices as a continuous time series. For instance, the end-of-day data for a stock includes a row for every day. Or does it? Here’s how Facebook traded over the month of May in 2017. As you can see, there are these clumps of five samples, then a gap, then another clump, and so on. Take a look at the calendar for that month: each clump of five dates in the plot corresponds to a workweek. The gaps are simply weekends, when the market was closed. Two months later, it’s July and we see a similar pattern, but here at the beginning there seems to be a date missing. It’s the Fourth of July, a holiday in the United States. So, gaps in the data can result from weekends, holidays, and other reasons the market might be closed.
You might be thinking, why is this important? After all, if we forget about these missing days, the data is still continuous in terms of trading days. Well, that’s true. If you treat the price data as a simple sequence and ignore the timestamps, then you don’t need to worry about the gaps. Say you’re computing daily returns. Take the price on each day, subtract the price on the previous day, and divide by that previous price. Well, the previous trading day, that is.
For a more robust approach to trading, you may not want to ignore the missing days. Even if the market is closed, other events can occur that might influence stock prices when the market reopens. For example, company announcements, news articles, geopolitical events, natural disasters: anything and everything can affect the stock price. The more time between two trading days, the bigger the window for things to happen. So, you could try to normalize returns by dividing by the actual number of days between any two samples. This may work in certain applications, but it can shrink genuinely large price moves. Or you could just use the information about these irregular gaps between samples when trying to make trading decisions.
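To make the two options concrete, here is a minimal sketch of returns computed between consecutive trading days, with and without dividing by the calendar-day gap. The prices are made up for illustration, and the function name `returns_with_gaps` is hypothetical; it is not from any library.

```python
from datetime import date

# Made-up closing prices around the 4 July 2017 holiday (illustration only).
closes = [
    (date(2017, 6, 30), 100.0),     # Friday
    (date(2017, 7, 3), 103.0),      # Monday, after a weekend
    (date(2017, 7, 5), 105.06),     # Wednesday, after the holiday
    (date(2017, 7, 6), 106.1106),   # Thursday
]


def returns_with_gaps(series):
    """For each pair of consecutive samples, return a tuple of
    (date, calendar-day gap, simple return, return per calendar day)."""
    rows = []
    for (prev_day, prev_price), (day, price) in zip(series, series[1:]):
        gap_days = (day - prev_day).days
        simple_return = (price - prev_price) / prev_price
        rows.append((day, gap_days, simple_return, simple_return / gap_days))
    return rows


for day, gap, ret, ret_per_day in returns_with_gaps(closes):
    print(day, gap, round(ret, 4), round(ret_per_day, 4))
```

In this toy data, the return over the weekend looks three times as large as the one-day return until you divide by the gap, at which point all three come out the same. That is exactly the trade-off mentioned above: normalizing evens out the gaps, but it would also shrink a genuinely large move that happened to follow a weekend.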
Okay, let’s recap. Weekends, holidays, and other events can cause the market to be closed on certain dates. These dates may be missing from stock market data. You can choose to ignore these gaps, normalize for them, or identify and deal with them as needed.
Another kind of gap you should be aware of is the time between when the market closes for the day and when it reopens the next day. Markets often allow some additional trading during pre-market and post-market sessions. Few traders participate in these sessions, so the volume is low, but these transactions can still affect stock prices. Moreover, when a stock is listed on multiple exchanges around the globe, its price may actually be changing around the clock on another exchange. When a particular market opens for trading, the price of that stock can be different from its closing price on that market the previous day. Again, depending on how you are using the price information, you may not need to worry about these differences, but they may give you an additional clue that you can use for trading.
A more significant case of missing values is produced by major corporate actions like listings and mergers. For instance, say you’re analyzing stock data from the year 2000 to 2016. Google only IPOed in 2004, so the stock does not exist prior to that. So, what do you do? If you absolutely need a value to work with, you can backfill Google’s opening price from its IPO date to the beginning of your analysis period, using the same price for open, high, low, and close. Since no trading actually happened on these days, you can set volume to zero. But this may not be necessary, and it can be misleading. So instead, you can maintain a list of valid ticker symbols that form the universe of stocks you’re considering, and this list can change from day to day.
A more bizarre case happens when a company is delisted from an exchange, perhaps because it went bankrupt or got bought out entirely by private investors.
Dell went private in 2013 by buying back all its public shares. No record of Dell stock exists from that point onwards. If you held a share of Dell stock on the day it went private, it’s not like you lost that investment; you would have been paid by Dell for that share. So, it would be wrong to assume that the price dropped to zero. One way to mitigate this is to fill the last known price of the stock forward until the end of your analysis period. Or, if you’re simulating trades over that time period, you can force-sell the stock on that date and remove Dell from the stock universe going forward. How you ultimately deal with these missing values will depend on exactly what you’re trying to do with the data, but completely ignoring them might not be the right choice.
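Here is one way the fill strategies and the changing stock universe described above might look in code. This is only a sketch under simplifying assumptions: each stock has a single price per day rather than open, high, low, and close; the tickers `NEWCO` and `OLDCO`, their dates, and their prices are made up; and the helper names `fill_prices` and `universe_on` are hypothetical.

```python
from datetime import date


def fill_prices(trading_days, known_prices):
    """Fill missing days for one ticker.

    known_prices maps date -> price for days the stock actually traded.
    Days before the first known price are backfilled with that first price;
    days after the last known price are forward filled with the last one.
    Returns a list of (day, price, was_filled) so filled rows stay visible.
    """
    last_price = known_prices[min(known_prices)]  # seed used for backfilling
    filled = []
    for day in trading_days:
        if day in known_prices:
            last_price = known_prices[day]
            filled.append((day, last_price, False))
        else:
            filled.append((day, last_price, True))
    return filled


def universe_on(day, listing_windows):
    """Tickers tradable on `day`.

    listing_windows maps ticker -> (first_day, last_day), where
    last_day is None for a stock that is still listed.
    """
    return sorted(
        ticker
        for ticker, (first, last) in listing_windows.items()
        if first <= day and (last is None or day <= last)
    )


# Made-up example: NEWCO lists on 6 Jan, OLDCO is delisted after 3 Jan.
days = [date(2020, 1, d) for d in (2, 3, 6, 7, 8)]
newco = {date(2020, 1, 6): 50.0, date(2020, 1, 7): 51.0, date(2020, 1, 8): 52.0}
oldco = {date(2020, 1, 2): 13.0, date(2020, 1, 3): 13.5}
windows = {
    "NEWCO": (date(2020, 1, 6), None),
    "OLDCO": (date(2020, 1, 2), date(2020, 1, 3)),
}

print(fill_prices(days, newco))  # first two days backfilled at 50.0
print(fill_prices(days, oldco))  # last three days forward filled at 13.5
print(universe_on(date(2020, 1, 7), windows))
```

Keeping the `was_filled` flag is a design choice in this sketch: it lets you tell real prices from filled ones later, for example to set volume to zero on filled days or to skip them when simulating trades. With the universe approach you would not fill at all; you would simply check `universe_on` for each day before considering a ticker.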
