9 – M7L5 09 Dispersion Intro Part 1 V2

Hi everyone. We’re going to calculate some market regime features, and we’re going to try to capture market-wide regimes. Market-wide means we’ll look at the aggregate movement of the entire universe of stocks. So we’re calculating a single value for the entire universe of stocks. The first one is high and low dispersion, and … Read more
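To make the idea of "a single value for the entire universe" concrete, here is a minimal sketch in pandas. It assumes dispersion is measured as the cross-sectional root mean squared deviation of daily returns from the universe mean; the function name, tickers, and numbers are hypothetical, and the lesson's own implementation may differ.

```python
import numpy as np
import pandas as pd


def market_dispersion(daily_returns: pd.DataFrame) -> pd.Series:
    """Return one dispersion value per date for the whole universe.

    Assumption: dispersion is the root mean squared deviation of each
    stock's return from the cross-sectional mean on that date.
    """
    # Deviation of each stock's return from the universe mean on that date.
    demeaned = daily_returns.sub(daily_returns.mean(axis=1), axis=0)
    # Average the squared deviations across stocks, then take the root.
    return np.sqrt((demeaned ** 2).mean(axis=1))


# Hypothetical returns: rows are dates, columns are tickers.
returns = pd.DataFrame(
    {"AAA": [0.01, 0.02], "BBB": [0.01, -0.02], "CCC": [0.01, 0.00]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
dispersion = market_dispersion(returns)
```

On the first date every stock moves by the same amount, so dispersion is zero; on the second date the returns spread out and the value rises.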

8 – M7L5 9 V1

The term market regime is another way to describe market conditions that broadly affect how stocks behave during a period of time. Market regimes change over time, so it’s helpful to generate features that capture changes in these market conditions. There are two market regime features that we will work with. One is market dispersion … Read more

7 – M7L5 07 Volatility Dollar Volume Solution V1

Okay. So let’s look at the solution to these quizzes. So the first quiz was, the annualized volatility window length is 252 by default, because it’s the one-year volatility. Try to adjust the call to the constructor of AnnualizedVolatility so that this represents one-month volatility. Okay. So we can call the constructor annualized … Read more

6 – M7L5 07 Volatility Dollar Volume Part 3 V1

So next, here’s the Average Dollar Volume feature. We’ve used the Average Dollar Volume in the past to choose the stock universe. So based on the stocks that have the most dollar volume over a certain time period, we choose those stock tickers. We can also use this as … Read more
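As a rough illustration of what average dollar volume measures, the sketch below computes it with plain pandas rather than with the pipeline factor used in the lesson. The function name, window, and sample data are assumptions made for this example.

```python
import pandas as pd


def average_dollar_volume(close: pd.Series, volume: pd.Series, window: int = 20) -> pd.Series:
    """Rolling mean of (close price * shares traded) over `window` days."""
    # Dollar volume for a single day is price times shares traded.
    dollar_volume = close * volume
    # Average over the trailing window; early rows are NaN until it fills.
    return dollar_volume.rolling(window).mean()


# Hypothetical daily data for one stock.
close = pd.Series([10.0, 11.0, 12.0, 13.0])
volume = pd.Series([100, 200, 300, 400])
adv = average_dollar_volume(close, volume, window=2)
```

Ranking stocks by this value over a chosen period is one way to pick the most liquid tickers for a universe, and the same number can be kept as a feature.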

5 – M7L5 07 Volatility Dollar Volume Part 2 V1

So, here’s a quiz. We can see that the returns window length is two, because we are dealing with daily returns, which are calculated as the percent change from one day to the following day. So, it’s two days. The annualized volatility window length is 252 by default, because it’s the one-year volatility. So, that … Read more
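The two window lengths can be illustrated with a small pandas sketch. This is not the Zipline factor itself: it assumes volatility is the standard deviation of daily returns scaled by the square root of 252 trading days, uses the population standard deviation (`ddof=0`) as an assumption, and treats 20 trading days as roughly one month.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # annualization factor (assumed convention)


def annualized_volatility(prices: pd.Series, window_length: int = 252) -> pd.Series:
    """Rolling standard deviation of daily returns, scaled to one year."""
    # A daily return needs two prices: today's and the previous day's.
    daily_returns = prices.pct_change(fill_method=None)
    # ddof=0 (population standard deviation) is an assumption of this sketch.
    rolling_std = daily_returns.rolling(window_length).std(ddof=0)
    return rolling_std * np.sqrt(TRADING_DAYS_PER_YEAR)


# Hypothetical prices built from returns that alternate +1% and -1%.
rets = np.tile([0.01, -0.01], 20)  # 40 daily returns
prices = pd.Series(100.0 * np.cumprod(np.concatenate([[1.0], 1.0 + rets])))

# Roughly one month of trading days instead of the one-year default.
monthly_vol = annualized_volatility(prices, window_length=20)
```

Shortening the window changes how much history goes into each estimate; the annualization factor stays at 252 so the result is still expressed in yearly terms.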

4 – M7L5 07 Volatility Dollar Volume Part 1 V2

Hi everyone. We’re going to look at some Universal Quant Features. The first one is stock volatility. Zipline has a custom factor called AnnualizedVolatility and the source code is here. So, we can take a look here. You can see that we’re looking inside zipline, pipeline, factors, basic.py. If we search for AnnualizedVolatility, we can … Read more

31 – M7L5 28 V1

Here’s a preview of what we’ll learn next. You may have often heard that overfitting in machine learning is a problem that we wish to avoid. Overfitting with financial data is also a significant issue we have to deal with if we hope for our models to perform well in out-of-sample testing and in … Read more

30 – M7L5 Outro V1

Let’s review what we’ve learned in this lesson. Machine learning models are trained on features and targets. Features can be more general than alpha factors, in that they can also be used for splitting on other features or vectors. Two universal quant features are stock volatility and stock dollar volume. Two regime features are market volatility … Read more

3 – M7L5 6 V1

Features are slightly more general than alpha factors. Recall that alpha factors are signals that are ideally predictive of whether future stock returns may be positive or negative, and by how much. Features are similar, but they don’t have as high a requirement to be predictive on their own. A good feature is one that may … Read more

29 – M7L5 24 Targets Solution V2

So now, let’s look at the solution to calculate targets. So we created the pipeline. Here’s the example where we’re adding one target, which is converting the returns into 2-quantiles and adding to the pipeline. So here we’re going to do something quite similar. We’re going to create the returns. The window length is five … Read more

28 – M7L5 23 Targets Intro V1

Now, we’re going to create the targets, also called the labels. So we want the model to try to predict the go-forward one-week return. But instead of using the returns directly, we want to convert them into quantiles. The reason we want to do this is to make the target market neutral and … Read more
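One way to picture the conversion from returns to quantile targets is the pandas sketch below. It assumes the forward one-week returns are already computed and arranged with dates as rows and tickers as columns; the function name and the data are hypothetical, and the lesson itself builds the targets inside the pipeline.

```python
import pandas as pd


def quantile_targets(forward_returns: pd.DataFrame, n_quantiles: int = 2) -> pd.DataFrame:
    """Label each stock by the quantile of its return within each date."""

    def label_row(row: pd.Series) -> pd.Series:
        # Rank within a single date, so only the relative ordering matters
        # and a market-wide move shared by all stocks drops out.
        return pd.qcut(row, n_quantiles, labels=False)

    return forward_returns.apply(label_row, axis=1)


# Hypothetical forward one-week returns.
forward_returns = pd.DataFrame(
    {
        "AAA": [-0.02, 0.04],
        "BBB": [0.01, -0.01],
        "CCC": [0.03, 0.00],
        "DDD": [0.05, 0.02],
    },
    index=pd.to_datetime(["2020-01-03", "2020-01-10"]),
)
targets = quantile_targets(forward_returns, n_quantiles=2)
```

With two quantiles, each stock gets 0 if its return is in the lower half of the universe on that date and 1 if it is in the upper half.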

27 – M7L5 22 V1

Now, let’s look at the data that goes in the other end of the machine learning training process. These are the outputs that we want the model to learn. They’re often called labels or targets. I’ll refer to these as targets. If you think back to when we proposed and tested potential alpha factors, can … Read more

26 – M7L5 19 Dates Solution Part 2 V1

Now, we’ll do the quiz for getting the start and end of the quarter or month. So this was some sample code that we can take a look at. Okay. So the question is, create a feature that indicates the first business day of each month. So it’s very similar to how we do it … Read more

25 – M7L5 19 Dates Solution Part 1 V2

All right. So let’s check out the solution. So for the first question, it was asking us to create a column to say whether it’s January or not. So we’ll start with, and we can check out these examples here. We’ll start with our dataframe, reference the index, and from the index we’re going to … Read more
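A minimal sketch of the month-indicator idea follows. It assumes a DataFrame with a plain DatetimeIndex; the column names and data are hypothetical, and a dataframe indexed by both date and asset would first need its date level extracted.

```python
import pandas as pd

# Hypothetical stand-in for the all factors dataframe, indexed by date.
dates = pd.to_datetime(["2019-12-31", "2020-01-02", "2020-01-03", "2020-02-03"])
all_factors = pd.DataFrame({"some_factor": [0.1, 0.2, 0.3, 0.4]}, index=dates)

# .month gives the month number (1 to 12) for every date in the index.
# Comparing with 1 yields booleans, which are cast to 1 and 0.
all_factors["is_January"] = (all_factors.index.month == 1).astype(int)
all_factors["is_December"] = (all_factors.index.month == 12).astype(int)
```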

24 – M7L5 19 Dates Intro Part 4 V1

So how will we use this? Create a DatetimeIndex that stores the dates which are the last business day of each month. So that’s what we just did above. We have the last business day of each month. We can use the function .isin and pass in these last days of the month to check if … Read more
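The two steps described here, building the index of month-end business days and then checking membership with `.isin`, can be sketched as follows. The dataframe, date range, and column name are hypothetical, and the business-day calendar used here ignores market holidays.

```python
import pandas as pd

# Hypothetical dataframe indexed by business days in the first quarter of 2020.
dates = pd.date_range("2020-01-01", "2020-03-31", freq="B")
all_factors = pd.DataFrame({"some_factor": range(len(dates))}, index=dates)

# DatetimeIndex holding the last business day of each month in the range.
last_business_days = pd.date_range(
    start=dates.min(), end=dates.max(), freq=pd.offsets.BMonthEnd()
)

# .isin returns True for index dates that appear in last_business_days.
all_factors["month_end"] = all_factors.index.isin(last_business_days).astype(int)
```

The same pattern works for the first business day of a month or quarter by swapping in a different offset, for example `pd.offsets.BMonthBegin()`.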

23 – M7L5 19 Dates Intro Part 3 V1

So next, we’re also going to look at the start and end of particular date ranges. So, the start and end of the week, month, and quarter may have structural differences in trading activity. So we’re going to use pandas.date_range here. So here’s the documentation for that. Okay. So it takes three parameters: start … Read more

22 – M7L5 19 Dates Intro Part 2 V1

Okay, so next here’s the quiz. Create a numpy array that has one, when the month is January and zero otherwise, and store it in a column in the all factors dataframe, and you’ll add another similar column to indicate when the month is December, right? Okay, once you try that out, you’ll do something … Read more

21 – M7L5 19 Dates Intro Part 1 V2

Okay. So now, we’re going to add date part features. So we will make features that might capture trader and investor behavior due to calendar anomalies. So we can get the dates from the index of the dataframe, so that’s the all factors dataframe that we’ve been working with. So, notice that we can use … Read more

20 – M7L5 18 V1

We’ve had dates in our data for a while. But, how do we use date information in a model? You can imagine trying to directly feed the raw data into the model and not getting anything useful out of it. Well, we can apply our domain knowledge to engineer useful date features. To motivate the … Read more

2 – M7L5 03 Setup Code Exercise V1

Hi, everyone. Welcome to the lesson on Feature Engineering and Labeling. Here’s the notebook that we’ll use to practice coding for this entire lesson. So, I’ll just read this section for you, feature engineering and labeling. We’ll use the price volume data and generate features that we can feed into a model. We’ll use this notebook … Read more