Hello and welcome. In this lesson, we will introduce you to the Zipline pipeline. Zipline is an open-source algorithmic trading simulator developed by Quantopian. In this notebook, we will see how to create a pipeline with screens, factors, and filters, and how to run it using a pipeline engine. So, let’s get started.
So, why do we need a pipeline? One reason is that on any given trading day, the entire universe of stocks consists of thousands of securities. Usually, you will not be interested in investing in all of them; you’re most likely to select only a few of them to invest in. For example, you may only want to invest in stocks that have a 10-day average closing price of $10 or less. To avoid spending a lot of time on data wrangling just to select the stocks you want, people often use pipelines. So, in general, a pipeline is a placeholder for a series of data operations used to filter and rank data according to some factors.
Before we start building our pipeline, we will first see how we can load the stock data we’re going to use into Zipline. Zipline uses data bundles to make it easy to use different data sources. In this notebook, we will be using stock data from Quotemedia. In the Udacity workspace, you will find that the stock data from Quotemedia has already been ingested into Zipline, so we will only need to load the data. To load the data, we will use Zipline’s load function from the bundles module. In order to load a previously ingested data bundle, we must pass the name of the data bundle to the load function. In this case, the name of our previously ingested data bundle is eod-quotemedia, which is specified here. The first thing the load function does is look for the most recently ingested data. Therefore, we must also specify the location of the previously ingested data bundle. This is done by setting the ZIPLINE_ROOT environment variable to the path where the most recent data is located.
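The loading steps just described can be sketched roughly as follows. This is a minimal sketch, not the notebook’s actual code: the helper names `set_zipline_root` and `load_bundle` and the directory path are hypothetical, and the ingest function is taken as a parameter because Zipline expects a bundle to be registered together with its ingest function before it is loaded. Only `bundles.register` and `bundles.load` are Zipline calls, and Zipline is imported inside the function so the rest of the sketch runs without it.

```python
import os


def set_zipline_root(path):
    """Point Zipline at the directory that holds the ingested bundle data."""
    os.environ["ZIPLINE_ROOT"] = path


def load_bundle(bundle_name, ingest_func):
    """Register a previously ingested bundle, then load its most recent ingest.

    Zipline is imported lazily, so this module can still be imported
    on a machine where Zipline is not installed.
    """
    from zipline.data import bundles

    bundles.register(bundle_name, ingest_func)
    return bundles.load(bundle_name)


# Hypothetical location; the real path depends on the workspace.
set_zipline_root(os.path.join(os.getcwd(), "data", "zipline_root"))
```

With Zipline installed and the data in place, the call would look something like `bundle_data = load_bundle("eod-quotemedia", ingest_func)`, where `ingest_func` is whichever ingest function the bundle was created with.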
Before we load the data bundle, we must also register the data bundle and its corresponding ingest function. After we load our data, we’re ready to build our first pipeline.
We will now start building our pipeline step by step. We will start by building an empty pipeline with a screen. To build a pipeline, we will use Zipline’s Pipeline class. In this example, we have used a screen that selects the top 10 assets with the highest average dollar volume within a 60-day window. This screen acts as a filter that excludes data from our stock universe every day. The average dollar volume is a good first-pass filter to avoid illiquid assets. This way, we can be reasonably confident that the selected assets have enough daily trading volume to fill our orders quickly. It is important to note that this freshly constructed pipeline is empty. This means that it doesn’t yet know how to compute anything, and it won’t produce any values if we ask for its output. The next step in building a pipeline is to add factors and filters.
We will now take a look at two types of computations that can be expressed in a pipeline: factors and filters. Let’s take a look at factors first. In general, a factor is a function from an asset at a particular moment in time to a numerical value. A simple example of a factor is the most recent price of a security, because the most recent price of a security is just a number, for example, $10. On the other hand, a filter is a function from an asset at a particular moment in time to a boolean value. Boolean values are either true or false. An example of a filter is a function indicating whether a security’s price is below $5. This is because at any particular moment in time, this statement will either be true or false. So, the difference between factors and filters is that factors return numerical values, while filters return boolean values. Zipline comes with some built-in factors and filters, but also allows you to combine and create custom factors and filters.
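To make the distinction concrete, here is a small plain-Python sketch that does not use Zipline at all. The tickers, prices, and volumes are made up for illustration, and the function names are hypothetical. It shows a factor (returns a number), a filter (returns a boolean), and a top-N screen on average dollar volume, using a 3-day window and N = 2 instead of the 60-day window and top 10 from the lesson.

```python
from statistics import mean

# Made-up history: ticker -> list of (close, volume) bars, oldest first.
history = {
    "AAA": [(10.0, 1000), (11.0, 1200), (12.0, 900)],
    "BBB": [(4.0, 5000), (4.5, 4000), (4.2, 6000)],
    "CCC": [(50.0, 100), (51.0, 150), (49.0, 120)],
}


def latest_price(ticker):
    """Factor: maps an asset to a number (its most recent close)."""
    return history[ticker][-1][0]


def price_below_5(ticker):
    """Filter: maps an asset to a boolean."""
    return latest_price(ticker) < 5.0


def average_dollar_volume(ticker, window):
    """Factor: mean of close * volume over the last `window` bars."""
    bars = history[ticker][-window:]
    return mean(close * volume for close, volume in bars)


def top_n_screen(n, window):
    """Screen: keep the n tickers with the highest average dollar volume."""
    ranked = sorted(
        history,
        key=lambda ticker: average_dollar_volume(ticker, window),
        reverse=True,
    )
    return set(ranked[:n])


selected = top_n_screen(n=2, window=3)
```

Here `selected` contains the two most liquid tickers in the toy data, `AAA` and `BBB`, while `CCC` is screened out despite having the highest price.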
Before we learn how to add factors and filters to our pipeline, let’s take a look at a nice feature of the Pipeline class. A neat feature of Zipline’s Pipeline class is that it comes with the show_graph method, which allows you to render the pipeline as a diagram. This diagram is specified using the DOT language, and consequently, we need a DOT graph layout program to view the rendered image. In this notebook, we will use the Graphviz package to render the diagram produced by show_graph.
Let’s take a look at the current diagram of our pipeline. Right now, our pipeline is empty, and it only contains a screen. Therefore, when we render our pipeline, we only see the diagram of the screen. We can see that our screen takes as input the closing price and volume from the USEquityPricing dataset to calculate the average dollar volume in a 60-day window. At the bottom of the diagram, we can see that the output is determined by the expression x_0 less than or equal to 10. This expression reflects the fact that we are only selecting the top 10 assets. As we add factors and filters to our pipeline, this diagram will get more complicated.
In this diagram, we saw that our screen takes as input prices and volume from the USEquityPricing dataset. So, let’s take a moment to talk about datasets and data loaders. Another feature of Zipline is that it separates the actual source of the stock data from the abstract description of that dataset. Therefore, Zipline differentiates between the actual dataset and the loader for that dataset. For example, the loader used for the USEquityPricing dataset is the USEquityPricingLoader. The USEquityPricingLoader class can also be used to load open, high, low, close, and volume data from other datasets, like the one from Quotemedia. Therefore, we will set the USEquityPricingLoader as our data loader. Before we add our factors and filters, let’s take a look at the raw data in our Quotemedia data bundle.
This requires a couple of steps. The first step is to build a pipeline engine. This is because, in order to execute a pipeline, Zipline employs pipeline engines. The SimplePipelineEngine class that we’ve used here associates a data loader with a trading calendar and a corresponding data bundle. It is important to note that the get_loader parameter must be a callable function, and this is the reason we have defined this function right here, which returns our pricing loader. We will also use the trading calendar used by the New York Stock Exchange.
Once we have chosen our pipeline engine, we’re ready to execute our pipeline. We can execute our pipeline by using the run_pipeline method of the SimplePipelineEngine class. In this example, we will run our pipeline for a single day. We can see that the output of the pipeline is a Pandas DataFrame with a MultiIndex, where the first index level contains the trading dates, and the second index level contains the tickers for the stocks that have passed our screen. These tickers can be accessed and saved into a list.
Once we have the tickers for the stocks that have passed our pipeline screen, we can get the historical stock data for those tickers from our data bundle. In order to get the historical data, we need to use Zipline’s DataPortal class. A data portal is an interface to all the data that a Zipline simulation needs. Once we have created the data portal like we’ve done here, we can get the historical data by using the get_history_window method. Here, we can see the historical data for the given start and end dates. It is important to note that when a pipeline returns a date, for example, January 7th, 2011, the row for that date only includes data that would be known before the market opens on that date. Therefore, the price shown for January 7th, 2011 is actually the closing price from the day before.
Finally, let’s see how we can add factors and filters to our pipeline.
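The date convention described above, where the row labeled with a given date holds the previous session’s close, can be illustrated with a short plain-Python sketch. It does not call Zipline, the closing prices are made up, and `pipeline_view` is a hypothetical helper name used only for this illustration.

```python
from datetime import date

# Made-up closing prices for three consecutive trading sessions.
closes = {
    date(2011, 1, 5): 100.0,
    date(2011, 1, 6): 101.5,
    date(2011, 1, 7): 102.0,
}


def pipeline_view(closes_by_session):
    """Label each session with the close that was known before its open.

    The first session has no earlier close, so it is dropped.
    """
    sessions = sorted(closes_by_session)
    return {
        today: closes_by_session[previous]
        for previous, today in zip(sessions, sessions[1:])
    }


view = pipeline_view(closes)
```

In this toy data, the entry for January 7th, 2011 holds 101.5, which is the close from January 6th, not the 102.0 that was only known after the market closed on the 7th.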
We can add both factors and filters to our pipeline using the add method of the Pipeline class. The first parameter of the add method is the factor or filter we’re going to add to our pipeline, and the second parameter is a string that determines the name of the column in the output DataFrame for that factor or filter. Here, we have added a factor that computes the 15-day mean closing price. Now, let’s render our pipeline to see what it looks like. We can clearly see our factor in the pipeline now. If we run our pipeline, we now see a column that contains the output of the factor, namely the 15-day mean closing price for each stock that passed our screen.
Now, let’s add a filter to our pipeline. Here, we’ve created a filter that returns true whenever the 15-day average closing price is above $100. Like I mentioned earlier, we can add this filter to our pipeline by using the add method we used before. Now, let’s render our pipeline to see what it looks like. We can now see our filter in the diagram, and if we run our pipeline, we can now see a column that contains the output of the filter, with true for every stock that had a 15-day average closing price above $100 and that passed our screen.
That’s it. Now, you know how to create a pipeline with screens, factors, and filters, and how to run it using a pipeline engine. In the next lesson, you’ll get some practice creating pipelines with custom factors and filters.
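As a recap of how named columns work, here is a toy stand-in written in plain Python. It is not Zipline’s Pipeline class: `MiniPipeline`, the column names, and the price data are all hypothetical, and the sketch only mimics the idea that `add` takes a term and a column name and that a screen decides which assets appear in the output.

```python
from statistics import mean


class MiniPipeline:
    """Toy stand-in for a pipeline: named columns plus an optional screen."""

    def __init__(self, screen=None):
        self.columns = {}
        self.screen = screen

    def add(self, term, name):
        """Register a factor or filter under the given output column name."""
        self.columns[name] = term

    def run(self, closes_by_ticker):
        """Evaluate every column for each ticker that passes the screen."""
        rows = {}
        for ticker, closes in closes_by_ticker.items():
            if self.screen is not None and not self.screen(closes):
                continue
            rows[ticker] = {
                name: term(closes) for name, term in self.columns.items()
            }
        return rows


def mean_close_15(closes):
    """Factor: mean of the last 15 closing prices."""
    return mean(closes[-15:])


def mean_close_above_100(closes):
    """Filter: True when the 15-day mean close is above $100."""
    return mean_close_15(closes) > 100.0


# Made-up closing prices, 15 sessions per ticker.
prices = {
    "HIGH": [120.0] * 15,
    "LOW": [20.0] * 15,
}

pipeline = MiniPipeline()
pipeline.add(mean_close_15, "mean_close_15")
pipeline.add(mean_close_above_100, "above_100")
result = pipeline.run(prices)
```

Each entry of `result` plays the role of one row of the output DataFrame, with one key per added column: a number for the factor and a boolean for the filter.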