9 – M3L610 Almgren And Chriss Model SC V1

Now, we’re going to talk about the Almgren and Chriss model for optimal execution of portfolio transactions. So, the aim of the previous lesson was to give you guys some intuition of what the optimal liquidation problem was about without diving into the mathematics. In this notebook, we will dive into the mathematics and give …
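
For reference, the quantities this mathematics revolves around can be summarized as follows. This is only a brief sketch of the standard linear-impact Almgren and Chriss setup; the notebook’s own notation and parameter choices may differ.

```latex
% Sell X shares over N intervals of length \tau (horizon T = N\tau), holding x_k shares
% after the k-th trade of n_k shares. With linear permanent impact g(v) = \gamma v and
% linear temporary impact h(v) = \epsilon\,\mathrm{sgn}(n_k) + \eta v, the expected
% shortfall and variance of a trading trajectory x = (x_1, \dots, x_N) are
E(x) = \tfrac{1}{2}\gamma X^{2} + \epsilon X + \frac{\tilde{\eta}}{\tau}\sum_{k=1}^{N} n_k^{2},
\qquad
V(x) = \sigma^{2}\,\tau \sum_{k=1}^{N} x_k^{2},
\qquad
\tilde{\eta} = \eta - \tfrac{1}{2}\gamma\tau .
% The trader minimizes U(x) = E(x) + \lambda V(x), where \lambda is the risk aversion.
```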

8 – M3L609 Optimization SC PT4 V2

Now, if we compare this result to what we had before, we can see that the implementation shortfall in this case is bigger due to the random fluctuations in the stock price. We can think of this more realistic price model as the original price model, but with some noise added to it. So, let’s see …
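
To make the effect of the noise concrete, here is a small sketch that re-runs the toy example from the other videos in this series (12 shares at $100, sold in four 3-share trades, with a $10 permanent drop per trade) after adding Gaussian noise to the price. The noise level, and the assumption that each block sells at the already-impacted price, are illustrative choices rather than the notebook’s exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def shortfall_with_noise(noise_std=2.0, n_trades=4, shares_per_trade=3,
                         start_price=100.0, impact_per_trade=10.0):
    """Implementation shortfall of one run of the noisy toy price model."""
    total_shares = n_trades * shares_per_trade
    initial_value = total_shares * start_price            # paper value before selling
    price, proceeds = start_price, 0.0
    for _ in range(n_trades):
        price -= impact_per_trade                          # permanent drop from our selling
        noisy_price = price + rng.normal(0.0, noise_std)   # random market fluctuation
        proceeds += shares_per_trade * noisy_price
    return initial_value - proceeds

samples = [shortfall_with_noise() for _ in range(10_000)]
print(f"mean shortfall ≈ {np.mean(samples):.2f}, std ≈ {np.std(samples):.2f}")
# The mean stays near the deterministic $300, but any single run can be higher or lower.
```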

7 – M3L608 Optimization SC PT3 V1

Let’s assume I have a total of 12 stocks and that the initial price of each stock is $100. I will choose to sell my shares in four trades. Then, in each trade, I will sell three shares. For illustration purposes, we will assume that the stock price decreases by $10 every time we sell three …
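
As a quick check of the arithmetic, assuming each three-share block is sold at the price after the $10 drop (one common way to set up this toy calculation):

```python
# Toy example: 12 shares, initial price $100, sold in 4 trades of 3 shares,
# with the price dropping $10 each time 3 shares are sold.
shares_per_trade, n_trades, start_price, drop = 3, 4, 100, 10

sale_prices = [start_price - drop * (k + 1) for k in range(n_trades)]  # 90, 80, 70, 60
proceeds = sum(shares_per_trade * p for p in sale_prices)              # 3 * 300 = 900
initial_value = shares_per_trade * n_trades * start_price              # 12 * 100 = 1200

print(sale_prices, proceeds, initial_value - proceeds)                 # shortfall = 300
```

Under this assumption, the $1,200 paper value minus the $900 actually captured leaves an implementation shortfall of $300.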

6 – M3L607 Optimization SC PT2 V1

Let’s begin by understanding market impact. Here we have the price of a single stock over a period of time. Market impact is the effect that a market participant has when he buys or sells a number of stocks. Since the optimal liquidation problem only deals with selling stocks, for the rest of this lesson …

5 – M3L606 Optimization SC PT1 V1

We will now take a look at a very common problem in finance known as the optimal liquidation problem or how to sell stocks with minimal loss. So, what is the optimal liquidation problem? Let’s assume that you have a certain number of stocks that you want to sell within a given timeframe. For example, …

4 – M3L04 Advantages Of Reinforcement Learning For Trading RENDER V1

Let’s see how reinforcement learning can get around many of the problems we encounter while trying to create a trading algorithm using a supervised learning approach. The main benefit of using reinforcement learning for trading is that we don’t need to use mathematical models or hand-code a trading strategy, because the deep reinforcement learning agent learns this …

3 – M3L603 Challenges Of Supervised Learning RENDER V1

Let’s try to create a trading algorithm using a supervised learning approach. We can try, for example, to use supervised learning to teach a computer to predict stock prices within a given timeframe. Being able to predict stock prices, however, will not guarantee that we will make money. There are many reasons for this. One …

2 – M3L602 High Frequency Trading HFT RENDER V2

Founded by the National Association of Securities Dealers, the Nasdaq became the world’s first electronic stock market when it began trading on February 8, 1971. Since then, people have used computers to buy and sell stocks at speeds and frequencies that are unmatched by any human trader. The use of computers with pre-programmed algorithms to …

11 – M3L612 The Efficient Frontier V1

In this notebook, we will take a look at the efficient frontier. Recall that the expected shortfall and the variance of the optimal strategy are given by these equations. In this notebook, we will learn how to visualize and interpret these equations. We will start by taking a look at the expected shortfall. Recall that …
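
One way to see the shape of the frontier numerically is to scan the risk-aversion parameter, compute the expected shortfall E and variance V of the corresponding optimal strategy, and look at how one trades off against the other. The sketch below uses the standard linear-impact Almgren and Chriss expressions with made-up parameter values; it is not the notebook’s implementation.

```python
import numpy as np

# Hypothetical parameters: X shares to sell over T days in N intervals, with
# volatility sigma, temporary impact eta, permanent impact gamma, fixed cost eps.
X, T, N = 1_000_000, 60, 60
tau = T / N
sigma, eta, gamma, eps = 0.95, 2.5e-6, 2.5e-7, 0.0625
eta_t = eta - 0.5 * gamma * tau                      # "eta tilde"

def optimal_E_V(lam):
    """Expected shortfall E and variance V of the (approximately) optimal strategy."""
    kappa = np.sqrt(lam * sigma**2 / eta_t)          # small-tau approximation of kappa
    t = np.arange(N + 1) * tau
    x = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)   # holdings trajectory
    n = -np.diff(x)                                  # shares sold in each interval
    E = 0.5 * gamma * X**2 + eps * X + (eta_t / tau) * np.sum(n**2)
    V = sigma**2 * tau * np.sum(x[1:]**2)
    return E, V

for lam in [1e-8, 1e-7, 1e-6, 1e-5]:
    E, V = optimal_E_V(lam)
    print(f"lambda={lam:.0e}  E={E:,.0f}  V={V:.3e}")
# Higher risk aversion sells faster: V falls while E rises, tracing out the frontier.
```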

10 – M3L611 Trading Lists SC V1

In this notebook, we will take a look at trading lists and trading trajectories. In particular, you will see how trading lists vary depending on your initial trading parameters. You will also see how to implement a trade list in a simulated trading environment. In this first section, we will see how trading lists vary …
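
As a rough illustration of how a trade list follows from the trading trajectory, the sketch below builds one from the linear-impact closed form for two different risk-aversion values. The parameter values are hypothetical, and the simulated trading environment in the notebook has its own interface; this is only meant to show the dependence on the initial trading parameters.

```python
import numpy as np

def trade_list(X, T, N, sigma, eta, lam):
    """Shares to sell in each of N intervals, for total shares X, horizon T,
    volatility sigma, temporary impact eta, and risk aversion lam (a sketch)."""
    tau = T / N
    kappa = np.sqrt(lam * sigma**2 / eta)   # small-tau approximation, ignoring the
                                            # permanent-impact correction to eta
    t = np.arange(N + 1) * tau
    x = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)   # remaining holdings
    return -np.diff(x)                                      # trade list n_1, ..., n_N

# Same (made-up) parameters, low vs. high risk aversion:
print(np.round(trade_list(X=10_000, T=10, N=10, sigma=0.5, eta=1e-5, lam=1e-7)))
print(np.round(trade_list(X=10_000, T=10, N=10, sigma=0.5, eta=1e-5, lam=1e-4)))
# The higher the risk aversion, the more front-loaded the trade list becomes.
```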

1 – M3L601 Introduction HS V1

Now that you’ve learned how actor-critic methods work, in this lesson, we’ll take a look at a particular problem in finance, and formulate it so that it can be solved through reinforcement learning. In particular, we’ll be using the deep deterministic policy gradients algorithm to determine the optimal execution of portfolio transactions. If you’re unfamiliar …

9 – M3 L5 09 A3C Asynchronous Advantage ActorCritic Parallel Training V2

Unlike in DQN, A3C does not use a replay buffer. The main reason we needed a replay buffer was so that we could decorrelate experience tuples. Let me explain. In reinforcement learning, an agent collects experience in a sequential manner. The experience collected at time step t plus 1 will be correlated to the experience …
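
The decorrelation idea can be sketched without any deep learning machinery: one agent stepping a single environment produces a batch of neighboring, highly correlated transitions, while several workers each contributing one transition from their own environment copy produce a batch of unrelated ones. The ToyEnv below is made up purely for illustration and is not one of the environments used in the lesson.

```python
import random

class ToyEnv:
    """A made-up 1-D random-walk environment, just to illustrate correlation."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = self.rng.randint(-50, 50)
    def step(self):
        reward = self.rng.choice([-1, 1])
        self.state += reward              # the next state stays close to the current one
        return self.state, reward

# Sequential collection: consecutive states differ by at most 1, so the batch is correlated.
env = ToyEnv(seed=0)
sequential_batch = [env.step() for _ in range(8)]

# A3C-style collection: each worker steps its own environment copy, so the batch
# mixes transitions from very different states.
workers = [ToyEnv(seed=i) for i in range(8)]
parallel_batch = [w.step() for w in workers]

print("sequential:", sequential_batch)
print("parallel:  ", parallel_batch)
```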

8 – M3 L5 08 A3C Asynchronous Advantage ActorCritic V2

A3C stands for Asynchronous Advantage Actor-Critic. As you can probably infer from the name, we’ll be calculating the advantage function, A_π(s, a), and the critic will be learning to estimate V_π to help with that, just as before. If you’re using images as inputs to your agent, A3C can use a single convolutional …
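
As a concrete reminder of what that quantity is, a one-step advantage estimate can be written in a few lines. This is only a sketch with toy numbers; the lesson’s A3C agent uses n-step returns and neural-network value estimates instead.

```python
def one_step_advantage(reward, gamma, v_s, v_s_next, done=False):
    """A_pi(s, a) ≈ r + gamma * V_pi(s') - V_pi(s), with V_pi(s') = 0 at episode end."""
    td_target = reward + (0.0 if done else gamma * v_s_next)
    return td_target - v_s

# Toy numbers: the critic currently values state s at 1.0 and s' at 1.2.
print(one_step_advantage(reward=0.5, gamma=0.99, v_s=1.0, v_s_next=1.2))  # 0.688
```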

7 – M3 L5 07 A Basic ActorCritic Agent V2

You now know that an actor-critic agent is an agent that uses function approximation to learn a policy and a value function. So, we will then use two neural networks; one for the actor and one for the critic. The critic will learn to evaluate the state-value function V_π using the TD estimate. Using …
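
A minimal sketch of such an agent in PyTorch is shown below. The network sizes, optimizer settings, and the single-transition update are illustrative assumptions, not the lesson’s exact implementation; the point is simply the two networks and the TD-based update.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Actor(nn.Module):                       # learns the policy pi(a|s)
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, state):
        return Categorical(logits=self.net(state))

class Critic(nn.Module):                      # learns the state-value function V_pi(s)
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, state):
        return self.net(state).squeeze(-1)

def update(actor, critic, opt_a, opt_c, s, a, r, s_next, done, gamma=0.99):
    """One TD-based actor-critic update from a single transition (done is 0.0 or 1.0)."""
    v_s = critic(s)
    with torch.no_grad():
        td_target = r + gamma * critic(s_next) * (1.0 - done)
        advantage = td_target - v_s           # the critic's evaluation of the action
    critic_loss = F.mse_loss(v_s, td_target)  # move V(s) toward the TD target
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    actor_loss = -(actor(s).log_prob(a) * advantage).mean()  # reinforce good actions
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# Tiny usage example with dummy tensors (state_dim=4, two actions):
actor, critic = Actor(4, 2), Critic(4)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
s, s_next = torch.rand(4), torch.rand(4)
a = actor(s).sample()
update(actor, critic, opt_a, opt_c, s, a, r=1.0, s_next=s_next, done=0.0)
```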

6 – M3 L5 06 Policybased Valuebased And ActorCritic V1

Now that you have some foundational concepts down, let me give you some intuition. Let’s say you want to get better at tennis. The actor, or policy-based, approach roughly learns this way. You play a bunch of matches. You then go home, lie on the couch, and commit to doing more of …

5 – M3 L5 05 Baselines And Critics V1

You now know that the Monte-Carlo estimate is unbiased but has high variance, and that the TD estimate has low variance but is biased. What are these facts good for? See, when you studied REINFORCE, you learned that the return G was calculated as the total discounted return. This way of calculating G, …
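
In code, the return G the video refers to, and what subtracting a baseline from it does, look roughly like this (a sketch with made-up reward sequences, not the lesson’s data):

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Total discounted return G = r_1 + gamma*r_2 + gamma^2*r_3 + ..."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Returns from a few collected episodes (made-up reward sequences).
returns = np.array([discounted_return(ep) for ep in
                    [[1, 0, 1, 1], [0, 0, 1], [1, 1, 1, 1, 1]]])
baseline = returns.mean()          # a simple baseline: the average return
print(returns)                     # raw G values: all positive, widely spread
print(returns - baseline)          # centered values: same ordering, smaller magnitudes
```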

4 – M3 L5 04 Two Ways For Estimating Expected Returns V3

Let’s explore two very distinct and complementary ways for estimating expected returns. On the one hand, you have the Monte-Carlo estimate. The Monte-Carlo estimate consists of rolling out an episode and calculating the discounted total reward from the reward sequence. For example, in episode A, you start in state S_t and take action A_t. The …
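
The two estimates can be contrasted in a few lines. The rewards and the value guess below are made up; the lesson works with the general definitions rather than these numbers.

```python
# Monte-Carlo estimate: roll out a full episode and compute the discounted total
# reward from the reward sequence that follows state S_t.
gamma = 0.9
rewards = [1, 0, 2, 1]                                   # r_{t+1}, r_{t+2}, ... (made up)
G_mc = sum(gamma**k * r for k, r in enumerate(rewards))  # unbiased, but high variance

# TD estimate: use only the first reward plus the current (possibly wrong) value
# estimate of the next state, V(S_{t+1}).
V_next_guess = 2.5                                       # the critic's guess (made up)
G_td = rewards[0] + gamma * V_next_guess                 # low variance, but biased

print(G_mc, G_td)
```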

3 – M3 L5 03 Bias And Variance V2

Let’s talk about bias and variance. In machine learning, we’re often presented with a trade-off between bias and variance. Let me give you some intuition first. Let’s say you’re practicing your soccer shooting skills. The thing you want to do is to put the ball in the top right corner of the …

2 – M3 L5 02 Motivation V1

Actor-critic methods are at the intersection of value-based methods such as DQN and policy-based methods such as REINFORCE. If a deep reinforcement learning agent uses a deep neural network to approximate a value function, the agent is said to be value-based. If an agent uses a deep neural network to approximate a policy, the agent …

17 – M3L517 Summary HS 1 V1

Well, this is the end of the actor-critic methods lesson. That was a lot, I know. But you’ll soon have a chance to put everything into practice, and that should help you cement concepts. In this lesson, you learned about actor-critic methods, which are simply a way to reduce the variance in policy-based methods. You …