17 – M3L517 Summary HS 1 V1

Well, this is the end of the actor-critic methods lesson. That was a lot, I know. But you’ll soon have a chance to put everything into practice, and that should help you cement these concepts. In this lesson, you learned about actor-critic methods, which are simply a way to reduce the variance in policy-based methods. You learned that the TD estimate is a one-step bootstrapping estimate and that the Monte Carlo estimate is an infinite-step bootstrapping estimate. You learned that you can use any number of steps of n-step bootstrapping to estimate expected returns, and that you can combine all n-step returns into a mixture called the lambda return. You learned some of the differences between on-policy and off-policy learning. You also learned about several different actor-critic algorithms: A3C, A2C, GAE, and DDPG. Lastly, I hope that the code walkthrough videos helped you understand these algorithms better. That’s it. Good luck going forward. Remember to keep your gamma as high as possible, to decay your learning rate toward zero, and to never, ever set your epsilon to zero. If you didn’t get that, you ought to go back to the basics and continue studying reinforcement learning. See you out there.
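
For reference, here is a minimal Python sketch of the n-step return and lambda return ideas recapped above. The function names `n_step_return` and `lambda_return`, and the way rewards and critic value estimates are laid out, are illustrative assumptions for this sketch, not code from the lesson's walkthroughs.

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    """n-step bootstrapping estimate of the return from time step t.

    Sums the first n discounted rewards, then bootstraps with the critic's
    value estimate of the state reached n steps later. If the episode ends
    first, this degenerates into the Monte Carlo (infinite-step) return.
    """
    T = len(rewards)
    steps = min(n, T - t)                       # truncate at episode end
    G = sum(gamma**k * rewards[t + k] for k in range(steps))
    if t + n < T:                               # bootstrap only if we stopped early
        G += gamma**n * values[t + n]
    return G


def lambda_return(rewards, values, t, gamma=0.99, lam=0.95):
    """Lambda return: an exponentially weighted mixture of all n-step returns."""
    T = len(rewards)
    G_lambda, weight_sum = 0.0, 0.0
    for n in range(1, T - t):
        w = (1 - lam) * lam**(n - 1)            # weight on the n-step return
        G_lambda += w * n_step_return(rewards, values, t, n, gamma)
        weight_sum += w
    # remaining weight goes to the full Monte Carlo return
    G_lambda += (1 - weight_sum) * n_step_return(rewards, values, t, T - t, gamma)
    return G_lambda


# Hypothetical usage on a short episode (made-up numbers):
rewards = [1.0, 0.0, 1.0, 2.0]   # rewards collected at each step
values  = [0.5, 0.4, 0.9, 1.5]   # critic value estimates for the visited states
print(n_step_return(rewards, values, t=0, n=2))        # two-step bootstrapped target
print(lambda_return(rewards, values, t=0, lam=0.95))   # mixture of all n-step returns
```

With lam=0 the lambda return collapses to the one-step TD target, and with lam=1 it becomes the Monte Carlo return, which is the trade-off this lesson discussed.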
