7 – M2L3 02 V2
You may be wondering why do we need to find optimal policies directly when value-based methods seem to work so well. There are three arguments we’ll consider, simplicity, stochastic policies, and continuous action spaces. Remember that in value-based methods like Q-learning, we invented this idea of a value function as an intermediate step towards finding … Read more
댓글을 달려면 로그인해야 합니다.