1-8-16. Constant-alpha


In the video below, you will learn about another improvement that you can make to your Monte Carlo control algorithm.

Here are some guiding principles that will help you to set the value of $\alpha$ when implementing constant-$\alpha$ MC control:

  • You should always set the value for $\alpha$ to a number greater than zero and less than (or equal to) one.
    • If $\alpha=0$, then the action-value function estimate is never updated by the agent.
    • If $\alpha = 1$, then the final value estimate for each state-action pair is always equal to the last return that was experienced by the agent (after visiting the pair).
  • Smaller values for $\alpha$ encourage the agent to consider a longer history of returns when calculating the action-value function estimate. Increasing the value of $\alpha$ ensures that the agent focuses more on the most recently sampled returns.
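The effect of these settings can be checked directly on the update rule. The following is a minimal sketch, assuming the standard constant-$\alpha$ update $Q \leftarrow Q + \alpha (G - Q)$ applied to a single state-action pair; the function names, the initial estimate of zero, and the list of returns are hypothetical and chosen only for illustration.

```python
def constant_alpha_update(q, g, alpha):
    """Move the estimate q a fraction alpha of the way toward the sampled return g."""
    return q + alpha * (g - q)


def run_updates(returns, alpha, q0=0.0):
    """Apply the constant-alpha update once per sampled return, in order."""
    q = q0
    for g in returns:
        q = constant_alpha_update(q, g, alpha)
    return q


# Hypothetical returns observed for one state-action pair, oldest first.
returns = [10.0, 0.0, 4.0]

print(run_updates(returns, alpha=0.0))  # 0.0: the estimate never leaves its initial value
print(run_updates(returns, alpha=1.0))  # 4.0: the estimate equals the last return
print(run_updates(returns, alpha=0.5))  # 3.25: a blend that weights recent returns more
```

With $\alpha = 0.5$ the three returns end up weighted 0.125, 0.25, and 0.5 from oldest to newest, which is the sense in which a larger $\alpha$ favors recent experience.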

Important Note: When implementing constant-$\alpha$ MC control, you must be careful not to set the value of $\alpha$ too close to 1. This is because very large values can keep the algorithm from converging to the optimal policy $\pi_*$. However, you must also be careful not to set the value of $\alpha$ too low, as this can result in an agent that learns too slowly. The best value of $\alpha$ for your implementation will depend greatly on your environment and is best gauged through trial and error.
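The trade-off can be illustrated on a toy problem. This is only a sketch under stated assumptions, not a full MC control implementation: it tracks a single value estimate, the returns alternate deterministically between 6 and 4 as a stand-in for noisy samples around a true value of 5, and the helper name `mean_abs_error` and the candidate values of $\alpha$ are hypothetical.

```python
def mean_abs_error(alpha, returns, true_value, q0=0.0, tail=10):
    """Run constant-alpha updates and average |Q - true_value| over the last `tail` updates."""
    q = q0
    errors = []
    for g in returns:
        q = q + alpha * (g - q)
        errors.append(abs(q - true_value))
    return sum(errors[-tail:]) / tail


# Toy data (an assumption): 50 returns alternating around a true value of 5.
returns = [6.0, 4.0] * 25
true_value = 5.0

# Trial and error over a few candidate step sizes.
results = {alpha: mean_abs_error(alpha, returns, true_value) for alpha in (0.01, 0.2, 0.9)}
for alpha, err in results.items():
    print(f"alpha={alpha}: mean abs error over last 10 updates = {err:.3f}")

best_alpha = min(results, key=results.get)
print("best of the candidates:", best_alpha)
```

In this toy setting the smallest $\alpha$ is still far from the true value after 50 updates, the largest keeps chasing the most recent return, and the intermediate value does best. Which value works best in a real environment still has to be found empirically.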
