In the video below, you will learn about another improvement that you can make to your Monte Carlo control algorithm.
Here are some guiding principles that will help you to set the value of $\alpha$ when implementing constant-$\alpha$ MC control:
- You should always set the value for $\alpha$ to a number greater than zero and less than (or equal to) one.
- If $\alpha = 0$, then the action-value function estimate is never updated by the agent.
- If $\alpha = 1$, then the final value estimate for each state-action pair is always equal to the last return that was experienced by the agent (after visiting the pair).
- Smaller values for $\alpha$ encourage the agent to consider a longer history of returns when calculating the action-value function estimate. Increasing the value of $\alpha$ ensures that the agent focuses more on the most recently sampled returns.
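
To make these principles concrete, here is a minimal sketch of the constant-$\alpha$ update for a single state-action pair, $Q \leftarrow Q + \alpha (G - Q)$. The function names and the list of returns are made up for illustration; they are not taken from the course code.

```python
def constant_alpha_update(q_old, g, alpha):
    """Move the old estimate a fraction alpha of the way toward the return g."""
    return q_old + alpha * (g - q_old)


def run_updates(returns, alpha, q_init=0.0):
    """Apply the update once per sampled return, in the order they were seen."""
    q = q_init
    for g in returns:
        q = constant_alpha_update(q, g, alpha)
    return q


def return_weights(alpha, n):
    """Weight that each of n returns carries in the final estimate (oldest first)."""
    return [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]


# Made-up returns observed for one state-action pair (illustrative only).
returns = [10.0, 0.0, 4.0, 8.0]

print(run_updates(returns, alpha=0.0, q_init=5.0))  # 5.0: the estimate never moves
print(run_updates(returns, alpha=1.0, q_init=5.0))  # 8.0: equal to the last return
print(return_weights(0.5, 4))  # [0.0625, 0.125, 0.25, 0.5]
```

The weights show why smaller values of $\alpha$ take a longer history into account: each older return is discounted by a further factor of $(1 - \alpha)$, so the weights decay slowly when $\alpha$ is small and quickly when $\alpha$ is large.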
Important Note: When implementing constant-$\alpha$ MC control, you must be careful not to set the value of $\alpha$ too close to 1, because values near 1 can keep the algorithm from converging to the optimal policy $\pi_*$. However, you must also be careful not to set the value of $\alpha$ too low, as this can result in an agent that learns too slowly. The best value of $\alpha$ for your implementation depends heavily on your environment and is best found through trial and error.
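
One way to get a feel for this trade-off is a small sweep over candidate values. The sketch below is a toy setup, not full MC control: it assumes a single state-action pair whose returns are noisy samples around a hypothetical true value of 1.0, and it only illustrates how $\alpha$ affects the speed and stability of the value estimate. The names `track_estimate` and `TRUE_VALUE`, the noise level, and the candidate values are all assumptions chosen for illustration.

```python
import random
import statistics


def track_estimate(alpha, returns, q_init=0.0):
    """Return the sequence of estimates produced by the constant-alpha update."""
    q = q_init
    history = []
    for g in returns:
        q += alpha * (g - q)
        history.append(q)
    return history


rng = random.Random(0)  # fixed seed so the sweep is repeatable
TRUE_VALUE = 1.0  # hypothetical true action value (assumption)
# Noisy sampled returns, shared by every alpha so the comparison is fair.
returns = [rng.gauss(TRUE_VALUE, 1.0) for _ in range(2000)]

results = {}
for alpha in (0.01, 0.1, 0.9):
    history = track_estimate(alpha, returns)
    early_error = abs(history[19] - TRUE_VALUE)  # error after 20 updates
    jitter = statistics.pstdev(history[-1000:])  # spread over the last 1000 updates
    results[alpha] = (early_error, jitter)
    print(f"alpha={alpha}: early error={early_error:.3f}, jitter={jitter:.3f}")
```

In this toy setting, the smallest $\alpha$ should still be far from the true value after 20 updates, while the largest $\alpha$ should keep fluctuating long after the others have settled. In a real environment you would run the same kind of sweep on the agent's actual performance rather than on a known true value.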