7 – TD Control: Expected Sarsa
So far, you’ve implemented Sarsa and Sarsamax, and we’ll now discuss one more option. This new option is called Expected Sarsa. It closely resembles Sarsamax; the only difference is in the update step for the action value. Remember that Sarsamax, or Q-learning, took the maximum over all actions of all possible next …
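To make the difference in the update step concrete, here is a minimal Python sketch. It assumes the agent follows an epsilon-greedy policy and that the action-value table `Q` is a dict mapping each state to a list of action values; the function names are hypothetical and not taken from the original post. Sarsamax builds its target from the largest next-state action value, while Expected Sarsa replaces that maximum with the expected action value under the policy's action probabilities.

```python
from collections import defaultdict


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (assumed policy)."""
    n = len(q_values)
    probs = [epsilon / n] * n  # every action gets epsilon / n
    best = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs[best] += 1.0 - epsilon  # greedy action gets the remaining mass
    return probs


def sarsamax_update(Q, state, action, reward, next_state, alpha, gamma, done=False):
    """Sarsamax (Q-learning): bootstrap from the max next-state action value."""
    target = reward if done else reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


def expected_sarsa_update(Q, state, action, reward, next_state,
                          alpha, gamma, epsilon, done=False):
    """Expected Sarsa: bootstrap from the expected next-state action value."""
    if done:
        target = reward  # no bootstrapping from a terminal state
    else:
        probs = epsilon_greedy_probs(Q[next_state], epsilon)
        expected_q = sum(p * q for p, q in zip(probs, Q[next_state]))
        target = reward + gamma * expected_q
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


# Usage: two actions, next-state values [1.0, 3.0], epsilon = 0.2.
Q = defaultdict(lambda: [0.0, 0.0])
Q[1] = [1.0, 3.0]
new_value = expected_sarsa_update(Q, state=0, action=0, reward=1.0, next_state=1,
                                  alpha=0.5, gamma=0.9, epsilon=0.2)
```

In the usage example the policy probabilities for the next state are 0.1 and 0.9, so the expected value is 0.1 × 1.0 + 0.9 × 3.0 = 2.8 and the new estimate is about 1.76; Sarsamax would use max = 3.0 and give about 1.85. With `epsilon = 0` the two updates coincide, which matches the idea that Expected Sarsa differs from Sarsamax only in how the next-state value is summarized.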