9 – A3C: Asynchronous Advantage Actor-Critic, Parallel Training
Unlike DQN, A3C does not use a replay buffer. The main reason we needed a replay buffer was so that we could decorrelate experience tuples. Let me explain. In reinforcement learning, an agent collects experience in a sequential manner. The experience collected at time step t + 1 will be correlated with the experience collected at time step t.
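To make the parallel-training idea in the title concrete, here is a minimal sketch of how A3C-style workers can decorrelate experience without a replay buffer: several workers, each with its own copy of the environment, collect short rollouts independently and push their gradients into one shared network. Everything below is an illustrative assumption, not the course's reference code: it uses PyTorch, Python threads, a lock around the shared update, and a toy corridor environment invented here so the example is self-contained.

```python
# Illustrative A3C-style sketch (assumptions: PyTorch, threads, toy env).
import threading
import torch
import torch.nn as nn

class ToyEnv:
    """Toy 1-D corridor: start at 0; reward 1 for reaching +5."""
    def reset(self):
        self.pos = 0
        return torch.tensor([float(self.pos)])
    def step(self, action):                      # action: 0 = left, 1 = right
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) >= 5
        reward = 1.0 if self.pos >= 5 else 0.0
        return torch.tensor([float(self.pos)]), reward, done

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 32), nn.ReLU())
        self.policy = nn.Linear(32, 2)           # actor head: action logits
        self.value = nn.Linear(32, 1)            # critic head: state value

    def forward(self, x):
        h = self.body(x)
        return self.policy(h), self.value(h)

shared_model = ActorCritic()
optimizer = torch.optim.Adam(shared_model.parameters(), lr=1e-3)
lock = threading.Lock()    # this sketch serializes updates for simplicity
GAMMA = 0.99

def worker(n_updates=200, n_steps=5):
    env, state, done = ToyEnv(), None, True     # each worker owns its env copy
    local_model = ActorCritic()
    for _ in range(n_updates):
        local_model.load_state_dict(shared_model.state_dict())  # sync weights
        if done:
            state = env.reset()
        log_probs, values, rewards = [], [], []
        for _ in range(n_steps):                # short n-step rollout
            logits, value = local_model(state)
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            state, reward, done = env.step(action.item())
            log_probs.append(dist.log_prob(action))
            values.append(value.squeeze(-1))
            rewards.append(reward)
            if done:
                break
        # Bootstrap from the critic if the episode did not end.
        R = 0.0 if done else local_model(state)[1].item()
        loss = 0.0
        for log_p, v, r in zip(reversed(log_probs), reversed(values),
                               reversed(rewards)):
            R = r + GAMMA * R                   # n-step return
            advantage = R - v
            # Critic: squared advantage; actor: policy-gradient term.
            # (A real A3C also adds an entropy bonus for exploration.)
            loss = loss + advantage.pow(2) - log_p * advantage.detach()
        local_model.zero_grad()
        loss.backward()                         # gradients land in local_model
        with lock:                              # push them to the shared model
            for lp, sp in zip(local_model.parameters(),
                              shared_model.parameters()):
                sp.grad = lp.grad
            optimizer.step()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("finished asynchronous training sketch")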