11 – M3 L5 11 A2C Advantage ActorCritic V2

You may be wondering what the asynchronous part in A3C is about. Recall that A3C stands for Asynchronous Advantage Actor-Critic. Let me explain. A3C accumulates gradient updates and applies those updates asynchronously to a global neural network. Each agent in the simulation does this on its own schedule. So, the agents use a local copy of the network to collect experience, calculate and accumulate gradient updates across multiple time steps, and then apply these gradients to the global network asynchronously. Asynchronous here means that each agent updates the network on its own. There is no synchronization between the agents. This also means that the weights one agent is using might be different from the weights in use by another agent at any given time. There is a synchronous implementation of A3C called Advantage Actor-Critic, A2C. A2C has an extra bit of code that synchronizes all agents. It waits for every agent to finish a segment of interaction with its copy of the environment, then updates the network once, and then sends the updated weights back to all agents. A2C is arguably simpler to implement, yet it gives pretty much the same results, and reportedly performs even better in some cases. A3C is most easily trained on a CPU, while A2C is more straightforward to extend to a GPU implementation.
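To make the difference concrete, here is a minimal sketch of the two update schemes on a toy problem. It is not an actual actor-critic implementation: `toy_gradient`, `a2c_step`, and `a3c_step` are hypothetical names, the "gradient" is just that of a quadratic loss standing in for a real actor-critic gradient, and the asynchronous workers are simulated one after another instead of running in parallel threads.

```python
from typing import List, Tuple

Params = List[float]


def toy_gradient(weights: Params, target: float) -> Params:
    """Gradient of 0.5 * (w - target)^2 per weight.

    Stand-in (assumption) for the gradient a worker would accumulate
    from a segment of interaction with its own environment copy.
    """
    return [w - target for w in weights]


def a2c_step(global_weights: Params, targets: List[float], lr: float) -> Params:
    """Synchronous update: all workers share the same weights."""
    # Every worker computes its gradient from the same copy of the weights.
    grads = [toy_gradient(list(global_weights), t) for t in targets]
    # Synchronization point: wait for all workers, then average their gradients.
    n = len(grads)
    mean_grad = [sum(g[i] for g in grads) / n for i in range(len(global_weights))]
    # One update of the network; the result is sent back to every worker.
    return [w - lr * g for w, g in zip(global_weights, mean_grad)]


def a3c_step(
    global_weights: Params,
    local_copies: List[Params],
    targets: List[float],
    lr: float,
) -> Tuple[Params, List[Params]]:
    """Asynchronous update, simulated sequentially: no waiting between workers."""
    for k, target in enumerate(targets):
        # The gradient comes from the worker's own, possibly stale, local copy.
        grad = toy_gradient(local_copies[k], target)
        # The worker applies it to the global network on its own.
        global_weights = [w - lr * g for w, g in zip(global_weights, grad)]
        # Only this worker pulls the fresh weights; the others keep theirs.
        local_copies[k] = list(global_weights)
    return global_weights, local_copies


if __name__ == "__main__":
    print(a2c_step([0.0], [1.0, 3.0], lr=0.5))
    print(a3c_step([0.0], [[0.0], [0.0]], [1.0, 3.0], lr=0.5))
```

In the asynchronous version the two workers end the step holding different local weights, which is the situation described above. In the synchronous version every worker always starts the next segment from the same weights.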
