Hi everyone. The paper that you’ve chosen implements a multi-agent version of DDPG. DDPG, as you might remember, is an off-policy actor-critic algorithm that uses the concept of target networks. The input of the actor network is the current state, while the output is a real value, or a vector, representing an action chosen from a continuous action space. OpenAI has created a multi-agent environment called Multi-Agent Particle. It consists of particles, that is, agents, and some landmarks. A lot of interesting experimental scenarios have been laid out in this environment. We’ve chosen one of the many scenarios, called physical deception. Here, n agents cooperate to reach the target landmark out of n landmarks. There is an adversary which is also trying to reach the target landmark, but it doesn’t know which of the n landmarks is the target.
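To make the actor and target-network ideas concrete, here is a minimal sketch in PyTorch (an assumption; the paper's actual implementation, layer sizes, and `state_dim`/`action_dim` values are hypothetical): the actor maps a state to a continuous action, and a slowly updated target copy is used to stabilize learning.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps the current state to a continuous action."""
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # squash each action component into [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

def soft_update(target, source, tau=0.01):
    """Move target-network weights a small step toward the learned network."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)

# Illustrative dimensions, not taken from the paper.
actor = Actor(state_dim=8, action_dim=2)
target_actor = Actor(state_dim=8, action_dim=2)
target_actor.load_state_dict(actor.state_dict())  # start identical

action = actor(torch.randn(1, 8))  # one state in, one 2-D action out
```

In the multi-agent version, each agent (and the adversary) gets its own actor like this one, while the critics are trained centrally.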