# 6 – M4 L2 C06 Markov Games 2 V1

Consider an example of single-agent reinforcement learning. We have a drone with the task of grabbing a package. The possible actions are going right, left, up, down, and grasping. The reward is plus 50 for grasping the package, and minus one otherwise.

Now, the difference in multi-agent RL is that we have more than one agent. So, say we have a second drone. Now both drones are collaboratively trying to grasp the package. They're both observing the package from their respective positions. They both have their own policies that return an action for their observations. Both also have their own set of actions. The main thing about multi-agent RL is that there is also a joint set of actions. Both the left drone and the right drone must pick an action. For example, the pair (D, L) means that the left drone moves down and the right drone moves to the left. This example illustrates the Markov game framework, which we are now ready to discuss in more detail.

A Markov game is a tuple written like this. Here, n is the number of agents, S is the set of states of the environment, A_i is the set of actions of each agent i, A is the joint action space, O_i is the set of observations of agent i, R_i is the reward function of agent i, which returns a real value for taking an action in a particular state, π_i is the policy of each agent i, which, given its observations, returns a probability distribution over the actions A_i, and T is the state transition function. Given the current state and the joint action, T provides a probability distribution over the set of possible next states.

Note that even here the state transitions are Markovian, just like in an MDP. Recall that Markovian means that the next state depends only on the present state and the actions taken in this state. However, the transition function now depends on the joint action. You may find slightly varying definitions in different places. In the next video, we will discuss two modeling approaches. Let's get going.
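To make the pieces of the Markov game tuple concrete, here is a minimal Python sketch of the two-drone example. Everything beyond what the lecture states is an assumption made for illustration: the 3x3 grid, the fixed package cell, the choice of observation (the package offset relative to each drone), the uniform placeholder policy, the deterministic transitions, and applying the single-agent reward (plus 50 for grasping, minus one otherwise) to each drone separately. All function names are hypothetical.

```python
from itertools import product

N = 2                      # n: number of agents (left drone, right drone)
GRID = 3                   # assumed 3x3 grid world
PACKAGE = (1, 1)           # assumed fixed package cell

# A_i: the action set of each drone (identical for both here)
ACTIONS = ("right", "left", "up", "down", "grasp")
MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1),
         "down": (0, -1), "grasp": (0, 0)}

# A: the joint action space A_1 x A_2
JOINT_ACTIONS = list(product(ACTIONS, repeat=N))


def move(pos, action):
    """Apply one drone's action, clipping at the grid border."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), GRID - 1)
    y = min(max(pos[1] + dy, 0), GRID - 1)
    return (x, y)


def transition(state, joint_action):
    """T(s, a): distribution over next states given the JOINT action.

    Deterministic in this sketch, so all probability mass is on one state.
    """
    next_state = tuple(move(p, a) for p, a in zip(state, joint_action))
    return {next_state: 1.0}


def reward(i, state, joint_action):
    """R_i(s, a): +50 if drone i grasps on the package cell, else -1."""
    on_package = state[i] == PACKAGE
    return 50.0 if joint_action[i] == "grasp" and on_package else -1.0


def observe(i, state):
    """O_i: assumed to be the package offset as seen from drone i."""
    return (PACKAGE[0] - state[i][0], PACKAGE[1] - state[i][1])


def uniform_policy(observation):
    """pi_i(o_i): placeholder policy, uniform distribution over A_i."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


# The (D, L) joint action from the example: left drone down, right drone left.
state = ((0, 1), (2, 2))
joint_action = ("down", "left")
next_dist = transition(state, joint_action)
rewards = [reward(i, state, joint_action) for i in range(N)]
```

Note that `transition` takes the whole joint action, while each policy only sees its own drone's observation. That split is the main structural difference from a single-agent MDP in this sketch.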