So, can we adapt the single-agent techniques we have learned about so far to the multi-agent case? Two extreme approaches come to mind.

The simplest approach is to train all the agents independently, without considering the existence of other agents. In this approach, each agent treats all the others as part of the environment and learns its own policy. Since all the agents are learning simultaneously, the environment, as seen from the perspective of a single agent, changes dynamically. This condition is called non-stationarity of the environment. Most single-agent algorithms assume that the environment is stationary, and this assumption is what yields their convergence guarantees; under non-stationarity, those guarantees no longer hold.

The second approach is the meta-agent approach, which explicitly takes into account the existence of multiple agents. Here, a single policy is learned for all the agents. It takes as input the current state of the environment and returns the action of each agent in the form of a single joint action vector. Typically, a single reward function, given the environment state and the joint action vector, returns a global reward. The joint action space, as we discussed before, grows exponentially with the number of agents. Moreover, if the environment is partially observable, or the agents can only see locally, each agent will have a different observation of the environment state, and it becomes difficult to disambiguate the true state of the environment from these different local observations. So this approach works well only when each agent knows everything about the environment.

In the next video, we will talk about the different kinds of environments that exist in multi-agent systems. Let's move on to the next video.
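[Supplementary note, not part of the spoken lecture: a minimal sketch contrasting the two approaches in a toy tabular setting. The agent count, action count, and Q-learning updates below are illustrative assumptions rather than anything specified in the lecture, and the environment itself is left abstract.]

```python
# A toy sketch: independent learners vs. a single meta-agent (joint-action) learner.
from itertools import product
from collections import defaultdict

n_agents = 3
n_actions = 4

# Approach 1: independent learners.
# Each agent keeps its own Q-table over (its observation, its own action)
# and treats the other agents as part of the environment.
independent_q = [defaultdict(float) for _ in range(n_agents)]

def independent_update(agent, obs, action, reward, next_obs, alpha=0.1, gamma=0.99):
    """One Q-learning step for a single agent, ignoring the other agents.
    Because the others keep changing their policies, the transition and
    reward dynamics seen by this agent are non-stationary."""
    q = independent_q[agent]
    best_next = max(q[(next_obs, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(obs, action)] += alpha * (target - q[(obs, action)])

# Approach 2: a single meta-agent (joint-action learner).
# One Q-table is kept over the global state and the joint action vector,
# so the action space has n_actions ** n_agents entries.
joint_actions = list(product(range(n_actions), repeat=n_agents))
print(len(joint_actions))  # 4 ** 3 = 64 joint actions already for 3 agents

joint_q = defaultdict(float)

def joint_update(state, joint_action, global_reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step for the meta-agent, using a single global reward
    defined on the environment state and the joint action vector."""
    best_next = max(joint_q[(next_state, a)] for a in joint_actions)
    target = global_reward + gamma * best_next
    joint_q[(state, joint_action)] += alpha * (target - joint_q[(state, joint_action)])
```

The sketch makes the trade-off concrete: the independent learners each face only n_actions choices but see a non-stationary environment, while the meta-agent sees a stationary problem but must enumerate n_actions ** n_agents joint actions and must have access to the full environment state.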