Hi, all. Having multiple agents in a system brings a few benefits.

First, the agents can share their experiences with one another, making each other smarter, just as we learn from our teachers and friends. However, to share, agents have to communicate, and communication comes at a cost, such as extra hardware and software capabilities.

Second, a multi-agent system is robust. When an agent fails, it can be replaced with a copy, or other agents in the system can take over its tasks, although the substituting agent then has to do some extra work.

Third, scalability comes by virtue of design, as most multi-agent systems allow new agents to be inserted easily. But as more agents are added, the system becomes more complex than before.

So whether or not these advantages can be exploited depends on the assumptions made by the algorithm and on the software and hardware capabilities of the agents.

From here onwards, we will learn about multi-agent RL, also known as MARL. When a multi-agent system uses reinforcement learning techniques to train its agents and have them learn their behaviors, we call the process multi-agent reinforcement learning. Next, we will talk about a framework for MARL, just as Markov decision processes, or MDPs, are the framework for single-agent RL.