Multi-agent RL is an active area of cutting-edge research. Recently, OpenAI announced that OpenAI Five, its team of five neural networks, has learned to defeat amateur Dota 2 players. OpenAI Five was trained using a scaled-up version of Proximal Policy Optimization (PPO). Coordination between agents is controlled by a hyperparameter called team spirit. It ranges from zero to one: at zero, each agent cares only about its own individual reward function, while at one, each agent cares entirely about the team's reward function.
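To make the role of team spirit concrete, here is a minimal sketch of how such a hyperparameter could blend individual and team rewards. It assumes a simple linear interpolation between each agent's own reward and the team's average reward; the function name `blend_rewards` and the exact formula are illustrative assumptions, not OpenAI's actual implementation.

```python
from typing import List


def blend_rewards(individual_rewards: List[float], team_spirit: float) -> List[float]:
    """Blend each agent's reward with the team's mean reward.

    Hypothetical helper: assumes a linear interpolation, where
    team_spirit = 0 keeps purely individual rewards and
    team_spirit = 1 gives every agent the team's mean reward.
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must lie in [0, 1]")
    if not individual_rewards:
        return []

    # Team reward is taken here to be the average over all agents (an assumption).
    team_mean = sum(individual_rewards) / len(individual_rewards)

    # Each agent's blended reward is a weighted mix of its own and the team's reward.
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 0.0, 4.0]  # made-up per-agent rewards for five agents
    for spirit in (0.0, 0.5, 1.0):
        print(spirit, blend_rewards(rewards, spirit))
```

Under this assumption, intermediate values trade off selfish and cooperative behaviour: an agent that earned nothing on its own still receives a share of the credit when its teammates do well.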