## 9 – AlphaZero advanced TicTacToe walkthrough

Hi. Today, I’m going to show you how to use AlphaZero to train an agent to play a more advanced version of tic-tac-toe. Hopefully, by now you’ve had the chance to play with the basic version and successfully trained an AlphaZero tic-tac-toe agent. This time, we’re going to initialize a slightly more complicated …

## 8 – AlphaZero Python classes walkthrough

Hello, welcome. In this screencast, I want to walk you through how I implemented the game environment and the tree search so that, in case you want to edit the files, you have an understanding of how they work. So let’s go to ConnectN.py to look at how the …

## 7 – TicTacToe using AlphaZero – notebook walkthrough

Hi, welcome to the screencast. Today, I’ll share with you how to train an AlphaZero agent to play a game of TicTacToe. Before I go into the Jupyter notebook, let’s go back up a directory to check out all the files available to you in the workspace. So, you should see something like this, it …

## 6 – AlphaZero 2: Self-Play Training

Now that we have an improved Monte Carlo Tree Search guided by an expert policy and critic, how do we update them? Well, starting with an empty board of Tic-Tac-Toe, we perform Monte Carlo Tree Search using the current policy and critic. The end result is a list of visit counts for each action, N sub a, …
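
As a sketch of how those visit counts become move probabilities, here is a minimal, hypothetical conversion of each N sub a into a policy (this is my own illustration, not the course’s actual code; the temperature parameter and the example counts are assumptions):

```python
def improved_policy(visit_counts, temperature=1.0):
    """Turn MCTS visit counts N_a into an improved policy pi(a) ~ N_a^(1/T).

    temperature -> 0 concentrates on the most-visited move;
    temperature = 1 is simply proportional to the visit counts.
    """
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical visit counts for the 9 squares of an empty tic-tac-toe board:
pi = improved_policy([30, 5, 5, 5, 10, 5, 5, 5, 30])
print(pi)  # corners and center get the most probability mass
```

This is the same normalized-count construction used as the policy training target in self-play.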

## 5 – AlphaZero 1: Guided Tree Search

We know that tree searches can become intractable very quickly, even when we utilize Monte Carlo methods. Take the game of Go for example. The game board is a 19 by 19 grid, and that means 361 possible first moves. The number of possible second moves is a tiny bit smaller, 360. For the third move, …
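
The lecture’s counting argument is easy to reproduce. This small helper (my own illustration) multiplies out the shrinking number of available board points; it is an upper bound, since it ignores captures and illegal positions:

```python
def go_sequences(k, board_points=19 * 19):
    """Rough count of distinct k-move openings in Go: 361 * 360 * 359 * ..."""
    total = 1
    for move in range(k):
        total *= board_points - move
    return total

print(go_sequences(2))  # 361 * 360 = 129960 possible two-move openings
print(go_sequences(4))  # already over 16 billion four-move openings
```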

## 4 – Monte Carlo Tree Search 2 – Expansion and Back-propagation

Starting with a state, we learned previously how to search systematically through one layer of a game tree using the variables U, N, and V. Can we generalize this to go deeper into the tree so that we can better anticipate a long sequence of moves? This is possible through what’s called expansion and back-propagation. …
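
To make the roles of U, N, and V concrete, here is a minimal, hypothetical node class. The course code’s bookkeeping may differ; the constant c and the formula for U shown below are the standard UCB form assumed for illustration:

```python
import math

class Node:
    """Minimal MCTS node sketch: N counts visits, V averages playout outcomes."""

    def __init__(self):
        self.N = 0        # visit count
        self.V = 0.0      # running average of outcomes seen through this node
        self.children = {}

    def expand(self, legal_moves):
        # Expansion: attach an unexplored child node for each legal move.
        for m in legal_moves:
            self.children.setdefault(m, Node())

    def backprop(self, outcome):
        # Back-propagation: fold one playout outcome into the running average.
        self.N += 1
        self.V += (outcome - self.V) / self.N

    def ucb(self, child, c=1.4):
        # U balances exploitation (V) against exploration (visit counts).
        if child.N == 0:
            return float("inf")  # always try an unexplored move first
        return child.V + c * math.sqrt(math.log(self.N) / child.N)
```

In a full search loop, selection repeatedly follows the child with the largest U until it reaches a leaf, expands it, runs a playout, and back-propagates the outcome along the path.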

## 3 – Monte Carlo Tree Search 1 – Random Sampling

Given a state in a zero-sum game, how do we find an optimal policy? In theory, this is simple: we could just perform a brute-force search, going through all the possible moves and all the possible games that can be played, and then choose the ones with the best …
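
To contrast random sampling with brute force, here is a toy illustration of my own (not from the course): a pile game where players alternately take one or two stones and whoever takes the last stone wins. Each first move is scored by averaging random playouts instead of enumerating the full tree:

```python
import random

def random_playout(stones, rng):
    """Play the pile game out at random. Returns +1 if the player to move
    now ends up winning, -1 otherwise."""
    player = +1
    while stones > 0:
        stones -= rng.choice([1, 2]) if stones >= 2 else 1
        if stones == 0:
            return player  # this player took the last stone and wins
        player = -player

def estimate_move_values(stones, num_playouts=2000, seed=0):
    """Monte Carlo estimate of each first move's value for the player to move."""
    rng = random.Random(seed)
    values = {}
    for take in (1, 2):
        if take > stones:
            continue
        if take == stones:
            values[take] = 1.0  # taking the last stone wins outright
            continue
        # After we take, it is the opponent's turn, so negate their outcome.
        total = sum(-random_playout(stones - take, rng)
                    for _ in range(num_playouts))
        values[take] = total / num_playouts
    return values

print(estimate_move_values(4))  # taking 1 (leaving 3) should score higher
```

With 4 stones, leaving the opponent a multiple of 3 is the winning move, and the random-sampling estimate recovers that ordering without any exhaustive search.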

## 2 – Zero-Sum Game

In order to talk about AlphaZero, we first need to formalize the concept of the games AlphaZero specializes in: zero-sum games. We start with a board game environment, a grid for example; then two competing agents take turns performing actions to try to win the game. In the end, one agent’s win is another …
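
The defining property, that the two agents’ final rewards always cancel, can be stated in a few lines (a hypothetical scoring sketch, not course code):

```python
def final_rewards(winner):
    """Zero-sum scoring: +1 to the winner, -1 to the loser, 0 each for a
    draw. 'winner' is 1, 2, or None for a draw."""
    if winner is None:
        return {1: 0, 2: 0}
    loser = 2 if winner == 1 else 1
    return {winner: +1, loser: -1}

for w in (1, 2, None):
    r = final_rewards(w)
    assert r[1] + r[2] == 0  # the defining property of a zero-sum game
```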

## 1 – Alpha Zero Preview

In 2016, researchers at DeepMind announced a new breakthrough: the development of a new engine, AlphaGo, for the game of Go. The AI was able to defeat the professional player Lee Sedol. The breakthrough was significant because Go is far more complex than chess. The number of possible games is so high that a professional …

## 9 – M4 L2 C09 Paper Description Part I HSAEG V1

The field of multi-agent RL is abuzz with cutting-edge research. Recently, OpenAI announced that its team of five neural networks, OpenAI Five, has learned to defeat amateur Dota 2 players. OpenAI Five has been trained using a scaled-up version of PPO. Coordination between the agents is controlled using a hyperparameter called team spirit. It …

## 8 – M4 L2 C08 Cooperation Competition Mixed Environments A V1

For this video, let’s pretend that you and your sister are playing a game of ball. You are given a bank of 100 coins, from which you plan on buying a video game console. Each time either of you misses the ball, you lose one coin from the bank to your parents. Hence, you …

## 7 – M4 L2 C07 Approaches To MARL V1

So, can we think about adapting the single-agent RL techniques we’ve learned about so far to the multi-agent case? Two extreme approaches come to mind. The simplest approach would be to train all the agents independently, without considering the existence of other agents. In this approach, any agent considers all the others to be a …

## 6 – M4 L2 C06 Markov Games 2 V1

Consider an example of single-agent reinforcement learning. We have a drone with the task of grabbing a package. The possible actions are going right, left, up, down, and grasping. The reward is plus 50 for grasping the package, and minus one otherwise. Now, the difference in multi-agent RL is that we have more than …
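
The drone’s reward structure can be written down directly. The `package_in_reach` flag below is a hypothetical stand-in for the real state test, which the excerpt doesn’t spell out:

```python
# The five actions from the lecture's single-agent drone example.
ACTIONS = ["right", "left", "up", "down", "grasp"]

def reward(action, package_in_reach):
    """+50 for successfully grasping the package, -1 for any other step.
    'package_in_reach' is a hypothetical stand-in for the real state check."""
    if action == "grasp" and package_in_reach:
        return 50
    return -1

print(reward("grasp", True))   # successful grasp
print(reward("up", True))      # any movement step costs one
```

The -1 per step encourages the drone to reach the package quickly rather than wander.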

## 5 – M4 L2 C05 Benefits Of Multi Agent Systems V2

Hi, all. Having multiple agents in a system brings a few benefits. The agents can share their experiences with one another, making each other smarter, just as we learn from our teachers and friends. However, when agents want to share, they have to communicate, which leads to a cost of communication, like extra hardware …

## 4 – M4 L2 C04 Applications Of Multi Agent Systems V2

In this video, we will discuss some potential real-life applications of multi-agent systems. A group of drones or robots whose aim is to pick up a package and drop it at the destination is a multi-agent system. In the stock market, each person who is trading can be considered an agent, and the profit …

## 3 – M4 L2 C03 Motivation For Multi Agent Systems V1

In this video, we will seek some motivation for why we should consider multiple agents in the context of Artificial Intelligence. Keep in mind that the ultimate goal of AI is to solve intelligence. We live in a multi agent world, we do not become intelligent in isolation. As a baby, the closest interactions that …

## 2 – M4 L2 C02 Introduction To Multi Agent Systems V1

In this video, we are going to get an understanding of multi-agent systems. Multi-agent systems are present everywhere around us, be it early in the morning when you’re making your way through traffic to get to work or when your favorite soccer players are competing in a game or when a swarm of bees is …

## 12 – M4 L2 C11 Summary HS V1

Hey, everyone. With this, we’ve reached the end of the exciting module on multi-agent RL. We began by introducing ourselves to the multi-agent systems present in our surroundings. We reasoned why multi-agent systems are an important piece of the puzzle of solving AI, and decided to pursue this complex topic. We also studied the Markov games framework, which …

## 11 – M4 L2 C10b Paper Description Part II V2

The normal agents are rewarded based on the least distance of any of the agents to the landmark, and penalized based on the distance between the adversary and the target landmark. Under this reward structure, the agents cooperate to spread out across all the landmarks, so as to deceive the adversary. The framework of centralized …
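
Under one natural reading of that reward (an assumption on my part; the paper’s exact weighting may differ), the cooperative agents’ reward could be sketched as:

```python
import math

def cooperative_reward(agent_positions, adversary_pos, target_landmark):
    """Hypothetical sketch of the reward described above. Positions are
    (x, y) tuples. The agents are rewarded when any one of them is close
    to the target (negative min distance), and penalized as the adversary
    gets close to it (the adversary's distance is added back as reward)."""
    closest = min(math.dist(a, target_landmark) for a in agent_positions)
    adversary_gap = math.dist(adversary_pos, target_landmark)
    return -closest + adversary_gap

# One agent sits on the target while the adversary is 5 units away:
print(cooperative_reward([(0.0, 0.0)], (3.0, 4.0), (0.0, 0.0)))
```

Because only the *minimum* agent distance matters, covering every landmark costs the team nothing while leaving the adversary unable to tell which landmark is the target.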

## 10 – M4 L2 C10a Paper Description Part II V1

Hi everyone. The paper that you’ve chosen implements a multi-agent version of DDPG. DDPG, as you might remember, is an off-policy actor-critic algorithm that uses the concept of target networks. The input of the actor network is the current state, while the output is a real value or a vector representing an action chosen …
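
One concrete piece of the target-network idea is the soft update that makes the target network slowly track the online network. This sketch uses flat lists of floats in place of real network tensors, purely for illustration:

```python
def soft_update(target_params, online_params, tau=0.01):
    """DDPG-style soft update: target <- tau * online + (1 - tau) * target.
    Parameters are shown as flat lists of floats; a real implementation
    applies the same rule to every tensor in the network."""
    return [tau * o + (1 - tau) * t
            for t, o in zip(target_params, online_params)]

target = [0.0, 0.0]
online = [1.0, -1.0]
target = soft_update(target, online, tau=0.1)
print(target)  # [0.1, -0.1] — the target inches toward the online weights
```

Keeping the target network a slow-moving copy stabilizes the bootstrapped critic targets, which is why both DDPG and its multi-agent extension rely on it.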