9 – Alphazero advanced tictactoe walkthrough

Hi. Today, I’m going to show you how to use AlphaZero to train an agent to play a more advanced version of tic-tac-toe. Hopefully, by now you’ve had the chance to play with the basic version and have successfully trained an AlphaZero tic-tac-toe agent. This time, we’re going to initialize a slightly more complicated …

8 – Alphazero python classes walkthrough

Hello, welcome. In this screencast, I want to walk you through how I implemented some of the gaming environment and the tree search environment, so that if you want to edit the files, you understand how they are implemented. So let’s go to ConnectN.py to look at how the …

7 – TicTacToe using AlphaZero – notebook walkthrough

Hi, welcome to the screencast. Today, I’ll share with you how to train an AlphaZero agent to play a game of TicTacToe. Before I go into the Jupyter notebook, let’s go back up a directory to check out all the files available to you in the workspace. So, you should see something like this, it …

6 – Alpha Zero 2_ Self-Play Training

Now that we have an improved Monte Carlo Tree Search guided by an expert policy and critic, how do we update them? Well, starting with an empty board of Tic-Tac-Toe, we perform Monte Carlo Tree Search using the current policy and critic. The end result is a list of visit counts for each action, N sub a, …

5 – AlphaZero 1_ Guided Tree Search

We know that tree searches can become intractable very quickly, even when we utilize Monte Carlo methods. Take the game of Go, for example. The game board is a 19 by 19 grid, and that means 361 possible first moves. The number of possible second moves is a tiny bit smaller, 360. For the third move, …
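To get a feel for how quickly those numbers multiply, here is a minimal Python sketch. It is only an illustration, not code from the course workspace: the name `opening_sequences` is hypothetical, and it assumes every move simply fills one empty point, ignoring captures, passes, and illegal moves.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a 19 by 19 Go board


def opening_sequences(num_moves: int, points: int = BOARD_POINTS) -> int:
    """Count ordered sequences of `num_moves` stone placements.

    Simplifying assumption (not from the original text): each move
    occupies one previously empty point, and captures, passes, and
    illegal moves are ignored.
    """
    # math.perm(n, k) = n * (n - 1) * ... * (n - k + 1), Python 3.8+
    return math.perm(points, num_moves)


for k in (1, 2):
    print(f"{k} move(s): {opening_sequences(k):,} possible sequences")
```

With one move the count is 361, and with two moves it is 361 times 360. Each extra move multiplies the total by nearly the same factor again, which is why exhaustive search stops being practical after only a few moves.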

4 – Monte Carlo Tree Search 2 – Expansion and Back-propagation

Starting with a state, we learned previously how to search systematically through one layer of a game tree using the variables U, N, and V. Can we generalize this to go deeper into the tree so that we can better anticipate a long sequence of moves? This is possible through what’s called expansion and back-propagation. …

3 – Monte Carlo Tree Search 1 – Random Sampling

Given a state in a zero-sum game, how do we find an optimal policy? In theory, this is simple, because we could just perform a brute-force search, going through all the possible moves and all the possible games that can be played, and then we can choose the ones with the best …

2 – Zero-Sum Game

In order to talk about AlphaZero, we first need to formalize the kind of games that AlphaZero specializes in: zero-sum games. We start with a board game environment, a grid for example, then two competing agents take turns performing actions to try to win the game. In the end, one agent’s win is another …

1 – Alpha Zero Preview

In 2016, researchers at DeepMind announced a new breakthrough: the development of a new engine, AlphaGo, for the game of Go. The AI was able to defeat a professional player, Lee Sedol. The breakthrough was significant because Go was far more complex than chess. The number of possible games is so high that a professional …