4 – Tile Coding

If you have prior knowledge about the state space, you can manually design an appropriate discretisation scheme. Like in our gears switching example, we knew the relationship between fuel consumption and speed. But in order to function in arbitrary environments, we need a more generic method. One elegant approach for this is tile coding. Here, the underlying state space is continuous and two dimensional. We overlay multiple grids or tilings on top of the space, each slightly offset from each other. Now, any position S in the state space can be coarsely identified by the tiles that it activates. If we assign a bit to each tile, then we can represent our new discretised state as a bit vector, with ones for the tiles that get activated, and zeros elsewhere. This, by itself, is a very efficient representation. But the genius lies in how the state value function is computed using the scheme. Instead of storing a separate value for each state V of S, it is defined in terms of the bit vector for that state and a weight for each tile. The tile coding algorithm in turn updates these weights iteratively. This ensures nearby locations that share tiles also share some component of state value, effectively smoothing the learned value function. Tile coding does have some drawbacks. Just like a simple grid based approach we have to manually select the tile sizes, there offsets, number of tilings, etc, ahead of time. A more flexible approach is adaptive tile coding, which starts with fairly large tiles, and divides each tile into two whenever appropriate. How do we know when to split? Well, we can use a hero stick for that. Basically, we want to split the state space when we realize that we are no longer learning much with the current representation. That is, when our value function isn’t changing. We can stop when we have reached some upper limit on the number of splits, or some max iterations. In order to figure out which tile to split, we have to look at which one is likely to have the greatest effect on the value function. For this, we need to keep track of subtiles and their projected weights. Then, we can pick the tile with the greatest difference between subtile weights. There are many other heuristics you can use, but the main advantage of adaptive tile coding, is that it does not rely on a human to specify a discretisation ahead of time. The resulting state space is appropriately partitioned based on its complexity.

%d 블로거가 이것을 좋아합니다: