3 – Discretization

As the name suggests, discretization is basically converting a continuous space into a discrete one. Remember our continuous vacuum cleaner world? All we're saying is let's bring back a grid structure with discrete positions identified. Note that we're not really forcing our agent to be in exactly the center of these positions. Since the underlying world is continuous, we don't have control over that. But in our representation of the state space, we only identify certain positions as relevant. For instance, whether the robot is at (3.1, 2.4) or (2.9, 1.8), we can round that off to (3, 2). Yes, this will almost always be a little incorrect, but for some environments, discretizing the state space can work out very well. It enables us to use existing algorithms with little or no modification. Actions can be discretized as well. For example, angles can be divided into whole degrees, or even 90-degree increments, if appropriate.

Now, let's imagine there are objects in this discretized world, obstacles that the robot may need to avoid. With our grid representation, all we can do is mark off the cells where an object is present, even by a little. This is known as an occupancy grid. But our choice of discretization may lead the agent into thinking there is no path across these obstacles to reach some desired locations. If we could instead vary the grid according to these obstacles, we could open up a feasible path for the agent.

An alternate approach would be to divide up the grid into smaller cells where required. It would still be an approximation, but it would allow us to allocate more of our state representation to where it matters. That is better than dividing the entire state space into finer cells, which may increase the total number of states and, in turn, the time needed to compute value functions. If you're familiar with binary space partitioning or quadtrees, this is the same idea.
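To make the rounding and the occupancy grid concrete, here is a minimal sketch in Python. The function names, the unit cell size, and the rectangle format for obstacles are assumptions made for illustration; they are not from the lecture. Cells are taken to be centered on the grid points, so a position maps to its nearest grid point.

```python
import math

def discretize(position, cell_size=1.0):
    """Map a continuous (x, y) position to its nearest grid point.

    floor(v + 0.5) is used instead of round() because Python's round()
    rounds exact halves to the nearest even integer.
    """
    return tuple(int(math.floor(c / cell_size + 0.5)) for c in position)

def occupied_cells(obstacles, cell_size=1.0):
    """Return the set of grid cells touched by any obstacle, even slightly.

    Each obstacle is an axis-aligned rectangle (x_min, y_min, x_max, y_max)
    in continuous coordinates (a hypothetical format chosen for this sketch).
    """
    cells = set()
    for x_min, y_min, x_max, y_max in obstacles:
        i_lo, j_lo = discretize((x_min, y_min), cell_size)
        i_hi, j_hi = discretize((x_max, y_max), cell_size)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                cells.add((i, j))
    return cells

# Both example positions from the text land in the same cell.
print(discretize((3.1, 2.4)), discretize((2.9, 1.8)))

# A small obstacle straddling a cell boundary blocks two whole cells.
print(occupied_cells([(1.4, 0.0, 1.6, 0.2)]))
```

The second call shows the drawback described above: an obstacle only 0.2 units wide marks two full cells as blocked, which is how a coarse grid can hide a path that exists in the continuous world.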
This sort of discretization makes sense in spatial domains like gridworlds. But what about other state spaces? Let's change gears and look at a different domain. Most cars these days have automatic transmission. Have you ever wondered how the car decides which gear to switch to, and when? Here's a simplified plot of how fuel consumption varies with speed for different gears in a typical car. Let's assume that our state consists only of the vehicle speed and which gear we are in, and that our reward is inversely proportional to fuel consumption. The actions available to our agent are essentially shifting up or down.

Now, although speed is a continuous value, it can be discretized into ranges such that a single gear is optimal in each range. Note that these ranges can be of different lengths; that is, the discretization is non-uniform. If there were other dimensions to the state space, such as throttle position, they could be subdivided non-uniformly as well.
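A non-uniform discretization of speed can be sketched as a lookup against a sorted list of range boundaries. The boundary values below are made up for illustration and are not read off any real fuel-consumption plot; the function names are likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical speed boundaries in km/h (assumed values, unequal range widths).
SPEED_BOUNDARIES = [15.0, 30.0, 50.0, 80.0]

def speed_bin(speed):
    """Return the index of the speed range that contains `speed`.

    Range 0 is below 15, range 1 is [15, 30), range 2 is [30, 50),
    range 3 is [50, 80), and range 4 is 80 and above.
    """
    return bisect_right(SPEED_BOUNDARIES, speed)

def discretize_state(speed, gear):
    """Build the discrete state from continuous speed and the current gear."""
    return (speed_bin(speed), gear)

print(discretize_state(42.5, 3))
```

With four boundaries there are five speed ranges, so a five-gear car would have 5 x 5 = 25 discrete states. Adding another dimension such as throttle position would mean one more boundary list and one more `bisect_right` lookup.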
