6 – Function Approximation

So far, we’ve looked at ways to discretize continuous state spaces. This enables us to use existing reinforcement learning algorithms with little or no modification. But there are some limitations. When the underlying space is complicated, the number of discrete states needed can become very large. Thus, we lose the advantage of discretization. Moreover, if you think about positions in the state space that are nearby, you would expect their values to be similar, or smoothly changing. Discretization doesn’t always exploit this characteristic, so it fails to generalize well across the space.
What we’re after is the true state-value function v_π, or action-value function q_π, which is typically smooth and continuous over the entire space. As you can imagine, capturing this completely is practically infeasible except for some very simple problems. Our best hope is function approximation. It is still an approximation because we don’t know what the true underlying function is. A general way to define such an approximation is to introduce a parameter vector W that shapes the function. Our task then reduces to tweaking this parameter vector till we find the desired approximation.
Note that the approximating function can either map a state to its value, or a state-action pair to the corresponding q value. Another form is where we map from one state to a number of different q values, one for each action, all at once. This is especially useful for Q-learning, as we’ll see later.
Let’s focus on the first case: approximating a state-value function. Now, we have this box here in the middle that’s supposed to do some magic and convert the state s and parameter vector W into a scalar value. But how? The first thing we need to do is to ensure we have a vector representing the state. Your state might already be a vector, in which case you don’t need to do anything. In general, we’ll define a transformation that converts any given state s into a feature vector x(s).
This also gives us more flexibility, since we don’t have to operate on the raw state values. We can use any computed or derived features instead. Okay, we now have a feature vector x(s) and a parameter vector W, and we want a scalar value. What do we do when we have two vectors and want to produce a scalar? Dot product. Yes. It’s the simplest thing we could do. In fact, this is the same as computing a linear combination of features: multiply each feature by the corresponding weight, and sum it all up. This is known as linear function approximation. That is, we are trying to approximate the underlying value function with a linear function.
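To make the dot product concrete, here is a minimal sketch in Python. The function names `features` and `v_hat`, the particular features chosen, and the weight and state values are all made up for illustration; the lecture does not specify any of them.

```python
# Hypothetical feature transformation x(s). The choice of features is an
# illustrative assumption, not something given in the lecture.
def features(state):
    """Map a raw 2-D state (position, velocity) to a feature vector x(s)."""
    position, velocity = state
    # 1.0 acts as a bias feature; the last entry is a derived feature.
    return [1.0, position, velocity, position * velocity]


def v_hat(state, w):
    """Approximate the state value as the dot product of x(s) and w."""
    x = features(state)
    # Multiply each feature by its corresponding weight, then sum.
    return sum(x_i * w_i for x_i, w_i in zip(x, w))


w = [0.5, 2.0, -1.0, 4.0]   # made-up parameter vector, one weight per feature
state = (0.25, 1.0)         # made-up state: position 0.25, velocity 1.0
value = v_hat(state, w)
print(value)                # prints 1.0
```

Changing the entries of `w` changes the shape of the approximated function without touching `features`, which is the sense in which the task reduces to tweaking the parameter vector.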
