Non-linear function approximation is what we’ve been building up to in this lesson. Recall from our previous discussion how we can capture non-linear relationships between the input state and the output value by using arbitrary kernels, such as radial basis functions, as our feature transformation. In that model, the output value is still linear with respect to the features. But what if the underlying value function were truly non-linear with respect to a combination of these feature values? To capture such complex relationships, let’s pass the linear response obtained from the dot product through some non-linear function f. Does this look familiar? Yes, it is the basis of artificial neural networks. Such a non-linear function is generally called an activation function, and it greatly increases the representational capacity of our approximator. We can iteratively update the parameters of any such function using gradient descent: the update is the learning rate alpha times the value difference (the target minus our current estimate) times the derivative of the function with respect to the weights.
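
To make that update concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not the lesson’s actual code: the radial-basis feature transformation, the choice of tanh as the activation f, and all the names below are hypothetical.

```python
import numpy as np

def features(state):
    # Hypothetical feature transformation: radial basis functions
    # centered at a few fixed points (an assumption for illustration).
    centers = np.linspace(0.0, 1.0, 5)
    return np.exp(-((state - centers) ** 2) / 0.1)

def v_hat(state, w):
    # Non-linear approximator: f(w . x(s)), with f = tanh here.
    return np.tanh(w @ features(state))

def grad_v_hat(state, w):
    # Derivative of the approximator with respect to the weights:
    # f'(w . x(s)) * x(s); for tanh, f'(z) = 1 - tanh(z)^2.
    x = features(state)
    z = w @ x
    return (1.0 - np.tanh(z) ** 2) * x

def update(w, state, target, alpha=0.1):
    # Gradient-descent step: w <- w + alpha * (target - v_hat) * d v_hat / d w
    error = target - v_hat(state, w)
    return w + alpha * error * grad_v_hat(state, w)

# Example usage with made-up numbers: one update toward a target value.
w = np.zeros(5)
w = update(w, state=0.3, target=0.8)
```

The key point the sketch illustrates is that only the gradient term changes when we move from a linear to a non-linear approximator; the overall shape of the update rule stays the same.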