Now I’m going to show you how to create these nonlinear models. What we’re going to do is a very simple trick. We’re going to combine two linear models into a nonlinear model as follows. Visually it looks like this: the two models superimposed, creating the model on the right. It’s almost like we’re doing arithmetic on models. It’s like saying “This line plus this line equals that curve.” Let me show you how to do this mathematically. So a linear model, as we know, is a whole probability space. This means that for every point it gives us the probability of the point being blue. So, for example, this point over here is in the blue region, so its probability of being blue is 0.7. The same point given by the second probability space is also in the blue region, so its probability of being blue is 0.8. Now the question is, how do we combine these two? Well, the simplest way to combine two numbers is to add them, right? So 0.8 plus 0.7 is 1.5. But now, this doesn’t look like a probability anymore since it’s bigger than 1, and probabilities need to be between 0 and 1. So what can we do? How do we turn this number that is larger than 1 into something between 0 and 1? Well, we’ve been in this situation before, and we have a pretty good tool that turns every number into something between 0 and 1. That’s just the sigmoid function. So that’s what we’re going to do. We apply the sigmoid function to 1.5 to get the value 0.82, and that’s the probability of this point being blue in the resulting probability space. So now we’ve managed to create a probability function for every single point in the plane, and that’s how we combine two models. We calculate the probability for one of them, the probability for the other, then add them, and then apply the sigmoid function. Now, what if we wanted to weight this sum? What if, say, we wanted the model on the top to have more of a say in the resulting probability than the second?
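Before we get to weights, here’s a minimal sketch of the plain, unweighted combination we just walked through, in case you want to check the arithmetic yourself. The probabilities 0.7 and 0.8 are the ones from the example; the names `sigmoid`, `p_first`, and `p_second` are just illustrative choices, not part of any library.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Probabilities of the same point being blue under each linear model
# (values taken from the example; variable names are illustrative).
p_first = 0.7
p_second = 0.8

# Step 1: add the two probabilities. The sum can exceed 1.
total = p_first + p_second

# Step 2: apply the sigmoid so the result is a valid probability again.
p_combined = sigmoid(total)

print(round(total, 2))       # 1.5
print(round(p_combined, 2))  # 0.82
```

So the sum 1.5 goes in, and a value of about 0.82 comes out, which is back between 0 and 1.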
So something like this, where the resulting model looks a lot more like the one on the top than like the one on the bottom. Well, we can add weights. For example, we can say “I want seven times the first model plus the second one.” Actually, I can add whatever weights I want. For example, I can say “Seven times the first one plus five times the second one.” And what I do to get the combined model is I take the first probability, multiply it by seven, then take the second one and multiply it by five, and I can even add a bias if I want. Say the bias is minus 6; then we add it to the whole equation. So we’ll have seven times 0.7 plus five times 0.8 minus six, which gives us 2.9. We then apply the sigmoid function and that gives us 0.95. So it’s almost like what we had before, isn’t it? Before, we had a line that is a linear combination of the input values times the weights plus a bias. Now we have that this model is a linear combination of the two previous models times the weights plus some bias. So it’s almost the same thing. It’s almost like this curved model on the right is a linear combination of the two linear models before, or we can even think of it as the line between the two models. This is no coincidence. This is at the heart of how neural networks get built. Of course, we can imagine that we can keep doing this, always obtaining new, more complex models out of linear combinations of the existing ones. And this is what we’re going to do to build our neural networks.
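Here’s a small sketch of that weighted combination, so you can see it really is the same recipe as a linear model: weights times inputs, plus a bias, then a sigmoid. The weights 7 and 5, the bias of minus 6, and the probabilities 0.7 and 0.8 come from the example above; the helper name `combine_models` and its signature are assumptions made up for this illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_models(probabilities, weights, bias=0.0):
    """Hypothetical helper: weighted sum of model outputs plus a bias,
    followed by a sigmoid. Returns (score, combined_probability)."""
    score = sum(w * p for w, p in zip(weights, probabilities)) + bias
    return score, sigmoid(score)


# Outputs of the two linear models for the same point (from the example).
probabilities = [0.7, 0.8]

# Seven times the first model, five times the second, bias of minus six.
score, p_combined = combine_models(probabilities, weights=[7.0, 5.0], bias=-6.0)

print(round(score, 2))       # 2.9
print(round(p_combined, 2))  # 0.95
```

Notice that nothing in `combine_models` cares where the inputs came from. Feed it raw input values and it’s a single linear model; feed it the outputs of other models and it’s a combination of them, which is the idea we’ll keep reusing.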