## 9 – Calculating The Gradient 1

Okay. So, now we’ll do the same thing as we did before, updating our weights in the neural network to better classify our points. But we’re going to do it formally, so fasten your seat belts because math is coming. On your left, you have a single perceptron with the input vector, the weights and … Read more

## 8 – Backpropagation V2

So now we’re finally ready to get our hands into training a neural network. So let’s quickly recall feedforward. We have our perceptron with a point coming in labeled positive. And our equation w1x1 + w2x2 + b, where w1 and w2 are the weights and b is the bias. Now, what the perceptron does … Read more
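As a rough illustration of the feedforward step recalled above, here is a minimal sketch of the score w1x1 + w2x2 + b. The function names and the rule "positive when the score is at least zero" are assumptions made for this example, since the excerpt is cut off before it says what the perceptron does with the score.

```python
def perceptron_score(x1, x2, w1, w2, b):
    """Linear score w1*x1 + w2*x2 + b for a point (x1, x2)."""
    return w1 * x1 + w2 * x2 + b


def perceptron_predict(x1, x2, w1, w2, b):
    """Assumed decision rule: label 1 (positive) if the score is >= 0, else 0."""
    return 1 if perceptron_score(x1, x2, w1, w2, b) >= 0 else 0


# Hypothetical weights and point, chosen only for illustration.
print(perceptron_score(1, 2, 3, 4, -5))
print(perceptron_predict(1, 2, 3, 4, -5))
```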

## 7 – DL 42 Neural Network Error Function (1)

So, our goal is to train our neural network. In order to do this, we have to define the error function. So, let’s look again at what the error function was for perceptrons. So, here’s our perceptron. On the left, we have our input vector with entries x_1 up to x_n, and one for the … Read more

## 6 – DL 41 Feedforward FIX V2

So now that we have defined what neural networks are, we need to learn how to train them. Training them really means what parameters should they have on the edges in order to model our data well. So in order to learn how to train them, we need to look carefully at how they process … Read more

## 5 – Multiclass Classification

We briefly mentioned multi-class classification in the last video but let me be more specific. It seems that neural networks work really well when the problem consists of classifying two classes. For example, if the model predicts a probability of receiving a gift or not, then the answer just comes as the output of the … Read more

## 4 – Layers

Neural networks have a certain special architecture with layers. The first layer is called the input layer, which contains the inputs, in this case, x1 and x2. The next layer is called the hidden layer, which is a set of linear models created with this first input layer. And then the final layer is called … Read more

## 3 – 29 Neural Network Architecture 2

So in the previous session we learned that we can add two linear models to obtain a third model. As a matter of fact, we did even more. We can take a linear combination of two models. So, the first model times a constant plus the second model times a constant plus a bias and … Read more
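The linear combination described above can be sketched in a couple of lines. The function name `combine` and the numbers used are hypothetical, picked only to show the arithmetic; whatever the excerpt goes on to do with the result is not reproduced here.

```python
def combine(model1_out, model2_out, c1, c2, bias):
    """First model times a constant, plus second model times a constant, plus a bias."""
    return c1 * model1_out + c2 * model2_out + bias


# Hypothetical outputs of two linear models at the same point,
# with hypothetical constants and bias.
print(combine(0.7, 0.8, c1=7, c2=5, bias=-6))
```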

## 25 – Momentum

So, here’s another way to solve a local minimum problem. The idea is to walk a bit fast with momentum and determination in a way that if you get stuck in a local minimum, you can, sort of, power through and get over the hump to look for a lower minimum. So let’s look at … Read more
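One common way to turn this idea into code is to keep a running "velocity" that blends the previous steps with the current gradient. The sketch below is an assumption about the form of the update (the excerpt is cut off before giving a formula), and the names `beta`, `learning_rate`, and the test function f(x) = x² are illustrative only.

```python
def grad(x):
    """Gradient of the illustrative error function f(x) = x**2."""
    return 2 * x


def momentum_descent(x, learning_rate=0.1, beta=0.5, steps=200):
    """Gradient descent where each step also carries a fraction beta of the previous step."""
    velocity = 0.0
    for _ in range(steps):
        # Previous steps still count, but weighted down by beta each time.
        velocity = beta * velocity + grad(x)
        x -= learning_rate * velocity
    return x


print(momentum_descent(5.0))
```

With `beta` set to 0 this reduces to plain gradient descent, which makes the role of the momentum term easy to isolate.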

## 24 – Random Restart

One way to solve this is to use random restarts, and this is just very simple. We start from a few different random places and do gradient descent from all of them. This increases the probability that we’ll get to the global minimum, or at least a pretty good local minimum.
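Here is a minimal sketch of random restarts on a one-dimensional function with two minima. The function f(x) = x⁴ − 3x² + x, the learning rate, the step count, and the range of starting points are all assumptions chosen for illustration.

```python
import random


def f(x):
    """Illustrative error function: global minimum near x = -1.30, local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent from a single starting point."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


def random_restart_descent(starts):
    """Run gradient descent from every start and keep the endpoint with the lowest error."""
    candidates = [descend(x0) for x0 in starts]
    return min(candidates, key=f)


rng = random.Random(0)  # fixed seed so the run is repeatable
starts = [rng.uniform(-2.0, 2.0) for _ in range(5)]
best = random_restart_descent(starts)
print(best, f(best))
```

A single run started at x = 1.5 settles in the shallower minimum on the right; adding a start on the left lets the search find the deeper one.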

## 23 – Learning Rate

The question of what learning rate to use is pretty much a research question itself but here’s a general rule. If your learning rate is too big then you’re taking huge steps which could be fast at the beginning but you may miss the minimum and keep going which will make your model pretty chaotic. … Read more
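To make the "too big" case concrete, here is a small sketch on the error function f(x) = x², an assumption chosen because its minimum at 0 is obvious. The two learning rates are illustrative values, not recommendations.

```python
def descend(x, learning_rate, steps):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x


# Small steps: each update multiplies x by 0.8, so x shrinks toward the minimum at 0.
print(descend(1.0, learning_rate=0.1, steps=50))

# Huge steps: each update multiplies x by -1.2, so x overshoots the minimum
# and lands farther away on the other side every time.
print(descend(1.0, learning_rate=1.1, steps=20))
```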

## 22 – Batch vs Stochastic Gradient Descent

First, let’s look at what the gradient descent algorithm is doing. So, recall that we’re up here in the top of Mount Everest and we need to go down. In order to go down, we take a bunch of steps following the negative of the gradient of the height, which is the error function. Each … Read more

## 21 – Other Activation Functions

The best way to fix this is to change the activation function. Here’s another one, the hyperbolic tangent, given by the formula underneath: e to the x minus e to the minus x, divided by e to the x plus e to the minus x. This one is similar to sigmoid, but since our … Read more
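The formula read out above, tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ), can be written directly in code. This is a sketch for moderate inputs only (very large |x| would overflow `math.exp`); in practice the standard library's `math.tanh` does the same job.

```python
import math


def tanh(x):
    """Hyperbolic tangent computed from its definition."""
    e_pos = math.exp(x)
    e_neg = math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)


# Outputs stay strictly between -1 and 1.
for x in (-3.0, 0.0, 0.5, 3.0):
    print(x, tanh(x), math.tanh(x))
```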

Here’s another problem that can occur. Let’s take a look at the sigmoid function. The curve gets pretty flat on the sides. So, if we calculate the derivative at a point way at the right or way at the left, this derivative is almost zero. This is not good because the derivative is what tells … Read more
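A quick numerical check shows how flat the sigmoid gets. The sketch below uses the standard identity σ′(x) = σ(x)(1 − σ(x)); the sample points are arbitrary choices for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at 0.25 in the middle and is almost zero far out on either side.
for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid_prime(x))
```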

## 2 – Combining Models

Now I’m going to show you how to create these nonlinear models. What we’re going to do is a very simple trick. We’re going to combine two linear models into a nonlinear model as follows. Visually it looks like this. The two models superimposed, creating the model on the right. It’s almost like we’re … Read more

## 19 – Local Minima

So let’s recall what gradient descent does. What it does is it looks at the direction where you descend the most and then it takes a step in that direction. But in Mt. Everest, everything was nice and pretty since that was going to help us go down the mountain. But now, what if we … Read more

## 18 – Dropout

Here’s another way to prevent overfitting. So, let’s say this is you, and one day you decide to practice sports. So, on Monday you play tennis, on Tuesday you lift weights, on Wednesday you play American football, on Thursday you play baseball, on Friday you play basketball, and on Saturday you play ping pong. Now, … Read more

## 17 – Regularization

Well the first observation is that both equations give us the same line, the line with equation X1+X2=0. And the reason for this is that solution two is really just a scalar multiple of solution one. So let’s see. Recall that the prediction is a sigmoid of the linear function. So in the first case, … Read more
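The point about scalar multiples can be checked numerically. In this sketch the prediction is a sigmoid of scale · (x1 + x2); the scale of 10 for the second solution and the sample points are assumptions made for illustration, since the excerpt does not show the actual coefficients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x1, x2, scale):
    """Prediction for the linear function scale * (x1 + x2)."""
    return sigmoid(scale * (x1 + x2))


# Points on the line x1 + x2 = 0 get 0.5 from both solutions: the boundary is the same.
print(predict(1, -1, scale=1), predict(1, -1, scale=10))

# Off the line, both solutions agree on the class, but the scaled-up one
# pushes the probability much closer to 0 or 1.
print(predict(1, 1, scale=1), predict(1, 1, scale=10))
print(predict(-1, -1, scale=1), predict(-1, -1, scale=10))
```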

## 16 – DL 53 Q Regularization

Now let me show you a subtle way of overfitting a model. Let’s look at the simplest data set in the world, two points, the point one one which is blue and the point minus one minus one which is red. Now we want to separate them with a line. I’ll give you two equations … Read more

## 15 – Model Complexity Graph

So, let’s start from where we left off, which is, we have a complicated network architecture which would be more complicated than we need but we need to live with it. So, let’s look at the process of training. We start with random weights in our first epoch and we get a model like this … Read more

## 14 – Underfitting And Overfitting

So, let’s talk about life. In life, there are two mistakes one can make. One is to try to kill Godzilla using a flyswatter. The other one is to try to kill a fly using a bazooka. What’s the problem with trying to kill Godzilla with a flyswatter? That we’re oversimplifying the problem. We’re trying … Read more