9 – 08 Backpropagation Theory V6 Final

Now that we have completed a feedforward pass, received an output, and calculated the error, we are ready to go backwards in order to change our weights with the goal of decreasing the network error. Going backwards from the output to the input while changing the weights is a process we call backpropagation, which is …
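
In formulas, "changing the weights to decrease the error" is the usual gradient descent update (the learning rate $\alpha$ is an assumed symbol, not named in the excerpt):

$$W_{ij} \leftarrow W_{ij} - \alpha \, \frac{\partial E}{\partial W_{ij}}$$

Backpropagation is the chain-rule bookkeeping that produces the partial derivative $\partial E / \partial W_{ij}$ for every weight, layer by layer, from the output back to the input.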

7 – 06 FeedForward A V7 Final

Let’s look at the feedforward part first. To make our computations easier, let’s decide to have n inputs, three neurons in a single hidden layer, and two outputs. By the way, in practice, we can have thousands of neurons in a single hidden layer. We will use W_1 as the set of weights from x …
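
A minimal NumPy sketch of this architecture, assuming a sigmoid activation, a second weight matrix W_2 from the hidden layer to the outputs, and row-vector inputs (none of which is stated in the excerpt):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 4                          # n inputs (n is arbitrary in this sketch)
x = np.random.rand(1, n)       # input row vector
W1 = np.random.rand(n, 3)      # W_1: inputs -> 3 hidden neurons
W2 = np.random.rand(3, 2)      # assumed W_2: hidden -> 2 outputs

h = sigmoid(x @ W1)            # hidden layer activations, shape (1, 3)
y = sigmoid(h @ W2)            # the two network outputs, shape (1, 2)
print(y)
```

The shapes are the main point: W_1 is n-by-3 because each of the n inputs connects to each of the three hidden neurons.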

6 – 05 RNN FFNN Reminder B V6 Final

Let’s look at a basic model of an artificial neural network, where we have only a single hidden layer. The inputs are each connected to the neurons in the hidden layer, and the neurons in the hidden layer are each connected to the neurons in the output layer, where each neuron there represents a single …

5 – 04 RNN FFNN Reminder A V7 Final

Before we dive into RNNs, let’s remember the process we use in feedforward neural networks. We can have many hidden layers between the inputs and the outputs, but for simplicity, we will start with a single hidden layer. We will remind ourselves why, when, and how it is used. After we have a clear understanding …

4 – 03 RNN Applications V3 Final

To give you an idea of how useful RNNs and LSTMs are, let’s take a sneak peek. The world’s leading tech companies are all using RNNs and LSTMs in their applications. Let’s take a look at some of those. Speech recognition, where a sequence of data samples extracted from an audio signal is continuously mapped …

3 – 02 RNN History V4 Final

After the first wave of artificial neural networks in the mid-80s, it became clear that feedforward networks are limited, since they are unable to capture temporal dependencies, which, as we said before, are dependencies that change over time. Modeling temporal data is critical in most real-world applications, since natural signals like speech and video …

22 – 23 From RNNs To LSTMs V4 Final

The Long Short-Term Memory cells, or LSTM cells, were proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. The goal of the cell is to overcome the vanishing gradient problem. You will see that it allows certain inputs to be latched, or stored, for long periods of time without forgetting them as would be the …
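
The excerpt stops at the motivation, but the latching behaviour is easiest to see in the standard LSTM cell equations (this is the commonly used formulation, not quoted from the clip; $\sigma$ is the sigmoid and $\odot$ is element-wise multiplication):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

When the forget gate $f_t$ stays close to one and the input gate $i_t$ close to zero, the cell state $c_t$ is carried forward almost unchanged, which is how an input can be latched for many time steps.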

21 – RNN Summary

To summarize what we’ve discussed, we now understand that in RNNs, the current state depends on the inputs as well as on the previous states, with the use of an activation function, like the hyperbolic tangent, the sigmoid, or the ReLU function, for example. The current output is a simple linear combination of the current …
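
Written out, that summary is the following pair of equations, with $\Phi$ as the activation function, Wx and Ws as in the BPTT clips, and Wy as the assumed name for the state-to-output matrix:

$$\bar{s}_t = \Phi(\bar{x}_t W_x + \bar{s}_{t-1} W_s), \qquad \bar{y}_t = \bar{s}_t W_y$$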

20 – 21 RNN BPTT C V7 Final

We still have to adjust Wx, the weight matrix connecting the input layer to the hidden, or state, layer. Let’s simplify the sketch and leave only what we need. You will see that the process we follow to adjust Wx will be very similar to the one we used when updating Ws. Having said that, …
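
Assuming, as in the neighbouring clips, that we are minimizing the error E_3 at the third time step, the accumulated gradient for Wx is a sum of one chain-rule path per state that Wx feeds into:

$$
\frac{\partial E_3}{\partial W_x} =
\frac{\partial E_3}{\partial \bar{s}_3}\frac{\partial \bar{s}_3}{\partial W_x}
+ \frac{\partial E_3}{\partial \bar{s}_3}\frac{\partial \bar{s}_3}{\partial \bar{s}_2}\frac{\partial \bar{s}_2}{\partial W_x}
+ \frac{\partial E_3}{\partial \bar{s}_3}\frac{\partial \bar{s}_3}{\partial \bar{s}_2}\frac{\partial \bar{s}_2}{\partial \bar{s}_1}\frac{\partial \bar{s}_1}{\partial W_x}
$$

This is the same accumulation pattern used for Ws, with the last factor of each term now taken with respect to Wx.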

2 – 01 RNN Intro V6 Final

In this lesson, we will focus on recurrent neural networks, or RNNs for short. Many applications involve temporal dependencies, or dependencies over time. What does that mean? Well, it means our current output depends not only on the current input, but also on past inputs. So, if I want to make dinner …

19 – 20 RNN BPTT B V5 Final

So, let’s unfold the model in time, clean the sketch a bit, and focus on the third time step. In this model, we have three weight matrices that we want to modify: the weight matrix Wx, linking the network inputs to the state, or hidden, layer; the weight matrix Ws, connecting one state …
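
For reference, the simplest of the three updates is the one for the output weight matrix, here called Wy (the name is assumed, since the excerpt is cut off before naming it). Because Wy only affects the output directly, its gradient needs no accumulation over time:

$$\frac{\partial E_3}{\partial W_y} = \frac{\partial E_3}{\partial \bar{y}_3}\,\frac{\partial \bar{y}_3}{\partial W_y}$$

Ws and Wx, by contrast, influence every state from the first time step on, so their gradients accumulate one chain-rule term per time step, as sketched under the BPTT C entry above.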

18 – 19 RNN BPTT A V6 Final

Hopefully, you are now feeling more confident and have a deeper conceptual understanding of RNNs. But how do we train such networks? How can we find a good set of weights that would minimize the error? You will see that our training framework will be similar to what we’ve seen before, with a slight change …
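
The framework referred to here is the familiar one: define an error, then follow its gradient downhill. A minimal sketch, assuming a squared-error loss at each time step (the ½ factor and the symbols d_t and y_t for the desired and actual outputs are assumptions):

$$E_t = \tfrac{1}{2}\left(\bar{d}_t - \bar{y}_t\right)^2, \qquad W \leftarrow W - \alpha\,\frac{\partial E_t}{\partial W}$$

The slight change is that in an RNN the gradient of E_t flows back not only through the layers but also through the earlier time steps, which is the "through time" part of backpropagation through time.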

17 – 18 RNN Example V5 Final

Let’s continue with a conceptual RNN example. Assume that we want to build a sequence detector, and let’s decide that our sequence detector will track letters. So we will actually build a word detector. And more specifically, we want our network to detect the word Udacity. Just the word. So before we start, we need …

16 – 17 RNN Unfolded V3 Final

The unfolding-in-time scheme can be confusing. So let’s go back for a bit, look at it closely, and see what’s actually going on there. First, we will take the Elman network and tilt it by 90 degrees counterclockwise, since in RNNs we usually display the flow of information from the bottom to the top. …

15 – 16 RNN B V4 Final

In feedforward neural networks, the output at any time is a function of the current input and the weights alone. We assume that the inputs are independent of each other. Therefore, there is no significance to the sequence, so we actually train the system by randomly drawing input and target pairs. In RNNs, our output …
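
The contrast being set up can be written compactly (F is just a placeholder for "the network as a function"):

$$\text{Feedforward:}\quad \bar{y} = F(\bar{x}, W) \qquad\qquad \text{RNN:}\quad \bar{y}_t = F(\bar{x}_t, \bar{s}_{t-1}, W)$$

Because the previous state $\bar{s}_{t-1}$ enters the computation, the order of the inputs now matters, and training samples can no longer be drawn independently at random.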

14 – 14 RNN A V4 Final

We are finally ready to talk about Recurrent Neural Networks, or RNNs in short. Everything we’ve seen so far has prepared us for this moment. We went over the feedforward process, as well as the backpropagation process, in much detail. This will all help you understand the next set of videos. As I mentioned before, …

13 – 12 Backpropagation Example B V6 Final

We now need to calculate the gradient. We will do that one step at a time. In our example, we only have one hidden layer, so the backpropagation process will have two steps. Let’s be more precise now and decide that the gradient calculated for each element ij in the matrix is called delta …
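
A sketch of those two steps for the simplified model from the Backpropagation Example A entry below (two inputs, one hidden layer, a single output), assuming a squared error $E = \tfrac{1}{2}(d - y)^2$, hidden activations $h_j = \Phi\big(\sum_k x_k W^1_{kj}\big)$, and a linear output $y = \sum_j h_j W^2_j$; none of these choices is spelled out in the excerpt:

$$\text{Step 1 (output layer):}\quad \frac{\partial E}{\partial W^2_j} = -(d - y)\, h_j$$

$$\text{Step 2 (hidden layer):}\quad \frac{\partial E}{\partial W^1_{ij}} = -(d - y)\, W^2_j\, \Phi'\!\Big(\sum_k x_k W^1_{kj}\Big)\, x_i$$

In the excerpt’s notation, these per-element gradients are the deltas, and gradient descent then subtracts $\alpha$ times each one from the corresponding weight.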

12 – Chain Rule

So before we start calculating derivatives, let’s do a refresher on the chain rule, which is the main technique we’ll use to calculate them. The chain rule says, if you have a variable x and a function f that you apply to x to get f of x, which we’re going to call A, and then …
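
Completing the setup (the second function g and the name B are the standard continuation, not quoted from the clip): with $A = f(x)$ and $B = g(A)$, the chain rule says

$$\frac{dB}{dx} = \frac{dB}{dA}\cdot\frac{dA}{dx} = g'(A)\,f'(x)$$

For example, with $f(x) = x^2$ and $g(A) = \sin(A)$, the derivative of $\sin(x^2)$ is $\cos(x^2)\cdot 2x$.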

11 – 10 Backpropagation Example A V3 Final

Remember our feedforward illustration? We had n inputs, three neurons in the hidden layer, and two outputs. For this example, we will need to simplify things even more and look at a model with two inputs, x_1 and x_2, and a single output, y. We will have a weight matrix, W_1, from the input to …
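
A minimal NumPy sketch of this simplified model, assuming the hidden layer keeps three neurons and a sigmoid activation, with a second matrix W_2 from the hidden layer to the single output (only the two inputs, the single output, and W_1 are fixed by the excerpt):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5, -0.2]])      # the two inputs x_1, x_2 as a row vector
d = 0.9                          # an assumed target value for the single output
W1 = np.random.rand(2, 3)        # W_1: inputs -> hidden (3 hidden neurons assumed)
W2 = np.random.rand(3, 1)        # assumed W_2: hidden -> the single output y

h = sigmoid(x @ W1)              # hidden activations
y = (h @ W2)[0, 0]               # the single output y (kept linear for simplicity)
E = 0.5 * (d - y) ** 2           # squared error, ready for the backward pass
print(y, E)
```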