We are finally ready to talk about Recurrent Neural Networks, or RNNs for short. Everything we’ve seen so far has prepared us for this moment. We went over the feedforward process, as well as the backpropagation process, in much detail. This will all help you understand the next set of videos.

As I mentioned before, if you look up the definition of the word recurrent, you will find that it simply means occurring often or repeatedly. So why are these networks called Recurrent Neural Networks? It’s simply because with RNNs, we perform the same task for each element in the input sequence. We will see a lot more of this sketch later. RNNs also attempt to address the need for capturing information from previous inputs by maintaining internal memory elements, also known as states.

Many applications have temporal dependencies, meaning that the current output depends not only on the current input, but also on a memory element which takes into account past inputs. For cases like these, we need to use RNNs. A good example of the use of RNNs is predicting the next word in a sentence, which typically requires looking at the last few words rather than only the current one. We also mentioned quite a few other categories of applications, such as sentiment analysis, speech recognition, time series prediction, natural language processing, and gesture recognition. Frankly, applications of RNNs are popping up almost every day, making it challenging to keep up.

So how should we think about this new neural network? What is its structure? How are the training and evaluation phases changed? Well, RNNs are based on the same principles as feedforward neural networks, which is why we spent so much time reminding ourselves of the latter. Just as in feedforward neural networks, the inputs and outputs of an RNN can be many-to-many, many-to-one, or one-to-many. There are, however, two fundamental differences between RNNs and feedforward neural networks.
The first is the manner in which we define our inputs and outputs. Instead of training the network using a single input and a single output at each time step, we train with sequences, since previous inputs matter. The second difference stems from the memory elements that RNNs host. Current inputs, as well as the activations of neurons, serve as inputs to the next time step.

In feedforward neural networks, we saw a flow of information from the input to the output without any feedback. Now that feedforward scheme changes, and includes the feedback, or memory, elements. We will define memory as the output of the hidden layer, which will serve as an additional input to the network at the following time step. We will no longer use H for the output of the hidden layer, but S, for state, referring to a system with memory.

The basic RNN scheme is called the Simple RNN, and is also known as an Elman Network. Notice that in this illustration, I only used two outputs. You can certainly have more than two outputs, but to simplify the sketch a bit, let’s just stay with two for now.

For those of you who come from engineering or computer science backgrounds, this will probably remind you of a simple state machine with combinational logic and memory. The output depends not only on the external inputs, but also on previous inputs, which come from memory cells. The RNN is conceptually similar. But the beauty in our case is that the system will train itself, and learn how to optimize the weights that realize the network.
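To make the feedback idea a little more concrete, here is a minimal sketch of the forward pass of a simple (Elman-style) RNN in Python with NumPy. It is only an illustration under assumptions that are not part of the lecture: the names `rnn_forward`, `W_x`, `W_s`, and `W_y` are made up for this example, the activation is taken to be tanh, biases are left out, and the weights are random rather than trained. The point is just to show the state S from one time step being fed back in as an extra input at the next.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_s, W_y, s0):
    """Run a simple RNN over a sequence; return all outputs and all states.

    All names here are hypothetical and chosen only for this sketch.
    """
    s = s0
    outputs, states = [], []
    for x in inputs:
        # The new state mixes the current input with the previous state (the memory).
        s = np.tanh(x @ W_x + s @ W_s)
        # The output at this time step is read from the current state.
        y = s @ W_y
        states.append(s)
        outputs.append(y)
    return np.array(outputs), np.array(states)


# Illustrative sizes only: 3 inputs, 4 state units, 2 outputs, 5 time steps.
rng = np.random.default_rng(0)
n_in, n_state, n_out, T = 3, 4, 2, 5
W_x = rng.normal(scale=0.5, size=(n_in, n_state))     # input -> state
W_s = rng.normal(scale=0.5, size=(n_state, n_state))  # previous state -> state
W_y = rng.normal(scale=0.5, size=(n_state, n_out))    # state -> output

xs = rng.normal(size=(T, n_in))  # one input sequence
outputs, states = rnn_forward(xs, W_x, W_s, W_y, np.zeros(n_state))
print(outputs.shape, states.shape)
```

Notice that the same three weight matrices are reused at every time step, which is the "same task for each element in the sequence" idea, and that the only thing carrying information forward in time is the state. In a real network these weights would be learned during training rather than drawn at random.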