Before we dive into RNNs, let's remind ourselves of the process we use in feedforward neural networks. We can have many hidden layers between the inputs and the outputs, but for simplicity, we will start with a single hidden layer. We will remind ourselves why, when, and how it is used. Once we have a clear understanding of the mathematical background as well, we will take another step and open the door to RNNs. You will see that once you have a solid understanding of the basic feedforward network, the step towards RNNs is a simple one.

Some of you may be familiar with the concept of convolutional neural networks, or CNNs for short. When implementing your neural net, you will find that you can combine these techniques. For example, one can use CNNs in the first few layers for feature extraction, and then use RNNs in the final layers where memory needs to be considered. A popular application for this is gesture recognition. But no worries if you're not familiar with CNNs; that's okay. It's not the focus of this lesson.

When working with a feedforward neural network, we actually simulate an artificial neural network by using a nonlinear function approximation. That function acts as a system with n inputs, a set of weights, and k outputs. We will use x as the input vector and y as the output vector. The mapping between inputs and outputs can also be many-to-many, many-to-one, or one-to-many. Since the neural network essentially works as a nonlinear function approximator, what we do is try to fit a smooth function between given points, like (x1, y1), (x2, y2), and so on, in such a way that when we have a new input x', we can find the new output y'. We will elaborate on these nonlinear function approximations later in the lesson.

There are two main types of applications. One is classification, where we identify which of a set of categories a new input belongs to. For example, in image classification the neural network receives an image as input and can tell whether it shows a cat. The other application is regression, where we approximate a function, so the network produces continuous values following a supervised training process. A simple example is time series forecasting, where we predict the price of a stock tomorrow based on its price over the past five days. The input to the network would be five values representing the price of the stock on each of the past five days, and the output we want is tomorrow's price. We will sketch this example in code at the end of this section.

Our task in neural networks is to find the best set of weights W that, given the inputs x, yield a good output y. We start with random weights. In feedforward neural networks, we have a static mapping from the inputs to the outputs. We use the word static because the network has no memory, and the output depends only on the inputs and the weights. In other words, for the same input and the same weights, we always receive the same output.

Generally speaking, when working with neural networks, we have two primary phases: training and evaluation. In the training phase, we take a dataset called the training set, which includes many pairs of inputs and their corresponding targets, or outputs. The goal is to find a set of weights that best maps the inputs to the desired outputs; in other words, the goal of the training phase is to yield a network that generalizes beyond the training set. In the evaluation phase, we use the network that was created in the training phase, apply our new inputs, and expect to obtain the desired outputs.
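To make the stock forecasting example and the static mapping concrete, here is a minimal sketch in Python. It is not the lesson's implementation; NumPy, the hidden-layer size of eight units, the tanh activation, and the sample prices are all assumptions chosen for illustration. It shows a single hidden layer mapping five inputs to one output, with the weights initialized at random.

```python
import numpy as np

# Hypothetical sketch of a single-hidden-layer feedforward network.
# Sizes are illustrative assumptions: n = 5 inputs (the past five
# days' prices), 8 hidden units, k = 1 output (tomorrow's price).
rng = np.random.default_rng(seed=0)
n_inputs, n_hidden, n_outputs = 5, 8, 1

# We start with random weights, as described in the lesson.
W1 = 0.1 * rng.standard_normal((n_inputs, n_hidden))
W2 = 0.1 * rng.standard_normal((n_hidden, n_outputs))

def forward(x):
    # Static mapping: y depends only on the input x and the weights
    # W1 and W2. There is no memory of previous inputs.
    h = np.tanh(x @ W1)  # nonlinear hidden layer
    return h @ W2        # linear output, suitable for regression

x = np.array([101.0, 102.5, 101.8, 103.2, 104.0])  # past five prices
print(forward(x))  # untrained prediction for tomorrow's price
print(forward(x))  # identical output: same input, same weights
```

The two identical printouts illustrate the static-mapping point: with fixed weights, the same input always produces the same output. Training would adjust W1 and W2 to better fit the input-output pairs in the training set, and an RNN would break the static property by carrying a hidden state from one input to the next.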