Let’s continue with a conceptual RNN example. Assume that we want to build a sequence detector, and let’s decide that our sequence detector will track letters. So we will actually build a word detector. More specifically, we want our network to detect the word Udacity. Just the word.

So before we start, we need to make a few decisions. We can define the input using a one-hot vector encoding containing seven binary values. Here, the letters are arranged in ascending order, but that choice is arbitrary. Each letter will be represented by a one in the index it corresponds to, and zeros everywhere else. So for example, the letter A will be represented by a one followed by six zeros. Here are the rest of the letters that we need. The letters not appearing in the word Udacity can be represented by a simple vector containing all zeros, like these letters, for example.

The sequence we are trying to detect will be of length seven, with inputs in the following order: U, D, A, C, I, T, and finally, Y. We can train the system by feeding it random letters at each timestep, creating a sequence of inputs demonstrated here from left to right. Occasionally, we will also insert the word Udacity. We set the target values (the outputs) to zero all the time, except for when the last letter Y of the sequence Udacity enters the system. Only then is the target set to one. In other words, the system will acknowledge all inputs and will respond with a target output of one only when the desired sequence is detected.

Sketching our network in an unfolded form, we have an input vector of seven values, a single output, and a state. The state can have any number of hidden neurons; in this illustration, we will use n to leave things generic. The first state vector is usually set to zero, allowing the next state to evolve as inputs come in. When training the network, we set the targets to either zero or one. The target will be zero when the word Udacity is not detected, and one when it is.
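The encoding and target-labeling scheme described above can be sketched in a few lines of Python. This is just an illustration of the idea, not code from the lesson; the names `one_hot` and `targets_for` are my own.

```python
import numpy as np

# Letters of "UDACITY" in ascending (alphabetical) order, as in the video:
# A, C, D, I, T, U, Y
ALPHABET = "ACDITUY"

def one_hot(letter):
    """Encode a letter as a 7-element one-hot vector.
    Letters not appearing in "UDACITY" map to the all-zeros vector."""
    vec = np.zeros(7)
    idx = ALPHABET.find(letter.upper())
    if idx >= 0:
        vec[idx] = 1.0
    return vec

def targets_for(stream):
    """Target is 1 only at the timestep where the final Y of an
    occurrence of "UDACITY" enters the system; 0 everywhere else."""
    word = "UDACITY"
    return [1.0 if stream[max(0, t - 6):t + 1] == word else 0.0
            for t in range(len(stream))]

print(one_hot("A"))               # [1. 0. 0. 0. 0. 0. 0.]
print(one_hot("B"))               # all zeros: B is not in UDACITY
print(targets_for("XXUDACITYX"))  # a 1 appears only at the Y
```

A random training stream would be generated the same way: mostly arbitrary letters, with the word Udacity occasionally spliced in, and `targets_for` producing the matching zero/one target sequence.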
If we train our system on targets that are either zero or one, we expect that the output will also take on values between zero and one. After training our system and optimizing the weights, we would expect that when the sequence Udacity appears, the output will signal that a sequence has been detected by taking on a value close to one, like 0.9 in this case. Practically speaking, we can set a threshold of, say, 0.9, and decide that whenever the output exceeds this threshold, the sequence of interest has been detected. The number 0.9 here, by the way, was selected as an arbitrary example.

In our example, the RNN was trained to recognize sequences of seven inputs, like the letters in the word Udacity. However, we could have trained the RNN to recognize sequences of letters with a different length, like five in the word happy. Generally speaking, an RNN can deal with varying sequence lengths.

So, how do we train this network? Or in other words, how do we optimize its weights to minimize the output error? We do that using backpropagation through time, and we’ll be learning all about that in the next set of videos.
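To make the unfolded network concrete, here is a minimal forward pass in NumPy: seven inputs, n hidden state units, one sigmoid output per timestep, with the first state set to zero. The weight names (Wx, Ws, Wy), the hidden size, and the random initialization are all illustrative assumptions; in practice these weights would be learned with backpropagation through time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRNN:
    """A minimal Elman-style RNN sketch: 7 inputs -> n hidden -> 1 output.
    Weights are random placeholders, NOT trained values."""
    def __init__(self, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(scale=0.1, size=(n_hidden, 7))        # input -> state
        self.Ws = rng.normal(scale=0.1, size=(n_hidden, n_hidden)) # state -> state
        self.Wy = rng.normal(scale=0.1, size=(1, n_hidden))        # state -> output

    def forward(self, inputs):
        """Run the unfolded network over a sequence of 7-element vectors,
        returning one output in (0, 1) per timestep."""
        s = np.zeros(self.Ws.shape[0])  # first state vector is all zeros
        outputs = []
        for x in inputs:
            s = np.tanh(self.Wx @ x + self.Ws @ s)  # state evolves with each input
            outputs.append(sigmoid(self.Wy @ s)[0])
        return outputs

# Example: feed a short sequence of one-hot / all-zero input vectors.
net = TinyRNN()
sequence = [np.zeros(7), np.eye(7)[0], np.eye(7)[5]]
outputs = net.forward(sequence)
print(outputs)  # three values between 0 and 1 (meaningless until trained)
```

After training, the detection rule from the video would simply be `output > 0.9` (or whatever threshold we choose) at each timestep.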