1 – Jay’s Introduction

Hello, my name is Jay. I'm a content developer at Udacity, and today we'll be talking about a powerful RNN technique called sequence to sequence. In a previous lesson, Andrew Trask showed us how to do sentiment analysis using normal feedforward neural networks. The network was able to learn how positive or negative each word was, and could tell whether a sequence as a whole had positive or negative things to say about its subject. We start running into issues, however, when we want to build more advanced models that deal with language and sequential data.

Let's take an example. The authors of the "Deep Learning" book present the following example to showcase this. Say we have these two sentences: "I went to Nepal in 2009" and "In 2009, I went to Nepal." If we train a model to read these inputs and extract the year the person went to Nepal, we would want it to recognize 2009 as the piece of information we're looking for, right? But if we actually train a regular feedforward neural network on this task, it will have separate parameters for each input feature, so it would have to learn all the rules of language separately at each position in the input sentence.

In another previous lesson, Matt showed us our first recurrent neural network, or RNN. Recurrent nets are a powerful class of neural networks that deal with sequential data. They are especially suited for language and translation tasks because they can extend to sequences of any length. More importantly, they share their parameters across different timesteps, so when they learn a language model, they do it a lot more efficiently than a traditional feedforward network would.

Now, when we say sequential data, we can be referring either to the input or the output of the model. You might have seen this diagram before. It's from the incredible essay about RNNs by Andrej Karpathy, and it shows different kinds of RNNs that are suited for different types of tasks. The sentiment analysis RNN that Matt showed us is a many-to-one network: it reads a sequence of words and then outputs just a single value. But if you want to build a chatbot or a translation service, you're going to have both sequential inputs and sequential outputs, so that's going to be on the many-to-many side of the diagram, to the right. There we have two options. If we use a single RNN, we are forced to output at most as many vectors as we input, and that wouldn't work for a chatbot. We also want our model to take in the entire input before it starts generating the response, so this first option is not appropriate for our needs.

That brings us to the other many-to-many architecture. Luckily, in 2014, two RNN architectures were introduced that can map a sequence of any length to another sequence of any length. The basic premise is that you use two RNNs: one reads the input sequence, then hands over what it has learned to a second RNN, which starts producing the output sequence.

In the next videos, we'll look more closely at the intuition and the major concepts behind sequence to sequence, and we'll also touch on the implementation details in TensorFlow. But before that, I just want you to think about the incredible variety of tasks you can accomplish when you can teach a network like this.
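To make the two-RNN premise concrete, here is a minimal encoder-decoder sketch in TensorFlow (Keras). This is an illustration, not the lesson's actual implementation; the vocabulary size, embedding dimension, and hidden-unit count are placeholder assumptions:

```python
import tensorflow as tf

# Placeholder sizes, assumed for illustration only.
vocab_size, embed_dim, hidden_units = 10000, 64, 128

# Encoder RNN: reads the input sequence and keeps only its final
# hidden and cell states, i.e. the "what it has learned" that gets
# handed over to the decoder.
encoder_inputs = tf.keras.Input(shape=(None,))
enc_embed = tf.keras.layers.Embedding(vocab_size, embed_dim)(encoder_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(
    hidden_units, return_state=True
)(enc_embed)

# Decoder RNN: starts from the encoder's final states and produces
# the output sequence one token at a time.
decoder_inputs = tf.keras.Input(shape=(None,))
dec_embed = tf.keras.layers.Embedding(vocab_size, embed_dim)(decoder_inputs)
dec_outputs = tf.keras.layers.LSTM(
    hidden_units, return_sequences=True
)(dec_embed, initial_state=[state_h, state_c])
predictions = tf.keras.layers.Dense(vocab_size, activation="softmax")(dec_outputs)

model = tf.keras.Model([encoder_inputs, decoder_inputs], predictions)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Note that the only connection between the two RNNs is the encoder's final state: the decoder never sees the input tokens directly, which is why the model can read the entire input before it starts generating a response.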
