9 – 09 Additive Attention V2

In this video, we’ll look at the third commonly used scoring method. It’s called concat, and the way to do it is to use a feedforward neural network. To take a simple example, let’s say we’re scoring this encoder hidden state at the fourth time step of the decoder. Again, this is an oversimplified example …
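
To make the concat idea concrete, here is a minimal NumPy sketch of scoring one encoder hidden state against one decoder hidden state with a small feedforward network. The names (W_a, v_a) and the toy sizes are illustrative assumptions, not taken from the video:

```python
import numpy as np

def concat_score(decoder_hidden, encoder_hidden, W_a, v_a):
    """Concat (additive) scoring: a small feedforward network applied
    to the concatenation of the decoder and encoder hidden states."""
    combined = np.concatenate([decoder_hidden, encoder_hidden])
    # One tanh hidden layer, then a projection to a single score.
    return v_a @ np.tanh(W_a @ combined)

# Toy sizes: 4-dimensional hidden states, random illustrative weights.
rng = np.random.default_rng(0)
h_dec = rng.normal(size=4)       # decoder hidden state at step 4
h_enc = rng.normal(size=4)       # one encoder hidden state
W_a = rng.normal(size=(8, 8))    # weights over the concatenated vector
v_a = rng.normal(size=8)         # projects down to a single score
print(concat_score(h_dec, h_enc, W_a, v_a))
```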

8 – 08 Multiplicative Attention V2

Earlier in this lesson, we looked at how the key concept of attention is to calculate an attention weight vector, which is used to amplify the signal from the most relevant parts of the input sequence and, at the same time, drown out the irrelevant parts. In this video, we’ll begin to look at the …
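
A minimal sketch of that weight vector, assuming the simplest multiplicative (dot-product) scoring; the shapes and function names here are hypothetical:

```python
import numpy as np

def attention_weights(decoder_hidden, encoder_states):
    """Dot-product attention: score every encoder hidden state against
    the decoder state, then softmax so the relevant positions are
    amplified and the irrelevant ones are drowned out."""
    scores = encoder_states @ decoder_hidden   # one score per input position
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
encoder_states = rng.normal(size=(5, 4))   # 5 input positions, hidden size 4
h_dec = rng.normal(size=4)
weights = attention_weights(h_dec, encoder_states)
context = weights @ encoder_states          # weighted sum = context vector
print(weights, context)
```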

7 – 07 Additive And Multiplicative Attention V1

Before delving into the details of scoring functions, we need to make a distinction between the two major types of attention. These are often referred to as “Additive Attention” and “Multiplicative Attention.” Sometimes they’re also called “Bahdanau Attention” and “Luong Attention,” referring to the first authors of the papers that described them. Bahdanau attention refers …
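
For reference, the scoring functions from the two papers are usually written as follows. The notation (s_{t-1} for the previous decoder state, h_i and h̄_s for encoder states, W_a, U_a, v_a for learned weights) follows the papers’ common presentation rather than anything shown in the video:

```latex
% Bahdanau (additive): a small feedforward network over the previous
% decoder state and each encoder state.
e_{t,i} = v_a^\top \tanh\!\left(W_a s_{t-1} + U_a h_i\right)

% Luong (multiplicative): variants over the current decoder state h_t.
\mathrm{score}(h_t, \bar{h}_s) =
\begin{cases}
  h_t^\top \bar{h}_s      & \text{(dot)} \\
  h_t^\top W_a \bar{h}_s  & \text{(general)}
\end{cases}
```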

6 – 06 Attention Decoder V1

Let’s now look at things on the decoder side. In models without attention, we’d only feed the last context vector to the decoder RNN, in addition to the embedding of the end token, and it would begin to generate an element of the output sequence at each time step. The case is different in an attention …
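
Here is a sketch of one attention-decoder time step, assuming dot-product scoring: a fresh context vector is computed at every step and combined with the decoder hidden state. The helper names and the combining layer W_c are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step(h_dec, encoder_states, W_c):
    """One attention-decoder time step: a new context vector is computed
    at every step instead of reusing one fixed vector."""
    weights = softmax(encoder_states @ h_dec)   # attend over the inputs
    context = weights @ encoder_states          # per-step context vector
    # Combine context and decoder state before the output layer.
    return np.tanh(W_c @ np.concatenate([context, h_dec]))

rng = np.random.default_rng(2)
encoder_states = rng.normal(size=(6, 4))
h_dec = rng.normal(size=4)
W_c = rng.normal(size=(4, 8))
print(decoder_step(h_dec, encoder_states, W_c))
```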

5 – 05 Attention Encoder V2

Now that we’ve taken a high-level look at how attention works in a sequence to sequence model, let’s look into it in more detail. We’ll use machine translation as the example, since that’s the application the main papers on attention tackled. But whatever we do here translates into other applications as well. It’s important …

4 – 04 Attention Overview Decoding V2

Now, let’s look at the attention decoder and how it works at a very high level. At every time step, an attention decoder pays attention to the appropriate part of the input sequence using the context vector. How does the attention decoder know which parts of the input sequence to focus on at …

3 – 03 Attention Overview Encoding V2

A sequence to sequence model with attention works in the following way. First, the encoder processes the input sequence just like the model without attention: one word at a time, producing a hidden state and using that hidden state in the next step. Next, the model passes a context vector to the decoder, but unlike …
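
A minimal sketch of that encoding loop, using a bare-bones RNN cell with hypothetical weights: the encoder still reads one word at a time, but keeps every hidden state (an attention model passes them all along, not just the last one):

```python
import numpy as np

def encode(embedded_words, W_x, W_h):
    """Run a simple RNN over the input, one word at a time,
    keeping every hidden state."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in embedded_words:
        h = np.tanh(W_x @ x + W_h @ h)   # this state feeds the next step
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(3)
words = rng.normal(size=(5, 3))     # 5 words, embedding size 3
W_x = rng.normal(size=(4, 3))
W_h = rng.normal(size=(4, 4))
print(encode(words, W_x, W_h).shape)   # (5, 4): one state per word
```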

2 – 02 Sequence To Sequence Recap V2

Welcome back. In this video, we’ll briefly recap how sequence to sequence models work. A sequence to sequence model takes in an input that is a sequence of items, and then it produces another sequence of items as an output. In a machine translation application, the input sequence is a series of words in one …

12 – 12 The Transformer And Self Attention V2

Let’s look at how self-attention works in a little more detail. Let’s say we have these words that we want our encoder to read and create a representation of. As always, we begin by embedding them into vectors. Since the transformer gives us a lot of flexibility for parallelization, this example assumes we’re looking …
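
A compact sketch of scaled dot-product self-attention over a few embedded words. The query/key/value projections are the standard transformer formulation; the random matrices here are placeholders for learned weights:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of word
    vectors X: every position attends to every position at once,
    which is what makes the computation easy to parallelize."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])      # all pairs of positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                          # new representation per word

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 8))                     # 3 embedded words
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (3, 8)
```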

11 – 11 Other Attention Methods V2

Since the two main attention papers were published in 2014 and ’15, attention has been an active area of research with many developments. While the two mechanisms continue to be commonly used, there have been significant advances over the years. In this video, we will look at one of these developments, published in a paper …

10 – 10 Computer Vision Applications V3

In this concept, we’ll go over some of the computer vision applications and tasks that attention empowers. In the text below the video, we’ll link to a number of papers in case you want to go deeper into any specific application or task. In this video, we’ll focus on image captioning and one of the …

1 – 01 Introduction To Attention V2

Hello, I’m Jay, and in this lesson, we’ll be talking about one of the most important innovations in deep learning in the last few years, Attention. Attention started out in the field of computer vision as an attempt to mimic human perception. This is a quote from a paper on Visual Attention from 2014. It …

4 – Architecture in More Depth

Now that we have an unrolled example of the input and output of the network, let’s go another level deeper and look at some of the parameters of the model. So at this point in the course, you know that you can’t just feed words directly to the network. We need to turn the words into …
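
A minimal sketch of that word-to-vector step: each word maps to an integer index, and each index selects one row of an embedding matrix. The tiny vocabulary and random embeddings here are hypothetical, purely for illustration:

```python
import numpy as np

# A tiny hypothetical vocabulary: each word gets an integer index,
# and each index selects one row of the embedding matrix.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
embedding_dim = 4
rng = np.random.default_rng(5)
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(sentence):
    """Turn words into vectors via an index lookup."""
    indices = [vocab[w] for w in sentence.split()]
    return embeddings[indices]

print(embed("the cat sat").shape)   # (3, 4): one vector per word
```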

3 – Architecture encoder decoder

Let’s look more closely at how sequence to sequence models work. We’ll start with a high level look and then go deeper and deeper. Here are our two recurrent nets. The one on the left is called the encoder. It reads the input sequence, then hands over what it has understood to the RNN and …
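
A sketch of that handoff, assuming simple RNN cells with made-up weights: the encoder reads the whole input, and its final hidden state (what it “has understood”) becomes the starting state for the second network:

```python
import numpy as np

def rnn(inputs, h, W_x, W_h):
    """A bare-bones RNN cell applied over a sequence; returns the
    final hidden state."""
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

rng = np.random.default_rng(6)
src = rng.normal(size=(5, 3))                       # embedded source sentence
W_x, W_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

# The encoder reads the input; its last hidden state is handed over
# as the initial state of the second (decoding) RNN.
encoder_summary = rnn(src, np.zeros(4), W_x, W_h)
decoder_state = encoder_summary
print(decoder_state)
```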

2 – Applications seq2seq

I do want to say a couple of words on applications before delving deeper into the concept. That’s because the term sequence-to-sequence RNN is a little bit abstract and doesn’t convey how many amazing things we can do with this type of model. So let’s think of it like this. We have a model that …

1 – Jay’s Introduction

Hello, my name is Jay. I’m a content developer at Udacity, and today we’ll be talking about a powerful RNN technique called sequence to sequence. In a previous lesson, Andrew Trask showed us how to do sentiment analysis using normal feedforward neural networks. The network was able to learn how positive or negative each word …

2 – Sentiment Analysis 1

This is the example we’ll use in this section: the IMDB movie reviews. We will be splitting them into two kinds: reviews such as “What a great movie!”, which we’ll classify as positive, and reviews such as “That was terrible,” which we’ll classify as negative. From a machine learning perspective, you can think of sentiment analysis …
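
To make the binary-classification framing concrete, here is a toy bag-of-words classifier: count sentiment-bearing words and classify by the sign of the score. The word lists are illustrative assumptions, not the course’s model:

```python
# Toy sentiment classifier; the word lists are purely illustrative.
POSITIVE = {"great", "wonderful", "amazing"}
NEGATIVE = {"terrible", "awful", "boring"}

def classify(review):
    words = review.lower().replace("!", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

print(classify("What a great movie!"))   # positive
print(classify("That was terrible"))     # negative
```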

1 – Intro

Hi, this is Louis. Welcome to the Sentiment Analysis section. Sentiment analysis is probably one of the most popular uses of natural language processing these days. Sentiment analysis has become an important tool for many purposes. For example: understanding customer sentiment around a company for making investment decisions, getting a feedback signal for social media …

9 – NLPND LDA 09 Dirichlet Distributions RENDER V2

Now, a multinomial distribution is simply a generalization of the binomial distribution to more than two values. For example, let’s say we have newspaper articles and three topics: science, politics, and sports. Let’s say each topic is assigned randomly to the articles, and when we look, we have three science articles, six politics articles, and …
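
A worked sketch of the multinomial probability for this article example. The 3 science and 6 politics counts come from the excerpt; the sports count is cut off above, so the 2 used here is an assumption purely for illustration:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(counts) = n!/(k1!...km!) * p1^k1 * ... * pm^km,
    the multinomial generalization of the binomial distribution."""
    n = sum(counts)
    coeff = factorial(n) // prod(factorial(k) for k in counts)
    return coeff * prod(p**k for p, k in zip(probs, counts))

# Topics assigned uniformly at random: science, politics, sports.
# The 2 sports articles below are an assumed, illustrative count.
print(multinomial_pmf([3, 6, 2], [1/3, 1/3, 1/3]))
```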

8 – Beta Distributions

So, let’s go to probability distributions. Let’s say we have a coin, and we toss it twice. Let’s say we get one heads and one tails. What would we think about this coin? Well, it could be a fair coin, right? It could also be slightly biased towards either heads or tails. We don’t have …
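
A small sketch of how this belief is usually modeled: starting from a uniform prior over the coin’s heads-probability (an assumption), one heads and one tails give a Beta(2, 2) distribution, which peaks at a fair coin but still spreads over biased values:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * x**(a - 1) * (1 - x)**(b - 1)

# Beta(2, 2): highest density at 0.5, but 0.3 or 0.7 remain plausible.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(beta_pdf(p, 2, 2), 3))
```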