9 – 07 Additive And Multiplicative Attention V1

Before delving into the details of scoring functions, we need to make a distinction between the two major types of attention. These are often referred to as “Additive Attention” and “Multiplicative Attention.” Sometimes they’re also called “Bahdanau Attention” and “Luong Attention,” after the first authors of the papers that described them. Bahdanau attention refers … Read more
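
To make the distinction concrete, here is a minimal NumPy sketch of the two families of scoring functions; the weight matrices and dimensions are made up for illustration and are not taken from either paper.

```python
import numpy as np

# Toy setup: 4 encoder hidden states and one decoder hidden state, size 5 each.
encoder_states = np.random.randn(4, 5)
decoder_hidden = np.random.randn(5)

# Multiplicative (Luong-style) scoring: a dot product, optionally through a
# learned matrix W_a (the "general" variant). One score per encoder time step.
W_a = np.random.randn(5, 5)
mult_scores = encoder_states @ (W_a @ decoder_hidden)

# Additive (Bahdanau-style) scoring: a small feedforward network combines the
# two states before reducing them to one score each.
W1, W2, v = np.random.randn(8, 5), np.random.randn(8, 5), np.random.randn(8)
add_scores = np.tanh(encoder_states @ W1.T + decoder_hidden @ W2.T) @ v

print(mult_scores.shape, add_scores.shape)  # (4,) (4,)
```

Both families produce one score per encoder time step; the difference lies only in how the decoder and encoder hidden states are combined to get that score.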

8 – 06 Attention Decoder V1

Let’s now look at things on the decoder side. In models without attention, we’d only feed the last context vector to the decoder RNN, in addition to the embedding of the end token, and it would begin to generate an element of the output sequence at each time step. The case is different in an attention … Read more
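
As a rough sketch of what happens at each decoding step in the attention model (dot-product scoring is assumed here purely for simplicity, and the sizes are toy values):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decoder_step_with_attention(decoder_hidden, encoder_states):
    """One illustrative decoding step: score every encoder hidden state against
    the current decoder hidden state, normalize the scores into attention
    weights, and take the weighted sum as this step's context vector."""
    scores = encoder_states @ decoder_hidden   # dot-product scoring for simplicity
    weights = softmax(scores)                  # one weight per input time step
    context = weights @ encoder_states         # context vector for this step
    return context, weights

# Toy data: 6 input time steps, hidden size 4 (illustrative only).
encoder_states = np.random.randn(6, 4)
decoder_hidden = np.random.randn(4)
context, weights = decoder_step_with_attention(decoder_hidden, encoder_states)
print(weights.round(2), context.shape)
```

In a real model, the resulting context vector is then combined with the decoder’s hidden state (or its input) to produce that step’s output word, and a fresh context vector is computed at every subsequent step.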

7 – 05 Attention Encoder V2

Now that we’ve taken a high-level look at how attention works in a sequence-to-sequence model, let’s look into it in more detail. We’ll use machine translation as the example, since that’s the application the main papers on attention tackled. But whatever we do here translates to other applications as well. It’s important … Read more

6 – 04 Attention Overview Decoding V2

Now, let’s look at the attention decoder and how it works at a very high level. At every time step, an attention decoder pays attention to the appropriate part of the input sequence using the context vector. How does the attention decoder know which parts of the input sequence to focus on at … Read more

5 – 03 Attention Overview Encoding V2

A sequence-to-sequence model with attention works in the following way. First, the encoder processes the input sequence just like the model without attention, one word at a time, producing a hidden state and using that hidden state in the next step. Next, the model passes a context vector to the decoder, but unlike … Read more
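
A minimal sketch of that encoding pass, assuming a plain tanh RNN cell with random toy weights; the point is only that the attention model keeps the hidden state from every step instead of just the last one.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    # One step of a plain tanh RNN cell.
    return np.tanh(W_x @ x + W_h @ h)

def encode(embedded_words, hidden_size=4):
    W_x = np.random.randn(hidden_size, embedded_words.shape[1])
    W_h = np.random.randn(hidden_size, hidden_size)
    h = np.zeros(hidden_size)
    all_states = []
    for x in embedded_words:          # one word at a time
        h = rnn_step(x, h, W_x, W_h)  # hidden state is reused at the next step
        all_states.append(h)
    return np.stack(all_states)       # with attention: hand over ALL of these,
                                      # not just the final hidden state

embedded_words = np.random.randn(5, 3)   # 5 words, embedding size 3 (toy)
encoder_states = encode(embedded_words)
print(encoder_states.shape)              # (5, 4): one hidden state per input word
```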

4 – 02 Sequence To Sequence Recap V2

Welcome back. In this video, we’ll briefly recap how sequence to sequence models work. A sequence to sequence model takes in an input that is a sequence of items, and then it produces another sequence of items as an output. In a machine translation application, the input sequence is a series of words in one … Read more

3 – Architecture encoder decoder

Let’s look more closely at how sequence-to-sequence models work. We’ll start with a high-level look and then go deeper and deeper. Here are our two recurrent nets. The one on the left is called the encoder. It reads the input sequence, then hands over what it has understood to the RNN and … Read more
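
A minimal sketch of that hand-off in the basic, no-attention model, reusing the same toy tanh RNN cell idea as in the encoding sketch above; the decoder simply starts from the encoder’s final hidden state.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    # One step of a plain tanh RNN cell.
    return np.tanh(W_x @ x + W_h @ h)

hidden_size, emb_size = 4, 3
W_x_enc, W_h_enc = np.random.randn(hidden_size, emb_size), np.random.randn(hidden_size, hidden_size)
W_x_dec, W_h_dec = np.random.randn(hidden_size, emb_size), np.random.randn(hidden_size, hidden_size)

# Encoder: read the input sequence one embedded word at a time.
h = np.zeros(hidden_size)
for x in np.random.randn(5, emb_size):       # 5 toy input embeddings
    h = rnn_step(x, h, W_x_enc, W_h_enc)

# The hand-off: the decoder starts from whatever the encoder "understood",
# i.e. its final hidden state (the single context vector in the basic model).
decoder_hidden = h
for x in np.random.randn(3, emb_size):       # 3 toy decoder inputs
    decoder_hidden = rnn_step(x, decoder_hidden, W_x_dec, W_h_dec)
```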

2 – Applications seq2seq

I do want to say a couple of words on applications before delving deeper into the concept. That’s because the term sequence-to-sequence RNN is a little bit abstract and doesn’t convey how many amazing things we can do with this type of model. So let’s think of it like this: we have a model that … Read more

14 – 12 The Transformer And Self Attention V2

Let’s look at how self-attention works in a little more detail. Let’s say we have these words that we want our encoder to read and create a representation of. As always, we begin by embedding them into vectors. Since the transformer gives us a lot of flexibility for parallelization, this example assumes we’re looking … Read more
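
A minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention over a few embedded words; the projection matrices are random stand-ins for learned weights, and the sizes are toy values.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(embeddings, d_k=4):
    d_model = embeddings.shape[1]
    # Random stand-ins for the learned query/key/value projections.
    W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))
    Q, K, V = embeddings @ W_q, embeddings @ W_k, embeddings @ W_v
    # Every word attends to every word; all positions are scored in parallel.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V                    # new representation of each word

embeddings = np.random.randn(3, 6)        # 3 words, embedding size 6 (toy)
print(self_attention(embeddings).shape)   # (3, 4)
```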

13 – 11 Other Attention Methods V2

Since the two main Attention papers were published in 2014 and ’15, Attention has been an active area of research with many developments. While the two original mechanisms continue to be commonly used, there have been significant advances over the years. In this video, we will look at one of these developments, published in a paper … Read more

12 – 10 Computer Vision Applications V3

In this concept, we’ll go over some of the computer vision applications and tasks that attention empowers. In the text below the video, we’ll link to a number of papers in case you want to go deeper into any specific application or task. In this video, we’ll focus on image captioning and one of the … Read more

11 – 09 Additive Attention V2

In this video, we’ll look at the third commonly used scoring method. It’s called concat, and the way to do it is to use a feedforward neural network. To take a simple example, let’s say we’re scoring this encoder hidden state at the fourth time step of the decoder. Again, this is an oversimplified example … Read more
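
A minimal sketch of the concat idea, assuming random placeholder weights: the encoder hidden state and the decoder hidden state are concatenated and pushed through a small feedforward network that outputs a single score.

```python
import numpy as np

def concat_score(encoder_state, decoder_hidden, W_a, v_a):
    """Concat scoring: feed [encoder_state; decoder_hidden] through a one-layer
    feedforward network, then reduce it to a single scalar score."""
    merged = np.concatenate([encoder_state, decoder_hidden])
    return v_a @ np.tanh(W_a @ merged)

hidden_size = 5
W_a = np.random.randn(8, 2 * hidden_size)   # toy sizes, placeholders for learned weights
v_a = np.random.randn(8)

encoder_state = np.random.randn(hidden_size)    # the encoder state being scored
decoder_hidden = np.random.randn(hidden_size)   # e.g. the fourth decoder time step
print(concat_score(encoder_state, decoder_hidden, W_a, v_a))
```

In practice this score is computed for every encoder hidden state, and the resulting scores are softmaxed into attention weights, just as with the other scoring methods.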

10 – 08 Multiplicative Attention V2

Earlier in this lesson, we looked at how the key concept of attention is to calculate an attention weight vector, which is used to amplify the signal from the most relevant parts of the input sequence and, at the same time, drown out the irrelevant parts. In this video, we’ll begin to look at the … Read more
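
A tiny illustration of that amplify-and-drown-out effect with multiplicative (dot-product) scoring; the numbers are made up so that the second encoder state clearly matches the decoder state best.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Three encoder hidden states; the second one is deliberately similar
# to the decoder hidden state, so it should get most of the attention.
encoder_states = np.array([[ 0.1, -0.2,  0.0],
                           [ 0.9,  0.8, -0.7],
                           [-0.3,  0.2,  0.1]])
decoder_hidden = np.array([ 1.0,  1.0, -1.0])

scores = encoder_states @ decoder_hidden   # multiplicative (dot-product) scoring
weights = softmax(scores)                  # the attention weight vector
print(weights.round(3))                    # the middle weight dominates (~0.87)
context = weights @ encoder_states         # irrelevant states are drowned out
```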

1 – 01 Introduction To Attention V2

Hello, I’m Jay, and in this lesson, we’ll be talking about one of the most important innovations in deep learning in the last few years, Attention. Attention started out in the field of computer vision as an attempt to mimic human perception. This is a quote from a paper on Visual Attention from 2014. It … Read more