9 – 08 Video Captioning V1

This captioning network can also be applied to video, not just single images. In the case of video captioning, the only thing that has to change about this network architecture is the feature extraction step that occurs between the CNN and the RNN. The input to the pre-trained CNN will be a short video clip …
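
A minimal sketch of one common way to adapt the feature extraction step for video (the excerpt is cut off before describing the exact approach, so this is an assumption): run a pre-trained CNN on each frame of the clip and average the per-frame feature vectors into a single summary vector. The model choice and shapes here are illustrative.

    import torch
    import torchvision.models as models

    # Pre-trained CNN with its classification head removed, used as a feature extractor.
    cnn = models.resnet50(pretrained=True)
    cnn.fc = torch.nn.Identity()   # keep the 2048-d pooled features
    cnn.eval()

    def clip_features(frames):
        """frames: tensor of shape (num_frames, 3, 224, 224), already normalized."""
        with torch.no_grad():
            per_frame = cnn(frames)       # (num_frames, 2048)
        return per_frame.mean(dim=0)      # average over time -> one (2048,) vector

    # Random data standing in for a 16-frame clip.
    fake_clip = torch.randn(16, 3, 224, 224)
    feature_vector = clip_features(fake_clip)   # shape: (2048,)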

8 – 07 RNN Training V4

Let’s take a closer look at how the decoder trains on a given caption. The decoder will be made of LSTM cells, which are good at remembering lengthy sequences of words. Each LSTM cell expects to see an input vector of the same shape at each time-step. The very first cell is connected to …
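
A minimal sketch of this kind of decoder, assuming the common setup where the image feature vector is the first input and the embedded caption words follow, one per time-step (the class and parameter names below are illustrative, not the project's exact code):

    import torch
    import torch.nn as nn

    class DecoderRNN(nn.Module):
        """Illustrative decoder: the image feature is the first input, then the
        embedded caption words follow, one per time-step (teacher forcing)."""
        def __init__(self, embed_size, hidden_size, vocab_size):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_size)
            self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, vocab_size)

        def forward(self, features, captions):
            # Embed the caption, dropping the last word (it has no "next word" target).
            embedded = self.embed(captions[:, :-1])               # (B, T-1, embed)
            # Prepend the image feature so every time-step sees the same input shape.
            inputs = torch.cat([features.unsqueeze(1), embedded], dim=1)
            out, _ = self.lstm(inputs)                            # (B, T, hidden)
            return self.fc(out)                                   # scores over the vocabulary

    # Toy usage: batch of 2, embed 256, hidden 512, vocab 1000, caption length 5.
    decoder = DecoderRNN(256, 512, 1000)
    scores = decoder(torch.randn(2, 256), torch.randint(0, 1000, (2, 5)))
    print(scores.shape)   # torch.Size([2, 5, 1000])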

7 – Tokenization

A token is a fancy term for a symbol, usually one that holds some meaning and is not typically split up any further. In the case of natural language processing, our tokens are usually individual words, so tokenization is simply splitting each sentence into a sequence of words. The simplest way to do this is using the …
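
A tiny sketch of tokenization. The excerpt is cut off before naming the tool, so the nltk option below is only one common choice, not necessarily the one the lesson uses.

    # Simplest approach: lowercase and split on whitespace.
    caption = "A person doing a trick on a rail while riding a skateboard."
    tokens = caption.lower().replace(".", "").split()
    print(tokens)   # ['a', 'person', 'doing', 'a', 'trick', ...]

    # A more robust option is a tokenizer such as nltk's word_tokenize,
    # which also separates punctuation into its own tokens.
    # from nltk.tokenize import word_tokenize
    # tokens = word_tokenize(caption.lower())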

6 – 06 Tokenizing Captions V3

The RNN component of the captioning network is trained on the captions in the COCO dataset. We’re aiming to train the RNN to predict the next word of a sentence based on previous words. But how exactly can it train on string data? Neural networks do not do well with strings; they need a well-defined …
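
A minimal sketch of the usual fix: map each word token to an integer index through a vocabulary, with special markers around each caption. The special-token names (<start>, <end>, <unk>) are illustrative and may differ from the course's choices.

    # Build a word-to-index vocabulary from tokenized captions, with special tokens.
    captions = [["a", "dog", "runs"], ["a", "cat", "sleeps"]]

    word2idx = {"<start>": 0, "<end>": 1, "<unk>": 2}
    for tokens in captions:
        for word in tokens:
            if word not in word2idx:
                word2idx[word] = len(word2idx)

    def encode(tokens):
        ids = [word2idx.get(w, word2idx["<unk>"]) for w in tokens]
        return [word2idx["<start>"]] + ids + [word2idx["<end>"]]

    print(encode(["a", "dog", "sleeps"]))   # e.g. [0, 3, 4, 7, 1]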

3 – 03 Captions And The COCO Dataset V3

The first thing to know about an image captioning model is how it will train. Your model will learn from a dataset composed of images paired with captions that describe the content of those images. Say you’re asked to write a caption that describes this image; how would you approach this task? First, you might look …
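
A minimal sketch of pairing a COCO image with its human-written captions using the pycocotools COCO API; the annotation-file path is a placeholder for wherever the dataset is stored.

    from pycocotools.coco import COCO

    coco_caps = COCO("annotations/captions_train2014.json")

    img_id = coco_caps.getImgIds()[0]                 # pick one image id
    ann_ids = coco_caps.getAnnIds(imgIds=img_id)      # its caption annotations
    for ann in coco_caps.loadAnns(ann_ids):
        print(ann["caption"])                         # several captions per image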

2 – 02 Leveraging Neural Networks V3

Hi, I’m Calvin Lin, and I work for the Deep Learning Institute at NVIDIA, where we’re tasked with enabling everyone in the industry to create AI companies. For that, we need very capable AI, and image captioning is where we go from mere perception modules to ones with generative capabilities. A captioning model relies on two …

10 – 09 On To The Project V2

Now that you’ve learned about the structure of an automatic captioning system, your next task will be to apply what you’ve learned to build and train your own captioning network. You’ll be provided with some data pre-processing steps, and your main tasks will be about deciding how to train the RNN portion of the model. …

1 – 01 L Introduction V3

We looked at Convolutional Neural Networks that are used for image classification and object localization. And we looked at Recurrent Neural Networks mostly in the context of text generation. You’ve seen how networks like LSTMs can learn from sequential data, like a series of words or characters. These networks use hidden layers that over time …

9 – 07 Additive And Multiplicative Attention V1

Before delving into the details of scoring functions, we need to make a distinction between the two major types of attention. These are often referred to as “Additive Attention” and “Multiplicative Attention.” Sometimes they’re also called “Bahdanau Attention” and “Luong Attention,” referring to the first authors of the papers that described them. Bahdanau attention refers …
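
A minimal sketch of the two families of scoring functions, under the usual formulations (the weight names are illustrative): multiplicative attention scores with a dot product, optionally through a learned weight matrix, while additive attention scores with a small feed-forward network.

    import torch
    import torch.nn as nn

    hidden = 64
    dec_h = torch.randn(1, hidden)        # current decoder hidden state
    enc_h = torch.randn(10, hidden)       # encoder hidden states, one per input word

    # Multiplicative ("Luong") scoring: a dot product, here through a weight matrix.
    W = nn.Linear(hidden, hidden, bias=False)
    mult_scores = enc_h @ W(dec_h).squeeze(0)                       # (10,)

    # Additive ("Bahdanau") scoring: a small feed-forward network.
    W1 = nn.Linear(hidden, hidden, bias=False)
    W2 = nn.Linear(hidden, hidden, bias=False)
    v = nn.Linear(hidden, 1, bias=False)
    add_scores = v(torch.tanh(W1(dec_h) + W2(enc_h))).squeeze(-1)   # (10,)

    # Either set of scores becomes attention weights via a softmax.
    weights = torch.softmax(add_scores, dim=0)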

8 – 06 Attention Decoder V1

Let’s now look at things on the decoder side. In models without attention, we’d only feed the last context vector to the decoder RNN, in addition to the embedding of the end token, and it would begin to generate an element of the output sequence at each time-step. The case is different in an attention …
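
A minimal sketch of one attention decoder time-step, assuming a simple dot-product score and a GRU cell (both are illustrative choices): the decoder builds a fresh context vector at every step and feeds it into the RNN alongside the previous word's embedding.

    import torch
    import torch.nn as nn

    hidden, embed, vocab = 64, 32, 1000
    rnn = nn.GRUCell(embed + hidden, hidden)   # input = word embedding + context vector
    out_layer = nn.Linear(hidden, vocab)

    def decoder_step(prev_word_embedding, prev_hidden, encoder_states):
        # 1. Score each encoder hidden state against the current decoder state.
        scores = encoder_states @ prev_hidden                 # (src_len,)
        weights = torch.softmax(scores, dim=0)
        # 2. Build a fresh context vector for this time-step.
        context = weights @ encoder_states                    # (hidden,)
        # 3. Feed the word embedding together with the context into the RNN cell.
        rnn_input = torch.cat([prev_word_embedding, context]).unsqueeze(0)
        new_hidden = rnn(rnn_input, prev_hidden.unsqueeze(0)).squeeze(0)
        return out_layer(new_hidden), new_hidden              # word scores, new state

    word_scores, h = decoder_step(torch.randn(embed), torch.randn(hidden),
                                  torch.randn(7, hidden))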

7 – 05 Attention Encoder V2

Now that we’ve taken a high-level look at how attention works in a sequence to sequence model, let’s look into it in more detail. We’ll use machine translation as the example, since that’s the application the main papers on attention tackled. But whatever we do here translates to other applications as well. It’s important …

6 – 04 Attention Overview Decoding V2

Now, let’s look at the attention decoder and how it works at a very high level. At every time step, an attention decoder pays attention to the appropriate part of the input sequence using the context vector. How does the attention decoder know which parts of the input sequence to focus on at …
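
A minimal sketch of how that context vector is typically built, assuming a plain dot-product score: score each encoder hidden state against the decoder's current state, softmax the scores into attention weights, and take the weighted sum.

    import numpy as np

    # Encoder hidden states (one per input word) and the decoder's current hidden state.
    encoder_states = np.random.randn(5, 8)    # 5 input words, hidden size 8
    decoder_state = np.random.randn(8)

    # Score each encoder state with a dot product, then softmax the scores.
    scores = encoder_states @ decoder_state                   # (5,)
    weights = np.exp(scores) / np.exp(scores).sum()           # attention weights, sum to 1

    # The context vector is the attention-weighted sum of the encoder hidden states.
    context_vector = weights @ encoder_states                 # (8,)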

5 – 03 Attention Overview Encoding V2

A sequence to sequence model with attention works in the following way. First, the encoder processes the input sequence just like the model without attention, one word at a time, producing a hidden state and using that hidden state in the next step. Next, the model passes a context vector to the decoder, but unlike …
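
A minimal sketch of the key difference on the encoding side, using an illustrative GRU encoder: without attention only the final hidden state is handed to the decoder, while with attention the full set of per-word hidden states is handed over.

    import torch
    import torch.nn as nn

    embed, hidden = 32, 64
    encoder = nn.GRU(embed, hidden, batch_first=True)

    source = torch.randn(1, 6, embed)          # one sentence of 6 embedded words
    all_states, last_state = encoder(source)

    # Without attention, only the final hidden state would be passed to the decoder:
    print(last_state.shape)    # torch.Size([1, 1, 64])
    # With attention, the full matrix of per-word hidden states is handed over instead:
    print(all_states.shape)    # torch.Size([1, 6, 64])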

4 – 02 Sequence To Sequence Recap V2

Welcome back. In this video, we’ll briefly recap how sequence to sequence models work. A sequence to sequence model takes in an input that is a sequence of items, and then it produces another sequence of items as an output. In a machine translation application, the input sequence is a series of words in one …

3 – Architecture encoder decoder

Let’s look more closely at how sequence to sequence models work. We’ll start with a high-level look and then go deeper and deeper. Here are our two recurrent nets. The one on the left is called the encoder. It reads the input sequence, then hands over what it has understood to the RNN and …
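
A bare-bones sketch of that encoder-decoder pairing in PyTorch (GRUs and the dimensions below are illustrative): the encoder compresses the input into its final hidden state, and that state seeds the decoder, which produces the output sequence.

    import torch
    import torch.nn as nn

    embed, hidden, vocab = 32, 64, 1000

    encoder = nn.GRU(embed, hidden, batch_first=True)
    decoder = nn.GRU(embed, hidden, batch_first=True)
    to_vocab = nn.Linear(hidden, vocab)

    src = torch.randn(1, 7, embed)            # embedded input sequence (7 items)
    tgt = torch.randn(1, 5, embed)            # embedded target sequence (5 items)

    # The encoder reads the input; its final hidden state summarizes what it understood.
    _, summary = encoder(src)
    # That summary initializes the decoder, which then produces the output sequence.
    dec_out, _ = decoder(tgt, summary)
    word_scores = to_vocab(dec_out)           # (1, 5, vocab)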

2 – Applications seq2seq

I do want to say a couple of words on applications before delving deeper into the concept. That’s because the term sequence-to-sequence RNN is a little bit abstract and doesn’t convey how many amazing things we can do with this type of model. So let’s think of it like this. We have a model that …

14 – 12 The Transformer And Self Attention V2

Let’s look at how self-attention works in a little bit more detail. Let’s say we have these words that we want our encoder to read and create a representation of. As always, we begin by embedding them into vectors. Since the transformer gives us a lot of flexibility for parallelization, this example assumes we’re looking …
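
A minimal sketch of scaled dot-product self-attention over those embedded words, with illustrative dimensions and single-head projections: each embedding is projected into a query, key, and value; every word scores every word; and each new representation is a weighted mix of the value vectors.

    import torch
    import torch.nn as nn

    d_model = 64
    n_words = 4
    embeddings = torch.randn(n_words, d_model)     # one embedded vector per input word

    # Project each word's embedding into a query, a key, and a value vector.
    Wq = nn.Linear(d_model, d_model, bias=False)
    Wk = nn.Linear(d_model, d_model, bias=False)
    Wv = nn.Linear(d_model, d_model, bias=False)
    Q, K, V = Wq(embeddings), Wk(embeddings), Wv(embeddings)

    # Every word scores every word (including itself) with a scaled dot product.
    scores = Q @ K.T / d_model ** 0.5              # (n_words, n_words)
    weights = torch.softmax(scores, dim=-1)

    # Each word's new representation is a weighted mix of all the value vectors.
    self_attended = weights @ V                    # (n_words, d_model)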

13 – 11 Other Attention Methods V2

Since the two main attention papers were published in 2014 and 2015, attention has been an active area of research with many developments. While those two mechanisms continue to be commonly used, there have been significant advances over the years. In this video, we will look at one of these developments, published in a paper …