9 – 9 Model Validation Loss V2

This is what we want our model to look like. It should take in some inputs and pass them through an embedding layer, which produces embedded vectors that are sent to a final softmax output layer. And here's my model definition; you can see that it's a pretty simple model. First, I'm defining …
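A rough sketch of a model along these lines, with an embedding layer feeding a softmax output, could look like the following (the class name, layer sizes, and the LogSoftmax/NLLLoss pairing are assumptions, not the notebook's exact code):

```python
import torch
from torch import nn

class SkipGram(nn.Module):
    """Minimal skip-gram model: embedding lookup followed by a softmax output layer."""
    def __init__(self, n_vocab, n_embed):
        super().__init__()
        self.embed = nn.Embedding(n_vocab, n_embed)   # token ids -> dense vectors
        self.output = nn.Linear(n_embed, n_vocab)     # scores for every word in the vocab
        self.log_softmax = nn.LogSoftmax(dim=1)       # pairs with nn.NLLLoss during training

    def forward(self, x):
        x = self.embed(x)            # (batch,) -> (batch, n_embed)
        scores = self.output(x)      # (batch, n_embed) -> (batch, n_vocab)
        return self.log_softmax(scores)
```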

8 – 8 Word2vec Model V2

Now that we've taken the time to preprocess and batch our data, it's time to actually start building the network. Here, we can see the general structure of the network that we're going to build. We have our inputs, which are going to be batches of our train word tokens, and as we …

7 – 7 Batching Data Solution V1

Here's how I'm defining the context targets around a given word index. First, according to the excerpt from the paper, I'm going to define a range R, where R is a random integer in the range 1 to C, the window size. randint takes in a range that is not inclusive of the …
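Based on that description, a sketch of such a helper might look like this (the function name, signature, and default window size are assumptions):

```python
import numpy as np

def get_target(words, idx, window_size=5):
    """Return the words in a randomly sized window around position idx, excluding the word at idx."""
    # np.random.randint excludes its upper bound, so pass window_size + 1
    R = np.random.randint(1, window_size + 1)
    start = max(0, idx - R)
    stop = idx + R
    # words before idx plus words after idx, within the sampled window
    return words[start:idx] + words[idx + 1:stop + 1]
```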

6 – 6 Defining Context Targets V1

Now that our data is in good shape, we need to get it into the proper form to pass into our network. With the skip-gram architecture, for each word in the text we want to define a surrounding context and grab all the words in a window of size C around that word. When …
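One common way to turn this into network input is a generator that pairs each word in a batch with every word in its context window; a hedged sketch (reusing the get_target helper sketched above, with assumed names) might be:

```python
def get_batches(words, batch_size, window_size=5):
    """Yield (inputs, targets) lists, repeating each input word once per context word."""
    n_batches = len(words) // batch_size
    words = words[:n_batches * batch_size]   # keep only full batches

    for i in range(0, len(words), batch_size):
        batch = words[i:i + batch_size]
        inputs, targets = [], []
        for ii in range(len(batch)):
            context = get_target(batch, ii, window_size)   # context words around position ii
            inputs.extend([batch[ii]] * len(context))
            targets.extend(context)
        yield inputs, targets
```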

5 – 5 Subsampling Solution V1

Here is my solution for creating a new list of train words. First, I calculated the frequency of occurrence for each word in our vocabulary. I stored the total length of our text in a variable, total_count, and then created a dictionary of frequencies. For each word token and count in the word counter …
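Following that description plus the subsampling formula from the Mikolov et al. paper, the computation might look like this sketch (assuming int_words is the tokenized text as integer ids, and using a typical threshold of 1e-5):

```python
import random
from collections import Counter

threshold = 1e-5
word_counts = Counter(int_words)      # int_words: the full text as a list of token ids
total_count = len(int_words)

# frequency of each token, then its discard probability from the paper:
# P(drop w) = 1 - sqrt(threshold / freq(w))
freqs = {word: count / total_count for word, count in word_counts.items()}
p_drop = {word: 1 - (threshold / freqs[word]) ** 0.5 for word in word_counts}

# keep each occurrence of a word with probability 1 - P(drop)
train_words = [word for word in int_words if random.random() < (1 - p_drop[word])]
```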

4 – 4 Data Subsampling V1

Okay, let's get started with implementing the skip-gram Word2Vec model. The first thing you want to do is load in the necessary data. In this example, I'm using a large body of text that was scraped from Wikipedia articles by Matt Mahoney. If you're working locally, you'll actually need to click this link to download …
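Once the corpus is downloaded and unzipped, loading and tokenizing it might look roughly like this (the local path and the simple whitespace tokenization are assumptions, not the notebook's exact preprocessing):

```python
# assumes the text8-style corpus has been downloaded and unzipped to data/text8
with open('data/text8') as f:
    text = f.read()

# very simple preprocessing: lowercase and split on whitespace
words = text.lower().split()

# build lookup tables between words and integer ids
vocab = sorted(set(words))
word_to_int = {word: i for i, word in enumerate(vocab)}
int_to_word = {i: word for word, i in word_to_int.items()}
int_words = [word_to_int[word] for word in words]

print(f"Total words: {len(words):,}  Unique words: {len(vocab):,}")
```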

3 – 3 Word2Vec Notebook V2

So in this notebook, I'll be leading you through a Word2Vec implementation in PyTorch. Now, you've just learned about the idea behind embeddings in general. For any dataset with lots of classes or input dimensions, like a large word vocabulary, we're basically skipping the one-hot encoding step, which would otherwise result in extremely long input vectors …

2 – M4L52 HSA Embedding Weight Matrix V3 RENDER V2

We've talked a bit about how neural networks are designed to learn from numerical data. In our case, word embedding is really all about improving the ability of networks to learn from text data. The idea is this: embeddings can greatly improve the ability of networks to learn from text data by representing that data …
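To make the weight-matrix idea concrete, here's a small illustrative sketch showing that an embedding lookup returns the same vector as multiplying a one-hot vector by the embedding weight matrix (the sizes are arbitrary):

```python
import torch
from torch import nn

vocab_size, embed_dim = 10, 4
embedding = nn.Embedding(vocab_size, embed_dim)   # weight matrix of shape (10, 4)

token_id = torch.tensor([3])                      # integer index of one word

# the embedding layer simply looks up row 3 of its weight matrix ...
lookup = embedding(token_id)

# ... which is exactly what multiplying a one-hot vector by that matrix would do
one_hot = torch.zeros(1, vocab_size)
one_hot[0, 3] = 1.0
matmul = one_hot @ embedding.weight

print(torch.allclose(lookup, matmul))             # True
```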

12 – 12 CompleteModel CustomLoss V2

So, I ran all the cells in my notebook, and here's my solution and definition for the SkipGramNeg module. First, I've defined my two embedding layers, in_embed and out_embed, and they'll both take in the size of our word vocabulary and produce embeddings of size n_embed. So, mapping from our vocab to our embedding …
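A hedged sketch of a module along those lines, with separate input and output embedding tables, is below; the initialization, the helper-method names, and the uniform fallback for the noise distribution are assumptions:

```python
import torch
from torch import nn

class SkipGramNeg(nn.Module):
    def __init__(self, n_vocab, n_embed, noise_dist=None):
        super().__init__()
        self.n_vocab = n_vocab
        self.n_embed = n_embed
        self.noise_dist = noise_dist

        # two embedding tables: one for input (center) words, one for output (context) words
        self.in_embed = nn.Embedding(n_vocab, n_embed)
        self.out_embed = nn.Embedding(n_vocab, n_embed)

        # small uniform initialization tends to help convergence
        self.in_embed.weight.data.uniform_(-1, 1)
        self.out_embed.weight.data.uniform_(-1, 1)

    def forward_input(self, input_words):
        return self.in_embed(input_words)

    def forward_output(self, output_words):
        return self.out_embed(output_words)

    def forward_noise(self, batch_size, n_samples):
        """Sample noise (negative) word ids and return their output embeddings."""
        noise_dist = self.noise_dist
        if noise_dist is None:
            noise_dist = torch.ones(self.n_vocab)   # fall back to a uniform distribution
        noise_words = torch.multinomial(noise_dist, batch_size * n_samples, replacement=True)
        noise_vectors = self.out_embed(noise_words)
        return noise_vectors.view(batch_size, n_samples, self.n_embed)
```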

11 – 11 SkipGram Negative V1

All right. So, we have two tasks to complete to define a more efficient Word2Vec skip-gram model. Here, I'm calling this model SkipGramNeg, to indicate that it includes negative sampling. This model takes in our usual vocab and embedding dimension. It also takes in a noise distribution, if one is provided. Okay. So, first, we want to define …
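The noise distribution is typically the unigram distribution raised to the 3/4 power, as in the original paper; a sketch of building it and passing it to the model (the variable names and the embedding size of 300 are assumptions) could be:

```python
import numpy as np
import torch

# freqs maps each word id to its frequency in the corpus (as in the subsampling step above)
word_freqs = np.array([freqs[w] for w in range(len(freqs))])

# raising the unigram distribution to the 3/4 power flattens it slightly,
# so very common words are sampled as negatives a bit less aggressively
unigram_dist = word_freqs / word_freqs.sum()
noise_dist = torch.from_numpy((unigram_dist ** 0.75) / np.sum(unigram_dist ** 0.75))

model = SkipGramNeg(n_vocab=len(freqs), n_embed=300, noise_dist=noise_dist)
```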

10 – 10 NegativeSampling V1

Now, the last model took quite a while to train, and there are some ways that we can speed up this process. In this video, I'll talk about one such method, called negative sampling. So, this is a new notebook, but it contains basically the same info as our previous notebook, including this …
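The core of negative sampling is a loss that rewards a high score for the true (input, context) pair while pushing down the scores of a few sampled noise words; a hedged sketch of such a custom loss module (the class name and tensor shapes are assumptions) might be:

```python
import torch
from torch import nn

class NegativeSamplingLoss(nn.Module):
    """-log sigmoid(u_o . v_c) minus the sum over noise words k of log sigmoid(-u_k . v_c)."""

    def forward(self, input_vectors, output_vectors, noise_vectors):
        batch_size, embed_size = input_vectors.shape

        # reshape for batch matrix multiplication
        input_vectors = input_vectors.view(batch_size, embed_size, 1)    # column vectors
        output_vectors = output_vectors.view(batch_size, 1, embed_size)  # row vectors

        # log-sigmoid score of the correct (input, context) pairs
        out_loss = torch.bmm(output_vectors, input_vectors).sigmoid().log().squeeze()

        # log-sigmoid of the negated scores for the sampled noise words,
        # summed over the noise samples drawn for each example
        noise_loss = torch.bmm(noise_vectors.neg(), input_vectors).sigmoid().log()
        noise_loss = noise_loss.squeeze().sum(1)

        # negate and average over the batch
        return -(out_loss + noise_loss).mean()
```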

1 – M4L51 HSA Word Embeddings V3 RENDER V1

In this lesson, I want to talk a bit more about using neural networks for natural language processing. We'll be discussing word embedding, which is the collective term for models that learn to map a set of words or phrases in a vocabulary to vectors of numerical values. These vectors are called embeddings, and we …