11 – 11 SkipGram Negative V1

All right. So, we have two tasks to complete to define a more efficient Word2vec skip-gram model. Here, I’m calling this model SkipGramNeg to indicate that it includes negative sampling. This model takes in our usual vocabulary size and embedding dimension, and it also takes in a noise distribution, if one is provided.

First, we want to define two embedding layers, one for input words and one for output words. Here, I’m calling those in_embed and out_embed. I want you to define these layers such that they can accept an input or output target word and return an embedding that’s a vector of dimension n_embed. I’ll also suggest that you initialize the weights of these layers using a uniform distribution between negative one and one.

Now, let’s look at our loss function for a moment. When we think about defining a negative sampling loss, we know that this loss will take in a few things as input. It will for sure take in our input word embedding, v_{w_I}. It will also take in our correct output word embedding, u_{w_O}, and several noisy, incorrect embeddings, u_{w_i}. So, in this model definition, I’m actually going to ask you to define three different forward functions for creating these embeddings.

The first, forward_input, should return our input embeddings, which are just our input words passed through the input embedding layer. Similarly, forward_output should return output vectors for passed-in output words. Finally, there’s forward_noise, and this one is special: it takes in a batch size and a number of noise samples to generate for performing negative sampling. This function first gets noise words from a passed-in noise distribution; if no distribution is passed in, it defaults to a uniform distribution over the vocabulary. It then samples noise words using torch.multinomial, drawing batch_size times n_samples values. In the next line, those words are moved to a GPU, if one is available. What you need to do to complete this function is pass these words through the output embedding layer to get their respective embeddings. So, you get our noise embeddings, and then you should reshape them to be batch_size by n_samples by n_embed in dimension.

All right. So, complete these forward functions, making sure to return the correct embeddings from each one. If you’ve completed this implementation, you should be able to proceed with training this model. Next, I’ll go over one solution for this model, and I’ll show you how I defined a custom negative sampling loss. A rough sketch of what such a solution could look like follows below.
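For reference, the objective that a custom negative sampling loss has to compute is the standard one from Mikolov et al.'s Word2vec paper. For a single input/output word pair and N sampled noise words, it can be written as

$$ -\log \sigma\!\left(u_{w_O}^{\top} v_{w_I}\right) \;-\; \sum_{i=1}^{N} \log \sigma\!\left(-u_{w_i}^{\top} v_{w_I}\right) $$

where v_{w_I} is the input word embedding, u_{w_O} is the correct output word embedding, u_{w_i} are the embeddings of the sampled noise words, and σ is the sigmoid function. The three forward functions below produce exactly these three kinds of vectors.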
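Here is a minimal sketch of one way this class could be filled in, not the instructor's own solution. It assumes constructor arguments named n_vocab, n_embed, and noise_dist, matching the description above.

```python
import torch
from torch import nn

class SkipGramNeg(nn.Module):
    def __init__(self, n_vocab, n_embed, noise_dist=None):
        super().__init__()
        self.n_vocab = n_vocab
        self.n_embed = n_embed
        self.noise_dist = noise_dist

        # input and output embedding layers, each mapping a word index
        # to a vector of dimension n_embed
        self.in_embed = nn.Embedding(n_vocab, n_embed)
        self.out_embed = nn.Embedding(n_vocab, n_embed)

        # initialize both embedding tables uniformly in [-1, 1]
        self.in_embed.weight.data.uniform_(-1, 1)
        self.out_embed.weight.data.uniform_(-1, 1)

    def forward_input(self, input_words):
        # embeddings for input (center) words
        return self.in_embed(input_words)

    def forward_output(self, output_words):
        # embeddings for correct output (context) words
        return self.out_embed(output_words)

    def forward_noise(self, batch_size, n_samples):
        # fall back to a uniform distribution over the vocabulary
        # if no noise distribution was provided
        if self.noise_dist is None:
            noise_dist = torch.ones(self.n_vocab)
        else:
            noise_dist = self.noise_dist

        # sample batch_size * n_samples noise word indices
        noise_words = torch.multinomial(noise_dist,
                                        batch_size * n_samples,
                                        replacement=True)

        # move the indices to the same device as the model (GPU if available)
        noise_words = noise_words.to(self.out_embed.weight.device)

        # look up output embeddings and reshape to
        # (batch_size, n_samples, n_embed)
        noise_vectors = self.out_embed(noise_words).view(
            batch_size, n_samples, self.n_embed)

        return noise_vectors
```

Note that the noise words are passed through out_embed rather than in_embed, since they play the role of (incorrect) output words in the loss.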
