1 – M4L51 HSA Word Embeddings V3 RENDER V1

In this lesson, I want to talk a bit more about using neural networks for natural language processing. We’ll be discussing word embedding, which is the collective term for models that learn to map a set of words or phrases in a vocabulary to vectors of numerical values. These vectors are called embeddings, and we can use neural networks to learn to do word embedding. In general, this technique is used to reduce the dimensionality of text data, but these embedding models can also learn some interesting traits about words in a vocabulary.

In fact, we’ll focus on the Word2Vec embedding model, which learns to map words to embeddings that contain semantic meaning. For example, embeddings can learn the relationship between verbs in the present and past tense: the relationship between the embeddings for walking and walked should be roughly the same as the relationship between the embeddings for swimming and swam. Similarly, embeddings can learn the relationships between words and common genders, such as between woman and queen and between man and king. You can think of these embeddings as vectors that have learned to mathematically represent the relationships between words in a vocabulary.

A word of caution here. The embeddings are learned from a body of text, so any word associations in that source text will be replicated in the embeddings. If your text contains false information or gender-biased associations, these traits will be replicated in your embeddings. In fact, debiasing word embeddings is an active area of research, and you can read more about it below.

In this lesson, we’ll first talk about how word embedding works in theory, then walk through a series of notebooks in which you’ll learn to implement the Word2Vec model. Before we start coding, let’s learn more about how embeddings can reduce the dimensionality of text data.
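To make the idea of a "relationship between embeddings" a little more concrete, here is a minimal sketch in plain Python. The three-dimensional vectors in `toy_embeddings` are hypothetical values picked by hand for illustration; they were not learned by Word2Vec, and real embeddings typically have hundreds of dimensions. The helper names (`subtract`, `add`, `cosine_similarity`, `nearest_word`) are also just assumptions for this example. The sketch treats a relationship as the difference between two vectors and checks whether that difference carries over from one word pair to another.

```python
import math

# Hypothetical, hand-picked 3-dimensional embeddings (illustration only).
# Real Word2Vec embeddings are learned from a body of text.
toy_embeddings = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [3.0, 0.0, 0.9],
    "queen": [3.0, 1.0, 0.9],
    "apple": [0.0, 0.2, 3.0],
}

def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise sum a + b."""
    return [x + y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_word(vector, embeddings, exclude=()):
    """Return the word whose embedding is most similar to `vector`."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(vector, embeddings[w]))

# The "relationship" between two words is modeled as a vector offset.
gender_offset = subtract(toy_embeddings["woman"], toy_embeddings["man"])
royal_offset = subtract(toy_embeddings["queen"], toy_embeddings["king"])
offsets_match = all(math.isclose(x, y) for x, y in zip(gender_offset, royal_offset))

# Analogy: king - man + woman should land near queen.
analogy_vector = add(
    subtract(toy_embeddings["king"], toy_embeddings["man"]),
    toy_embeddings["woman"],
)
predicted = nearest_word(
    analogy_vector, toy_embeddings, exclude={"king", "man", "woman"}
)

print("offsets match:", offsets_match)
print("king - man + woman is closest to:", predicted)
```

With learned embeddings the offsets would only match approximately, which is why the lookup uses cosine similarity to find the nearest word instead of testing for exact equality.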
