9 – Feature Extraction

Okay. We now have clean normalized text. Can we feed this into a statistical or a machine learning model? Not quite. Let’s see why. Text data is represented on modern computers using an encoding such as ASCII or Unicode that maps every character to a number. Computers store and transmit these values as binary, zeros … Read more
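As a small illustration of the character-to-number mapping described above, the following Python sketch prints the code points and the binary form of a short string. The sample string "NLP" and the variable names are assumptions chosen for illustration; they are not from the lesson.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "NLP"

# Unicode code point for each character (identical to ASCII for these letters).
code_points = [ord(ch) for ch in text]

# The same text encoded as UTF-8 bytes, then shown as 8-bit binary strings.
utf8_bytes = text.encode("utf-8")
bits = [format(b, "08b") for b in utf8_bytes]

print(code_points)  # [78, 76, 80]
print(bits)         # ['01001110', '01001100', '01010000']
```

These numbers identify characters only. They carry no information about what the words mean.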

8 – Text Processing

Let’s take a closer look at text processing. The first question that comes to mind is, why do we need to process text? Why can we not feed it in directly? To understand that, think about where we get this text to begin with. Websites are a common source of textual information. Here’s a portion … Read more

7 – NLP Pipeline

Let’s look at a common NLP pipeline. It consists of three stages, text processing, feature extraction and modeling. Each stage transforms text in some way and produces a result that the next stage needs. For example, the goal of text processing is to take raw input text, clean it, normalize it, and convert it into … Read more
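The three stages named above can be sketched as three functions chained together. This is a minimal toy version under stated assumptions, not the pipeline used in the lesson: the function names, the tag-stripping and lowercasing rules, the word-count features, and the keyword-based "model" are all hypothetical stand-ins.

```python
import re
from collections import Counter

def process_text(raw):
    """Text processing (assumed rules): drop tags, lowercase, strip punctuation, split."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    normalized = re.sub(r"[^a-z0-9\s]", " ", no_tags.lower())
    return normalized.split()

def extract_features(tokens):
    """Feature extraction (assumed form): count how often each word occurs."""
    return Counter(tokens)

def predict(features, positive_words=frozenset({"smart", "witty", "fun"})):
    """Modeling placeholder: a hand-written rule standing in for a trained model."""
    score = sum(count for word, count in features.items() if word in positive_words)
    return "positive" if score > 0 else "neutral"

raw = "<p>A smart, witty slice of fun!</p>"
tokens = process_text(raw)           # ['a', 'smart', 'witty', 'slice', 'of', 'fun']
features = extract_features(tokens)  # each word occurs once
label = predict(features)            # 'positive'
print(tokens, label)
```

Each function consumes exactly what the previous one produces, which is the property the pipeline description relies on.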

6 – Natural Language Processing

Natural language processing is one of the fastest growing fields in the world. NLP is making its way into a number of products and services that we use every day. Let’s begin with an overview of how to design an end-to-end NLP pipeline. Not that kind of pipeline; a natural language processing pipeline, where you … Read more

5 – Context

So what is stopping computers from becoming as capable as humans in understanding natural language? Part of the problem lies in the variability and complexity of our sentences. Consider this excerpt from a movie review. “I was lured to see this on the promise of a smart witty slice of old fashioned fun and intrigue. … Read more

4 – Unstructured Text

The languages we use to communicate with each other also have defined grammatical rules. And indeed, in some situations we use simple structured sentences, but for the most part human discourse is complex and unstructured. Despite that, we seem to be really good at understanding each other, and even ambiguities are welcome to a certain … Read more

3 – Grammar

Structured languages are easy for computers to parse and understand because they are defined by a strict set of rules, or grammar. There are standard forms for expressing such grammars, and algorithms that can parse properly formed statements to understand exactly what is meant. When a statement doesn’t match the prescribed grammar, a typical computer … Read more
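One way to see a strict grammar at work is to hand statements to a parser for a structured language. The sketch below uses Python's own `ast.parse` as the example parser; the helper name `follows_grammar` and the two sample statements are assumptions made for illustration.

```python
import ast

def follows_grammar(statement):
    """Return True if the statement parses under Python's grammar, else False."""
    try:
        ast.parse(statement)
        return True
    except SyntaxError:
        # The parser rejects anything that does not match the grammar.
        return False

well_formed = follows_grammar("total = 1 + 2 * 3")
malformed = follows_grammar("total = 1 +")

print(well_formed, malformed)  # True False
```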

2 – Structured Languages

What makes it so hard for computers to understand us? One drawback of human languages, or feature, depending on how you look at it, is the lack of a precisely defined structure. To understand how that makes things difficult, let’s first take a look at some languages that are more structured. Mathematics, for instance, uses … Read more

10 – Modeling

The final stage in this process is what I like to call modeling. This includes designing a model, usually a statistical or a machine learning model, fitting its parameters to training data using an optimization procedure, and then using it to make predictions about unseen data. The nice thing about working with numerical features is … Read more
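The fit-then-predict loop described above can be sketched with a one-parameter model. Everything here is an illustrative assumption: the toy data, the model `y = w * x`, the squared-error loss, and plain gradient descent as the optimization procedure.

```python
# Toy training data (assumed): y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # the single model parameter, initialized arbitrarily
learning_rate = 0.01

# Fit w by gradient descent on the mean squared error.
for _ in range(1000):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

# Use the fitted model on an input that was not in the training data.
prediction = w * 5.0
print(round(w, 4), round(prediction, 4))  # 2.0 10.0
```

Numerical features make this kind of procedure possible, since the loss and its gradient are ordinary arithmetic on numbers.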

1 – Welcome to NLP

Welcome to Natural Language Processing. Language is an important medium for human communication. It allows us to convey information, express our ideas, and give instructions to others. Some philosophers argue that it enables us to form complex thoughts and reason about them. It may turn out to be a critical component of human intelligence. Now … Read more