In this lesson, you’ll learn how to read text data from different sources and prepare it for feature extraction. You’ll begin by cleaning the text to remove irrelevant items, such as HTML tags. You’ll then normalize it by converting it to lowercase and removing punctuation and extra spaces. Next, you’ll split the text into words, or tokens, and remove words that are too common to be informative, known as stop words. Finally, you’ll learn how to identify parts of speech and named entities, and how to convert words into canonical forms using stemming and lemmatization. After all these processing steps, your text may look very different, but it still captures the essence of what was being conveyed, in a form that is easier to work with.
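
To make the first few steps concrete, here is a minimal sketch of cleaning, normalization, tokenization, and stop word removal using only the Python standard library. The function names, the regular expressions, and the tiny `STOP_WORDS` set are illustrative assumptions, not the lesson's own code. The sketch also assumes English text, since the normalization step drops anything outside `a-z` and `0-9`. Part-of-speech tagging, named entity recognition, stemming, and lemmatization are left out because they typically rely on a dedicated NLP library.

```python
import html
import re

# Hypothetical, deliberately tiny stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def clean(text: str) -> str:
    """Replace HTML tags with spaces and decode entities such as &amp;."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return html.unescape(no_tags)


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse extra whitespace."""
    lowered = text.lower()
    # Assumption: ASCII English text; any other character becomes a space.
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into tokens on whitespace."""
    return text.split()


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(raw: str) -> list[str]:
    """Run the steps in order: clean, normalize, tokenize, filter."""
    return remove_stop_words(tokenize(normalize(clean(raw))))


if __name__ == "__main__":
    sample = "<p>The first time you see <b>The Second Renaissance</b>, it may look boring.</p>"
    print(preprocess(sample))
```

Keeping each step in its own small function makes it easy to swap one out later, for example replacing the whitespace tokenizer or the stop word list with one provided by an NLP library.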