2 – Capturing Text Data

The processing stage begins with reading text data. Depending on your application, that data can come from one of several sources. The simplest source is a plain text file on your local machine, which you can read with Python's built-in file input mechanism. Text data may also be stored as part of a larger database or table. For example, a CSV file containing information about news articles can be read very easily with pandas. Pandas also includes several useful string manipulation methods that can be applied to an entire column at once, such as converting all values to lowercase. Sometimes you have to fetch data from an online resource, such as a web service or API. For example, you can use the requests library in Python to obtain a quote of the day from a simple API, and in the same way you could obtain tweets, reviews, comments, or whatever else you would like to analyze. Most APIs return JSON or XML data, so you need to know the structure of the response in order to pull out the fields you need. Many of the data sets you will encounter have likely been fetched and prepared by someone else using a similar procedure.
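The three sources described above can be sketched in a few small helpers. This is a minimal illustration, not the original example's code: the function names, the column name you pass in, and the shape of the sample JSON response are all assumptions made for the sketch. A real API will have its own URL and response layout, so check its documentation before choosing which keys to follow.

```python
import json


def read_text_file(path):
    """Read a whole plain text file into a single string."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def load_lowercased_column(csv_path, column):
    """Load a CSV with pandas and lowercase every value in one text column."""
    # pandas is third-party; it is imported here so that the other helpers
    # still run on a machine where it is not installed.
    import pandas as pd

    df = pd.read_csv(csv_path)
    df[column] = df[column].str.lower()  # vectorized string method
    return df


def fetch_json(url):
    """Fetch a URL with requests and return the parsed JSON body."""
    import requests  # third-party, imported lazily for the same reason

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def extract_field(payload, keys):
    """Follow a sequence of dict keys / list indices into parsed JSON."""
    value = payload
    for key in keys:
        value = value[key]
    return value


# Hypothetical response body, standing in for what a quote API might return.
# The nesting below is an assumption for illustration only.
sample_response = (
    '{"contents": {"quotes": '
    '[{"quote": "Hypothetical quote text.", "author": "Unknown"}]}}'
)
payload = json.loads(sample_response)
quote = extract_field(payload, ["contents", "quotes", 0, "quote"])
print(quote)
```

With a live service you would replace the sample string with `fetch_json(url)` and adjust the key path to match the structure that service actually returns.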
