4 – AIT M5L4B 01 Introduction To Regex V4

Now that you know what 10-Ks are and how to access them, let's take a look at how we can extract useful information from them. People often read 10-Ks to find out whether a company is doing well. Certain words or phrases, both positive and negative, might indicate how well a company is doing. For example, words like bankruptcy or phrases like declining profits can be signs that investing in that company carries a high risk. On the other hand, phrases like increasing profits might indicate that the company is doing well and could have favorable future performance. Going through each company's 10-K and reading each section to look for particular keywords or phrases is an important task that fundamental discretionary investors perform to gain deeper insight into each company. But this can be very time-consuming if you're dealing with hundreds or even thousands of companies. In these types of situations, we can employ regular expressions to do the work for us. A regular expression, or regex for short, allows us to search for patterns in text in a fast and automated manner. Let's see a quick example of how one might use regular expressions and how they differ from a literal search. Suppose we wanted to search for all the phone numbers that appear in a particular document. A literal search only looks for exactly what you typed. For example, the literal search 555-123-4567 can only find this particular number and no other. To find every phone number with literal searches, we would have to list out every possible phone number in advance and compare each one against the document. This means that a literal search won't help us here, because we would need to know all the phone numbers in advance. Now let's see how regular expressions can help us get around this problem.
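To make the idea of a literal search concrete, here is a small sketch in Python using only plain string methods. The sample text is a made-up stand-in for a 10-K excerpt, and the keyword list is illustrative. Note that `str.count` is exact and case-sensitive, so it only finds the precise strings we give it.

```python
# Hypothetical excerpt standing in for part of a 10-K filing.
document = (
    "Risk factors: a prolonged downturn could lead to declining profits "
    "and, in the worst case, bankruptcy. "
    "Investor relations: 555-123-4567. Media: 555-987-6543."
)

# Literal keyword search: counts exact, case-sensitive occurrences.
keywords = ["bankruptcy", "declining profits", "increasing profits"]
counts = {kw: document.count(kw) for kw in keywords}
print(counts)  # {'bankruptcy': 1, 'declining profits': 1, 'increasing profits': 0}

# A literal search for one phone number finds only that exact number;
# the second number in the text (555-987-6543) is never found by it.
target = "555-123-4567"
print(document.count(target))  # 1
```

This works for fixed keywords, but to catch the second phone number we would have to already know it and search for it separately.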
This regular expression that you see on the screen can be used to find all the phone numbers that match this particular pattern. In this expression, backslash d (\d) matches any single digit, and the dot matches any single character (except a newline, by default). We created this regular expression by taking advantage of the fact that even though phone numbers all have different digits, they all share the same pattern: three digits, followed by a character, followed by three more digits, followed by another character, and then followed by four more digits. As you can start to see, regexes can be very powerful. In fact, you will soon learn the basic tools you need to create regular expressions that can match almost any pattern of text you can imagine. In the following lessons, you will learn how to create regular expressions in Python to match different patterns of text. We will then create regexes to find specific patterns of text in 10-Ks. In a later lesson, we will learn how to analyze those words using natural language processing to determine how well a company has been doing from its 10-Ks. So let's get started.
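As a rough sketch of the phone-number pattern described above, here is how it might look with Python's built-in `re` module. The on-screen pattern isn't reproduced in the transcript, so `\d\d\d.\d\d\d.\d\d\d\d` is an assumed reconstruction from the description, and the sample text is made up.

```python
import re

# Hypothetical sample text with phone numbers in three different formats.
document = """Call our investor line at 555-123-4567.
Media inquiries: 555.987.6543.
Fax: 555 222 1111"""

# \d matches one digit; . matches any single character except a newline.
# Pattern: 3 digits, any char, 3 digits, any char, 4 digits.
pattern = r"\d\d\d.\d\d\d.\d\d\d\d"

# findall returns every non-overlapping match, scanning left to right.
matches = re.findall(pattern, document)
print(matches)  # ['555-123-4567', '555.987.6543', '555 222 1111']
```

Unlike the literal search, one pattern finds all three numbers without knowing them in advance. Because the dot accepts any character, this pattern would also accept a string like 555x123y4567, and without anchors or word boundaries it will also report a match inside a longer string such as 555-123-45678. The same pattern can be written more compactly with quantifiers as `\d{3}.\d{3}.\d{4}`.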
