2 – M5 SC 2 Finding Words V1

In this notebook, you will learn how to find letters and words in a string using regular expressions. Throughout these lessons, we will use the re module from Python’s standard library to work with regular expressions. The re module not only contains functions that allow us to check whether a given regular expression matches a particular string, but also contains functions that allow us to modify strings in various ways.

Let’s begin by learning how to use a regular expression to find all the locations of the letter a in this sentence: Alice and Walter are walking to the store. In this example, our regular expression is just going to be the letter a. We begin by creating a raw string that contains our regular expression, in this case just the letter a, and we pass it to the compile function of the re module. This function converts the raw string into a regular expression object, which we save in the regex variable.

Once we have a regular expression object, we can search for the regular expression in our sample text by using the finditer method. I should mention that the re module contains various matching methods. However, we’ll only be using the finditer method throughout these lessons because it is very fast and gives us useful information as well, as we will see in a minute. The finditer method returns an iterator, which means that we can loop through it to print the matches. This is what we have done here. If we run this code, we can see each match found by the finditer method.

Each match has a span. The span indicates the start and end indices of the corresponding match, where the end index is not included. For example, if we look at the first match, we see that its span goes from index six to index seven. Therefore, if we print the sample text between indices six and seven, we see that it corresponds to an a, as it should.
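The code on screen is not part of this transcript, so here is a minimal sketch of what the steps just described might look like. The variable names sample_text, regex, and matches are assumptions chosen to follow the narration; only re.compile, finditer, and span come from the re module itself.

```python
import re

# The sample sentence used in the lesson.
sample_text = "Alice and Walter are walking to the store."

# Compile the raw-string pattern into a regular expression object.
regex = re.compile(r"a")

# finditer returns an iterator of match objects; collect them in a list
# so they can be looped over (and inspected again) afterwards.
matches = list(regex.finditer(sample_text))

for match in matches:
    start, end = match.span()
    # The end index is exclusive, so slicing with the span
    # gives back exactly the matched text.
    print(match.span(), repr(sample_text[start:end]))
```

The first line printed shows the span (6, 7) together with the lowercase a found at that position.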
Notice, however, that even though the first letter in our sample text is an a, the finditer method didn’t return a match for it. This is because regular expressions are case-sensitive. Therefore, in order to match the uppercase A at the beginning of our sentence, we need to use an uppercase A as our regular expression. If we run this code, notice that the finditer method now returns only one match, because there is only one uppercase A in our sample text.

In the same way, we can also search for whole words in our sentence. Here, for example, we’re searching for the word walking in our sample text. When we run this code, we see that we get only one match, corresponding to the word walking, and that the span of the match is between indices 21 and 28 of the sample text. Therefore, if we print the sample text between those span indices, we see that we indeed get the word walking.

One thing to keep in mind is that when searching for words or groups of letters, the order of the letters matters. For example, if we were to use this regular expression, we wouldn’t find any matches, even though the same group of letters is contained in the word walking.

At this point, you might be thinking that regular expressions work just like a literal search, so there’s nothing special about them. For the patterns we have seen so far, you’re right. However, in the next section, we will begin learning about meta-characters. Later in this lesson, you will learn how to combine them to create more complicated regular expressions that allow us to find complex patterns of text.
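The three searches described above can be sketched as follows. This is an illustration under assumptions, not the code shown in the video: find_spans is a hypothetical helper introduced here, and the reordered pattern "wakling" is an assumed example, since the transcript does not say which pattern was used.

```python
import re

sample_text = "Alice and Walter are walking to the store."


def find_spans(pattern, text):
    """Hypothetical helper: return the span of every match of pattern in text."""
    regex = re.compile(pattern)
    return [match.span() for match in regex.finditer(text)]


# Regular expressions are case-sensitive, so "A" only matches the uppercase letter.
upper_a = find_spans(r"A", sample_text)

# A whole word is matched the same way as a single letter.
walking = find_spans(r"walking", sample_text)

# Assumed example of reordered letters: same letters as "walking", different order.
reordered = find_spans(r"wakling", sample_text)

print(upper_a)    # [(0, 1)]
print(walking)    # [(21, 28)]
print(reordered)  # []

# Slicing the sample text with the span recovers the matched word.
start, end = walking[0]
print(sample_text[start:end])  # walking
```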
