9 – M5 SC 9 Substitutions And Flags V1

Hello, and welcome back. In this lesson, we will learn how to use the re module to modify strings. Regex objects have a sub method that allows us to replace patterns within a string. So, let’s see how this works with a simple example. Here, we have a multi-line string that contains two instances of the ampersand character. Let’s suppose we wanted to replace these ampersand characters with the word “and”. To do this, we first need to create a regular expression that matches all the ampersand characters in our multi-line string. We then need to save that regular expression into a regex object. Finally, we use the sub method to replace our matches. The way this method works is that it replaces every match of the regular expression in our sample text with the raw string “and”. So, if we run this code, we can see here the original text, and here the modified text with each ampersand replaced by the word “and”. Being able to make these substitutions can be really useful and save you a lot of time if you’re working with large documents that you need to reformat.
We can do more sophisticated substitutions by using groups. Let’s see an example. Here, we have a multi-line string with the names of four people. As you can see, some people have middle names but others do not. Let’s assume we wanted to replace each full name with just the first and last name. For example, the name John David Smith should be replaced with just John Smith, and the name Alice Jackson should stay the same. To do this, we first need to create a regular expression that matches all the names in the list. Now, keeping in mind that we need to be able to make substitutions later, we will use groups to be able to distinguish between the first name, the middle name, and the last name. Since all names have a first name, we can use this first group to match all the first names. Remember, the plus meta-character matches one or more repetitions of the preceding regular expression.
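Before going further with the names example, here is a minimal sketch of the ampersand substitution described above. The code shown on screen is not part of this transcript, so the sample text and the variable names sample_text, regex, and new_text are assumptions made for illustration.

```python
import re

# Hypothetical sample text: the lesson's actual string is not in the transcript.
sample_text = """Ben & Jerry
Jack & Jill"""

# Create a regular expression that matches the ampersand character
# and save it into a regex object.
regex = re.compile(r'&')

# Replace every match of the regular expression with the raw string 'and'.
new_text = regex.sub(r'and', sample_text)

print('Original text:')
print(sample_text)
print('Modified text:')
print(new_text)
```

Note that sub returns a new string; the original sample_text is left unchanged.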
Now, not all names have middle names, so having a middle name is optional. Since the first and middle names are separated by whitespace, we also need to indicate that this whitespace is optional. So, to indicate that the whitespace and middle name are optional, we will use the question mark meta-character after the whitespace and after the second group representing the middle name. After the first or middle name, we have a whitespace character that we can match with this character set. Notice that in this case, we didn’t use the sequence backslash s, since that would match newlines as well, and we don’t want to match those. Finally, we make a third group to match the last name. Since all names have last names, we don’t need to use the question mark meta-character. So, if we run this code, we can clearly see that we can match the four names in our list.
Now, the cool thing about using groups is that we can reference them individually from the match object by using the group method. Calling group with the number n selects the nth group in each match. Therefore, in this particular case, for each match, group one will select the first name, group two will select the middle name, and group three will select the last name. So here, we’ve created a loop that goes through each match, and for each match, we use the group method to selectively print out the first name using group one, the middle name using group two, and the last name using group three. If we run this code, we can see each person’s first, middle, and last name.
Now, we’re ready to use the sub method to make our substitutions. What we want to do is to replace every match with only the first and last names, or equivalently, replace every match with the first and third groups. We can refer to each group in the sub method by using a backslash followed by the group number. For example, this backslash one refers to the first group or first name, and this backslash three refers to the third group or last name.
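The names example can be sketched roughly as follows. This is a reconstruction from the spoken description, not the exact code from the lesson: the pattern is one plausible reading of it, and two of the four names (Mary Elizabeth Wilson and Peter Brown) are made up, since the transcript only mentions John David Smith and Alice Jackson.

```python
import re

# Hypothetical sample text: only "John David Smith" and "Alice Jackson"
# are named in the lesson; the other two names are made up.
sample_text = """John David Smith
Alice Jackson
Mary Elizabeth Wilson
Peter Brown"""

# Group 1: first name. Group 2: optional middle name. Group 3: last name.
# A literal space and [ ] are used instead of \s so that newlines never match.
regex = re.compile(r'([a-zA-Z]+) ?([a-zA-Z]+)?[ ]([a-zA-Z]+)')

# Collect (first, middle, last) for each match.
# group(2) is None when a name has no middle name.
names = [(m.group(1), m.group(2), m.group(3))
         for m in regex.finditer(sample_text)]
for first, middle, last in names:
    print('First:', first, '| Middle:', middle, '| Last:', last)

# In the replacement string, \1 and \3 refer to groups 1 and 3,
# so every match is replaced by just the first and last name.
new_text = regex.sub(r'\1 \3', sample_text)
print(new_text)
```

For a name without a middle name, the second group does not take part in the match, so group(2) returns None rather than an empty string.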
So, this statement replaces every match of our regular expression in the sample text with groups one and three, which correspond to the first and last names. So, if we run this code, we can see that we have successfully replaced every name with the first and last name.
Now, the last topic we’re going to cover is flags. We saw at the beginning of this lesson that regular expressions are case sensitive. Therefore, we often have to write regular expressions that account for both uppercase and lowercase letters. However, the compile function accepts flags that can be used to allow for more flexibility. For example, the IGNORECASE flag can be used to perform case-insensitive matching. Let’s see an example. Here, we have a string that contains the name Walter written in different combinations of upper and lowercase letters. In order to find these two renditions of Walter, we would normally have to use long character sets to account for all possible combinations of lower and uppercase letters. However, in this case, we can use the IGNORECASE flag to indicate that we don’t care about the case of the letters. We just want to find the name Walter no matter how it’s written. So, if we run this code, we can clearly see that we are able to match both renditions of Walter without any fancy regular expression.
We have seen a lot in this lesson, and we have just begun to scratch the surface of regular expressions. For more information on regexes, make sure to check out the Python regex documentation.
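As a recap, the IGNORECASE example described above might look something like this. The sample sentence is an assumption, since the string used in the lesson is not included in the transcript; re.IGNORECASE itself is a standard flag of the re module.

```python
import re

# Hypothetical sample text containing two renditions of the name Walter.
sample_text = 'Alice and Walter Brown are talking with wAlTer Jackson.'

# The IGNORECASE flag makes the pattern match regardless of letter case.
regex = re.compile(r'walter', re.IGNORECASE)

# Collect the text of every match found in the sample text.
matches = [m.group() for m in regex.finditer(sample_text)]
print(matches)
```

Without the flag, the pattern walter would match neither rendition in this string, because matching is case sensitive by default.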
