9 – M5 SC 5 Word Boundaries V1

We will now learn about another special sequence that you can create using the backslash, namely backslash b. This special sequence doesn’t match a particular set of characters; instead, it marks word boundaries. A word in this context is defined as a sequence of alphanumeric characters, while a boundary is defined as whitespace, … Read more
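
As a rough illustration of the word-boundary idea described in this excerpt, the sketch below compares a plain pattern with one wrapped in backslash b. The sample sentence is hypothetical and not taken from the lesson.

```python
import re

# Hypothetical sample text, not taken from the lesson.
text = "The cat scattered the catalog; cat!"

# Without \b, "cat" also matches inside "scattered" and "catalog".
loose = [m.start() for m in re.finditer(r"cat", text)]

# With \b on both sides, only the standalone word "cat" matches.
strict = [m.start() for m in re.finditer(r"\bcat\b", text)]

print(loose)
print(strict)
```

Note that the pattern is written as a raw string so that Python does not interpret `\b` as a backspace character before the re module sees it.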

8 – M5 SC 4 Searching For Simple Patte V1

In the previous lessons, we saw how we can match letters, words, and metacharacters. In this notebook, we will see how we can use regular expressions to perform more complex pattern matching using metacharacters. The first metacharacter we’re going to look at is the backslash. We already saw that the backslash can be used to … Read more

7 – M5 SC 3 Finding Metacharacters V1

Now, let’s try to use a regular expression to find this period at the end of our sentence. So, let’s use a period as a regular expression and let’s run this code just as we did before. We can see that something has gone wrong. The finditer method has matched every single character in our … Read more
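
The behavior described in this excerpt can be sketched as follows, assuming a hypothetical sentence rather than the one used in the lesson. An unescaped period matches every character, while an escaped period matches only the literal period.

```python
import re

# Hypothetical sentence, not the one used in the lesson.
sentence = "Walk the dog."

# An unescaped dot is a metacharacter: it matches every character here.
unescaped = [m.group() for m in re.finditer(r".", sentence)]

# Escaping the dot with a backslash matches only the literal period.
escaped = [m.span() for m in re.finditer(r"\.", sentence)]

print(len(unescaped), "matches without escaping")
print(escaped)
```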

6 – M5 SC 2 Finding Words V1

In this notebook, you will learn how to find letters and words in a string using regular expressions. Throughout these lessons, we will use the re module from Python’s standard library to work with regular expressions. The re module not only contains functions that allow us to check if a given regular expression matches a … Read more

5 – M5 SC 1 Raw Strings V1

Hello and welcome. In the following lessons, we will learn how to create basic regular expressions in Python. As mentioned in the previous lesson, regular expressions, or regexes for short, allow us to search for patterns of text in documents. But before we dive in and start creating our regular expressions, let’s take a quick look … Read more

4 – AIT M5L4B 01 Introduction To Regex V4

Now that you know what 10-Ks are and how to access them, let’s take a look at how we can extract useful information from them. People often read 10-Ks to find out if a company is doing well or not. Certain words or phrases, both positive and negative, might indicate how well a company is … Read more

3 – M5 SC 15 10Ks Walkthrough V1

Hello and welcome. In this lesson, we will learn how to navigate the SEC website and how to search the EDGAR database for the desired 10-K documents. We begin by going to the US Securities and Exchange Commission website, sec.gov. Now, to access the EDGAR database, we can go to the Filings menu and select … Read more

20 – M5 SC 14 Searching The Parse Tree Part 3 V1

Hello, and welcome back. In this notebook, we will take a look at the recursive argument in the find_all method. But in order to understand how the recursive argument works, we must first take a look at some basic properties of child tags. So, let’s get started. For simplicity, in the following examples, we will … Read more

2 – AIT M5L4A 02 Financial Statement V6

For long-term investments, there is a lot more information about a company besides basic trading data. Things like market share, products and services, and growth potential can be important factors for long-term trends. Where could we get this information potentially? In financial reports. In the United States, we have the Securities and Exchange Commission, the … Read more

19 – M5 SC 13 Searching The Parse Tree Part 2 V1

Hello and welcome back. In this notebook, we will see how to search the parse tree using the class attribute and regular expressions. So, let’s begin by looking at the class attribute. Let’s suppose we wanted to find all the tags that had the attribute class equals “h2style.” Unfortunately, in this case, we can’t simply … Read more

18 – M5 SC 12 Searching The Parse Tree Part 1 V1

Hello and welcome back. In this notebook, we will begin to explore how to search the parse tree created by BeautifulSoup. Now, BeautifulSoup provides a number of methods for searching the tree, but we will only cover the find_all method in these lessons. If you’re interested, you can learn about other search methods in … Read more

17 – M5 SC 11 Navigating The Parse Tree V1

Hello and welcome back. In this notebook, we will learn how to navigate the parse tree created by BeautifulSoup. So the most straightforward way of navigating the tree is by accessing the HTML or XML tags. We can access the tags as if they were attributes of the BeautifulSoup object as shown here. So let’s … Read more

16 – M5 SC 10 Parsing An HTML File V1

Hello and welcome back. In this notebook, we will see how to parse an HTML file. In order to parse an HTML file, we need to pass the file into the BeautifulSoup constructor. We can pass our file to the BeautifulSoup constructor either as a string or as an open file handle as we have … Read more

15 – M5 SC 16 HTML Structure V1

Hello and welcome back. Before we start working with Beautiful Soup, let’s take a quick look at how HTML works. Now, you don’t have to be an HTML expert in order to use Beautiful Soup, but it’s definitely important to know the basic workings of HTML. HTML stands for Hypertext Markup Language, and it is … Read more

14 – AIT M5L4B 06 Introduction To Beautifulsoup V3

In the previous lessons, you learned how to create regular expressions and use them to find specific patterns of text in documents. In some cases, however, the text you want to analyze may already be formatted as a website rather than as a plain text document. In principle, you could say the HTML contents of … Read more

13 – M5 SC 9 Substitutions And Flags V1

Hello, and welcome back. In this lesson, we will learn how to use the re module to modify strings. Regex objects have a sub method that allows us to replace patterns within a string. So, let’s see how this works with a simple example. Here, we have a multi-line string that contains two instances of … Read more
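
A minimal sketch of the sub method mentioned in this excerpt is shown below. The multi-line string and the names in it are hypothetical stand-ins, since the excerpt is cut off before the lesson's own example.

```python
import re

# Hypothetical multi-line string; the lesson's actual text is not shown here.
text = """Mr. Smith arrived.
Later, Mr. Smith left."""

# Compile the pattern into a regex object; the period is escaped
# so that it matches a literal "." rather than any character.
regex = re.compile(r"Mr\. Smith")

# sub returns a new string with every match replaced.
result = regex.sub("Ms. Jones", text)

print(result)
```

The original string is left unchanged; sub always returns a new string.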

12 – M5 SC 8 Metacharacters Part 3 V1

Hello, and welcome back. In this notebook, we will learn how to make more complicated regular expressions using groups, the question mark, and the asterisk metacharacters. Let’s see how they work. Here, we have a multiline string with the names and the heights of the four highest mountains in the world according to Wikipedia. Our … Read more

11 – M5 SC 7 Metacharacters Part 2 V1

In this notebook, we will see a practical application of regular expressions. In particular, we will use the special sequence backslash d to create regular expressions that will allow us to look for phone numbers. We will also learn about character sets and how they can be used to search for more complicated patterns of … Read more
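
One way the phone-number search described in this excerpt might look is sketched below. The numbers and the chosen format (three digits, three digits, four digits, separated by a dash or a dot) are assumptions for illustration, not the lesson's own data.

```python
import re

# Hypothetical text with made-up numbers.
text = "Call 555-123-4567 or 555.987.6543, not 5551234."

# \d matches a single digit; the character set [-.] matches either
# a dash or a literal dot as the separator.
pattern = re.compile(r"\d\d\d[-.]\d\d\d[-.]\d\d\d\d")

found = pattern.findall(text)
print(found)
```

Inside a character set, the dot loses its special meaning, so it does not need to be escaped there.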

10 – M5 SC 6 Metacharacters Part 1 V1

In the previous lessons, you learned how to use the backslash metacharacter to create special sequences. We will now look at the following metacharacters: the dot, the caret, and the dollar sign. Let’s start by looking at the dot. As we saw before, the dot matches any character except newline characters. Let’s see an … Read more
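
The three metacharacters named in this excerpt can be sketched with a small, hypothetical two-line string. The example uses the default flags, under which the caret anchors to the start of the string and the dollar sign to its end.

```python
import re

# Hypothetical two-line string, not taken from the lesson.
text = "cat\ncot"

# The dot matches any single character except a newline.
dot_matches = re.findall(r"c.t", text)
no_newline = re.findall(r".", "a\nb")

# The caret anchors the match to the start of the string.
at_start = re.findall(r"^c.t", text)

# The dollar sign anchors the match to the end of the string.
at_end = re.findall(r"c.t$", text)

print(dot_matches, no_newline, at_start, at_end)
```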