3 – M5 SC 10 Parsing An HTML File V1

Hello and welcome back. In this notebook, we will see how to parse an HTML file. To parse an HTML file, we need to pass the file to the BeautifulSoup constructor. We can pass our file to the BeautifulSoup constructor either as a string or as an open file handle, as we have done here. The BeautifulSoup constructor will return a BeautifulSoup object. This BeautifulSoup object represents the document as a whole and can be searched using various methods. We will see how to do this in the following lessons. For now, let’s print the BeautifulSoup object to see what it looks like. As we can see, the BeautifulSoup object holds the entire contents of our sample HTML file. However, it is not printed in a nice format, and it is very hard to read. Luckily, the BeautifulSoup object has the “prettify” method, which returns the document as a string with each tag on its own line and nicely indented. In the next lesson, we will see how to access the information contained in each tag.
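The notebook itself is not reproduced in this transcript, so here is a minimal sketch of what the steps described above might look like. The markup in `SAMPLE_HTML` and the variable names are made up for illustration, and the built-in `"html.parser"` is an assumed parser choice; the lesson's actual sample file and parser may differ. An in-memory `io.StringIO` object stands in for an open file handle so that the example runs without a file on disk.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup; the lesson's real sample HTML file is not shown.
SAMPLE_HTML = (
    "<html><head><title>Sample Page</title></head>"
    "<body><h1>Hello</h1><p>First paragraph.</p></body></html>"
)

# Option 1: pass the markup to the constructor as a string.
soup_from_string = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 2: pass an open file handle. StringIO behaves like the object
# returned by open("some_file.html"), so it stands in for a real file here.
with io.StringIO(SAMPLE_HTML) as handle:
    soup_from_handle = BeautifulSoup(handle, "html.parser")

# Printing the object shows the whole document, but without any indentation.
print(soup_from_string)

# prettify() returns a string with one tag per line, indented by nesting depth.
pretty = soup_from_string.prettify()
print(pretty)
```

With a real file, the second option would typically be written as `with open(path) as handle:` followed by the same constructor call, where `path` is wherever the sample file lives.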
