2 – M5 SC 16 HTML Structure V1

Hello and welcome back. Before we start working with Beautiful Soup, let’s take a quick look at how HTML works. You don’t have to be an HTML expert to use Beautiful Soup, but it is important to know the basics. HTML stands for Hypertext Markup Language, and it is the standard markup language for creating web pages. HTML describes the structure of a web page using elements that are represented by tags, such as head, title, and so on. All the information displayed on a web page is contained within these tags. A web browser, such as Chrome or Firefox, takes the information contained in the HTML file and renders it to produce a web page like the one we see here. As we can see, the browser doesn’t display the HTML tags themselves, but rather uses them to determine how to display the information contained within them.

Let’s see how this works in more detail. Here we have a very simple web page. It contains some headings, some plain text, and an external link. We can also see that the page has a blue background and that these headings have an orange background.

Now let’s take a detailed look at the HTML file that produced this web page. Here we have the entire HTML document. The first thing we notice is that there are tags throughout the file. HTML tags normally come in pairs, as we can see here with <title> and </title>. The first tag in the pair is called the opening tag, and the second is called the closing tag. The closing tag is written just like the opening tag, but with a forward slash before the tag name.

Now, let’s quickly go through this HTML file to see what each tag does. At the beginning of the document, we see the <!DOCTYPE html> declaration. This indicates that the document uses HTML5, which is the latest version of the HTML standard.
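
To make the idea of opening and closing tags a bit more concrete, here is a minimal sketch that uses `html.parser` from Python’s standard library (not Beautiful Soup, which comes in the following lessons). The `TagLogger` class and the one-line `snippet` are made up for illustration and are not part of the lesson’s HTML file.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records opening tags, closing tags, and text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for an opening tag such as <title>.
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        # Called for a closing tag such as </title>.
        self.events.append(("close", tag))

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only runs.
        if data.strip():
            self.events.append(("text", data.strip()))


# Illustrative snippet (an assumption): one pair of tags wrapping some text.
snippet = "<title>AI for Trading</title>"

logger = TagLogger()
logger.feed(snippet)
logger.close()

for kind, value in logger.events:
    print(kind, value)
```

Running this should report the opening tag, then the text, then the closing tag, which is the pairing described above.
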
The doctype declaration must appear only once, at the top of the page, before any HTML tags. The second tag is the html tag. This is the root tag of the HTML file, and we can see that it has an attribute, lang="en-US". This lang attribute specifies that the language of the page is US English. We should note that all HTML tags can have attributes, and that these attributes provide additional information about the tags.

The head tag contains information about the document, such as the title tag, which specifies the title of the page. In this case, it is “AI for Trading”, and we can see it displayed right here. The head tag also contains the meta tag. This tag provides metadata about the HTML document, and in this case it specifies that the document uses the UTF-8 character set. Web browsers must know which character set to use in order to display the HTML page correctly. Notice that this meta tag has no closing tag: in HTML, meta is a void element, so it never takes one. We should also note that meta tags always go inside the head tag.

Our head tag also contains this link tag. In our case, it links our page to a CSS style sheet contained in a separate file named teststyle.css. It is this style sheet that sets the background of our web page to blue. Similarly, this style tag determines the style of our h2 headings. For example, here we can see that it sets the background color of our h2 headings to tomato, which is displayed as this orange color right here.

The body tag determines what will be displayed in the web page itself. Only the contents of the body tag are displayed in the browser window. Let’s take a look at the tags inside the body tag. The h1 tag defines a large heading. Here it is set to “Get Help From Peers and Mentors”, and we can see it displayed right here. The div tag represents a division or a section in an HTML document.
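
As a rough illustration of how attributes attach to tags, the sketch below reconstructs the top of the document from the description above and collects each tag’s attributes with Python’s standard-library `html.parser`. The markup is an approximation, not the lesson’s actual file: the `rel="stylesheet"` attribute on the link tag and the exact layout are assumptions, and `AttributeCollector` is a hypothetical helper.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Hypothetical helper: maps each tag name to its attributes."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if attrs:
            self.attributes[tag] = dict(attrs)


# Approximate reconstruction of the document's head (an assumption).
head_markup = """<!DOCTYPE html>
<html lang="en-US">
<head>
  <title>AI for Trading</title>
  <meta charset="UTF-8">
  <link rel="stylesheet" href="teststyle.css">
</head>
</html>"""

collector = AttributeCollector()
collector.feed(head_markup)
collector.close()

for tag, attrs in collector.attributes.items():
    print(tag, attrs)
```

Note that the meta and link tags are picked up even though neither has a closing tag, and that title does not appear in the result because it carries no attributes.
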
The h2 tag represents a slightly smaller heading than h1, and we can see two of them right here, “Student Hub” and “Knowledge”. The p tag represents a paragraph, and we can see one of them right here. The hr tag displays a horizontal line, and in HTML it has no closing tag. Finally, the a tag defines a hyperlink, which is used to link one web page to another. The href attribute of the a tag indicates the link’s destination. So in this case, the hyperlink shows up in the web page as “Knowledge”, as we can see here, and if we click on it, it takes us to knowledge.udacity.com.

Throughout the following lessons, we will use this very simple HTML file to learn how to use Beautiful Soup to scrape data from websites.
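
As a preview of the kind of extraction we will be doing, here is a small sketch that pulls the headings and the link out of a body like the one described above. It uses Python’s standard-library `html.parser` rather than Beautiful Soup, so it says nothing about Beautiful Soup’s own API. The markup is a loose reconstruction: the paragraph text is placeholder text, the nesting of the div tags and the `https://` scheme on the link are assumptions, and `LinkAndHeadingExtractor` is a hypothetical helper.

```python
from html.parser import HTMLParser


class LinkAndHeadingExtractor(HTMLParser):
    """Hypothetical helper: collects h1/h2 headings and hyperlinks."""

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs
        self.links = []       # (text, href) pairs
        self._current = None  # tag whose text is being collected
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "a"):
            self._current = tag
            self._text = []
            if tag == "a":
                # The href attribute holds the link's destination.
                self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._text).strip()
        if tag == "a":
            self.links.append((text, self._href))
        else:
            self.headings.append((tag, text))
        self._current = None


# Loose reconstruction of the page body (paragraph text is a placeholder).
body_markup = """<body>
  <h1>Get Help From Peers and Mentors</h1>
  <div>
    <h2>Student Hub</h2>
    <p>Placeholder paragraph text.</p>
  </div>
  <hr>
  <div>
    <h2>Knowledge</h2>
    <p>Placeholder text with a link:
       <a href="https://knowledge.udacity.com">Knowledge</a></p>
  </div>
</body>"""

extractor = LinkAndHeadingExtractor()
extractor.feed(body_markup)
extractor.close()

print(extractor.headings)
print(extractor.links)
```

The link is reported as a pair of the visible text and the href value, which mirrors the distinction above between what the hyperlink shows on the page and where it takes us.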
