Introduction to HTML
In this lesson, you will learn:
- To create a simple HTML page.
- About HTML elements and attributes.
- The difference between HTML and XHTML.
- To create the skeleton of an HTML document.
- About whitespace and HTML.
- To output special characters in HTML.
HyperText Markup Language (HTML) is the language behind most Web pages. The language is made up of elements that describe the structure and format of the content on a Web page.
HTML is maintained by the World Wide Web Consortium (W3C). As of this writing, the latest versions are HTML 4.01 and XHTML 1.0. See http://www.w3.org/TR/html4/ and http://www.w3.org/TR/xhtml1/ for the specifications. In this lesson, we will address the differences between HTML and XHTML. The rest of the course covers XHTML, but for simplicity, we’ll usually refer to it as HTML.
We’ll begin with a simple exercise.
Exercise: A Simple HTML Document
In this exercise, you will create your first HTML document by simply copying the text shown below. The purpose is to give you some sense of the structure of an HTML document.
- Open a simple text editor such as Notepad and create a new file. Do not use an HTML editor for this exercise.
- Save the file as HelloWorld.html in the HTMLBasics/Exercises folder.
- Type the following exactly as shown:
<html>
<head>
<title>Hello world!</title>
</head>
<body>
<h1>Hello world!</h1>
</body>
</html>
- Save the file again and then open it in your browser by navigating to the file in your folder system and double-clicking on it. The page should display "Hello world!" as a large heading, and the browser's title bar should show the same text.
The HTML Skeleton
At its simplest, an HTML page contains what can be thought of as a skeleton – the main structure of the page. It looks like this:
Code Sample: HTMLBasics/Demos/Skeleton.html
<html>
<head>
<title></title>
</head>
<body>
<!--Content that appears on the page-->
</body>
</html>
The <head> Element
The <head> element contains content that is not displayed on the page itself. Some of the elements commonly found in the <head> are:
- Title of the page (<title>). Browsers typically show the title in the “title bar” at the top of the browser window.
- Meta tags, which contain descriptive information about the page (<meta />).
- Style blocks, which contain Cascading Style Sheet rules (<style>).
- References (or links) to external style sheets (<link />).
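As a rough sketch of how these elements sit together, the <head> below contains one of each. The title text, the description, the style rule and the file name styles.css are placeholders chosen for illustration, not files from this course.

```html
<head>
  <title>My First Page</title>
  <!-- Meta tag: descriptive information about the page -->
  <meta name="description" content="A short practice page" />
  <!-- Style block: CSS rules that apply to this page only -->
  <style type="text/css">
    h1 { color: navy; }
  </style>
  <!-- Link to an external style sheet (hypothetical file name) -->
  <link rel="stylesheet" type="text/css" href="styles.css" />
</head>
```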
The <body> Element
The <body> element contains all of the content that appears on the page itself. Body tags will be covered thoroughly throughout this manual.
Whitespace
Extra whitespace is ignored in HTML. This means that all hard returns, tabs and multiple spaces are condensed into a single space for display purposes.
Code Sample: HTMLBasics/Demos/Whitespace.html
<html>
<head>
<title>Whitespace Example</title>
</head>
<body>
This is a sentence on a single line.

This    is
     a sentence with
  extra    whitespace

  throughout.
</body>
</html>
The two sentences in the code above will be rendered in exactly the same way.
HTML Elements and Attributes
HTML elements describe the structure and content of a Web page. Tags are used to indicate the beginning and end of elements. The syntax is as follows:
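In its most basic form, an element is an open tag, some content and a close tag. In this sketch, tagname is a placeholder rather than a real HTML tag, and the second line is a hypothetical paragraph shown for comparison:

```html
<tagname>Element content</tagname>
<p>This sentence is the content of a paragraph element.</p>
```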
Tags often have attributes for further defining the element. Attributes come in name-value pairs.
This is not allowed in XHTML.
Note that attributes only appear in the open tag, like so:
<tagname att1="value" att2="value">Element content</tagname>
The order of attributes is not important.
Empty vs. Container Tags
The tags shown above are called container tags because they have both an open and close tag with content contained between them. Tags that do not contain content are called empty tags. The syntax is as follows:
<tagname att1="value" att2="value" />
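Two empty tags you are likely to meet early are the line break and the image. A small sketch, in which the file name logo.gif and the alt text are made up for illustration:

```html
<p>
  First line<br />
  Second line
</p>
<p><img src="logo.gif" alt="Company logo" /></p>
```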
Blocks and Inline Elements
Block elements are elements that separate a block of content. For example, a paragraph (<p>) element is a block element. Other block elements include:
- Lists (<ul> and <ol>)
- Tables (<table>)
- Forms (<form>)
- Divs (<div>)
Inline elements are elements that affect only snippets of content and do not block off a section of a page. Examples of inline elements include:
- Links (<a>)
- Images (<img>)
- Formatting tags (<b>, <i>, <tt>, etc.)
- Phrase elements (<em>, <strong>, <code>, etc.)
- Spans (<span>)
Important: In the strict versions of HTML and XHTML, inline elements cannot be direct children of the body element. They must be contained within a block-level element.
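To illustrate, the sketch below places inline elements inside a block-level paragraph rather than directly inside the body. The link address and the wording are placeholders:

```html
<body>
  <!-- Block element: the paragraph blocks off a section of the page -->
  <p>
    This sentence has <em>emphasized</em> text and
    a <a href="http://www.example.com/">link</a>,
    both of which are inline elements.
  </p>
</body>
```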
Comments
Comments are generally used for one of three purposes:
- To write helpful notes about the code; for example, why something is written in a specific way.
- To comment out some code that is not currently needed, but may be used sometime in the future.
- To debug a page.
HTML comments are enclosed in <!-- and -->. For example:
<!-- This is an HTML comment -->
XHTML vs. HTML
XHTML 1.0 and HTML 4.01 consist of the same sets of elements. The only difference is that HTML is fairly flexible, whereas XHTML has strict rules.
The DOCTYPE declaration goes at the beginning of the document and is used to indicate which version of (X)HTML the page uses. There are three versions of (X)HTML documents: strict, frameset and transitional (loose). In HTML, the DOCTYPE declaration is optional. In XHTML, it is required.
XHTML Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
HTML Strict
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The transitional (or loose) versions of HTML and XHTML allow for the use of deprecated tags and attributes. The transitional versions also do not support framesets.
XHTML Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
HTML Transitional
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
The frameset versions of HTML and XHTML are the same as the transitional versions, except that they also support frames.
XHTML Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
HTML Frameset
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
HTML 4.01 allows some closing tags to be omitted. For example, in HTML, list item (<li>) tags do not require a matching close tag (</li>).
In XHTML, all tags must be closed. Empty tags are closed by adding a forward slash before the final angle bracket of the tag:
<tagname att1="value" att2="value" />
Note the space before the forward slash. Though this is not required by XHTML, it helps keep older browsers from getting confused.
In HTML 4.01, the forward slash is not required:
<tagname att1="value" att2="value">
In HTML, case is not important. In XHTML, all tags and attributes must be in lowercase letters.
In HTML, attribute values do not always have to be in quotes, whereas in XHTML quotes are required. Either single quotes or double quotes may be used.
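The closing-tag, case and quoting rules above can be compared side by side. In this sketch (the list text, file name and attribute values are hypothetical), the first fragment is acceptable in HTML 4.01, and the second is the form XHTML requires:

```html
<!-- HTML 4.01: uppercase names, unquoted values and omitted
     closing tags are all tolerated -->
<UL>
  <LI>First item
  <LI>Second item
</UL>
<IMG SRC=logo.gif ALT=Logo>

<!-- XHTML 1.0: lowercase names, quoted values, every tag closed -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<img src="logo.gif" alt="Logo" />
```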
In both HTML and XHTML, tags should be nested properly. Proper nesting requires nested tags to be closed in the reverse of the order in which they were opened. Another way to say this is that each element must be completely contained by its parent element. For example, the following line of code uses improper nesting:
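A hypothetical line of this kind, in which <em> is opened inside <strong> but closed after it:

```html
<p><strong><em>Bold and emphasized text</strong></em></p>
```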
The corrected line looks like this:
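A hypothetical example of proper nesting, in which <em> is opened and closed entirely inside <strong>:

```html
<p><strong><em>Bold and emphasized text</em></strong></p>
```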
Some XML Stuff
The XML Declaration
XHTML documents are, by definition, XML documents. This means that they follow the rules of XML. Although not required, it is good practice to include an XML declaration in your XHTML documents. If included, the XML declaration must be at the very beginning of the document. The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8"?>
For best results, define the encoding in a meta tag as well. We’ll cover meta tags later in the manual. For now, note that you should include the following tag within the head tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The XHTML Namespace
In XHTML documents, the html tag must contain an xmlns declaration for the XHTML namespace, which indicates that the document must conform to the rules defined in the XHTML namespace. The syntax is shown below:
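The declaration is an xmlns attribute on the opening html tag, set to the XHTML namespace URI. A minimal sketch, with the rest of the document omitted:

```html
<html xmlns="http://www.w3.org/1999/xhtml">
  <!-- head and body go here -->
</html>
```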
Special Characters
Special characters (i.e., characters that do not show up on your keyboard) can be added to HTML pages using entity names and numbers. For example, a copyright symbol (©) can be added using &copy; or &#169;. The following table shows some of the more common character references.
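A few commonly used character references are sketched below, each written first as an entity name and then as the equivalent entity number. This is a small illustrative selection, not a complete reference:

```html
<p>
  &copy; &#169;  <!-- copyright symbol -->
  &reg; &#174;   <!-- registered trademark symbol -->
  &amp; &#38;    <!-- ampersand -->
  &lt; &#60;     <!-- less-than sign -->
  &gt; &#62;     <!-- greater-than sign -->
  &quot; &#34;   <!-- double quotation mark -->
  &nbsp; &#160;  <!-- non-breaking space -->
</p>
```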
lang and xml:lang
The lang and xml:lang attributes are used to tell the browser (or other user agent) the language contained within an element. The W3C recommends that both lang and xml:lang be included in the html tag of all XHTML documents, like so:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
According to the W3C, these attributes may be helpful in:
- Assisting search engines
- Assisting speech synthesizers
- Helping a user agent select glyph variants for high quality typography
- Helping a user agent choose a set of quotation marks
- Helping a user agent make decisions about hyphenation, ligatures, and spacing
- Assisting spell checkers and grammar checkers
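Putting the pieces from this lesson together, a minimal XHTML 1.0 Strict document might look like the following sketch. It combines the XML declaration, a DOCTYPE, the namespace and language attributes, and the encoding meta tag. The page content is a placeholder.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Hello world!</title>
</head>
<body>
  <h1>Hello world!</h1>
  <p>Copyright &copy; symbol written as an entity.</p>
</body>
</html>
```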
Introduction to HTML Conclusion
In this lesson of the HTML tutorial, you have learned the basics of HTML. You should understand how an HTML page is structured, know the major differences between HTML and XHTML and understand the basic syntax of HTML tags.