HTML5 Parser: A Gentle Introduction

Daniel Davis, Opera Software

A cute lamb, representing 'gentle'

Browser support

  • Opera 11.60
  • Firefox 4
  • Chrome 7
  • Safari 5.1
  • IE 10

Photo: Mamipeko

So what is it?

  • It's a parser
  • It's for HTML5

OK, so what does it do?

It goes through an HTML5 document, identifying the markup and the content.

This is not easy.

This is an understatement.

More technical description

The HTML5 parsing algorithm has two major parts: tokenization and tree building. Tokenization is the process of splitting the source stream into tags, text, comments, and attributes inside tags. The tree building phase takes the tags and the interleaving text and comments and builds the DOM tree.

Henri Sivonen, Mozilla
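
The two phases described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the standard library's `html.parser.HTMLParser` stands in for the tokenizer (it does not implement the HTML5 tokenization rules), and the `Node`, `TreeBuilder` and `render` names are hypothetical helpers, not part of any browser or DOM API.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (partial list, for illustration).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal stand-in for a DOM node (hypothetical, not a real DOM API)."""

    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Receives tokens from HTMLParser and assembles them into a tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_elements = [self.root]  # stack of currently open elements
        self.tokens = []                  # record of tokens, kept for illustration

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))
        node = Node(tag)
        self.open_elements[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))
        # Pop back to the matching open element; stray end tags are ignored.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(("text", text))
            self.open_elements[-1].children.append(Node("#text", text))

    def handle_comment(self, data):
        text = data.strip()
        self.tokens.append(("comment", text))
        self.open_elements[-1].children.append(Node("#comment", text))


def render(node, depth=0):
    """Return the tree as a list of indented lines."""
    label = node.name if node.text is None else f"{node.name}: {node.text}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<p>Hello <em>world</em><!-- hi --></p>")
builder.close()
print(builder.tokens)                    # phase 1: the token stream
print("\n".join(render(builder.root)))   # phase 2: the resulting tree
```

The first `print` shows the flat token stream; the second shows the same content as a nested tree. A real HTML5 parser does far more in both phases, for example inserting the implied `html`, `head` and `body` elements.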

What's different about HTML5?

  • New elements & attributes
  • More detailed specification
  • Error handling rules
  • Inline SVG & MathML

New elements & attributes

New stuff in HTML5

  • <header> & <footer>, <article> & <section>, <nav>, etc.
  • <details> & <summary>
  • input types:
    • <input type="date">
    • <input type="color">

More detailed specification

If you fancy some light reading...

http://dev.w3.org/html5/spec/parsing.html#parsing

For example, the HTML5 outliner

<h1>I'm a heading</h1>
<section>
    <h1>I'm a sub-heading</h1>
</section>

Let the renderer do the hard work.
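
To make the idea concrete, here is a rough Python sketch of how a heading's effective level could be derived from how deeply it is nested in sectioning elements. It is a simplification, not the outline algorithm from the specification: it only looks at `<h1>`, and the `OutlineSketch` class and `SECTIONING` set are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser

# Sectioning elements that start a new, nested outline level.
SECTIONING = {"section", "article", "nav", "aside"}


class OutlineSketch(HTMLParser):
    """Collects <h1> headings together with their sectioning depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # number of open sectioning elements
        self.in_heading = False
        self.entries = []        # [effective level, heading text] pairs

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag == "h1":
            self.in_heading = True
            self.entries.append([self.depth + 1, ""])

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth = max(0, self.depth - 1)
        elif tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.entries[-1][1] += data


html = """<h1>I'm a heading</h1>
<section>
    <h1>I'm a sub-heading</h1>
</section>"""

sketch = OutlineSketch()
sketch.feed(html)
sketch.close()
for level, text in sketch.entries:
    print("  " * (level - 1) + f"level {level}: {text}")
```

Both headings are written as `<h1>`, but the second one is reported at level 2 because it sits inside a `<section>`.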

Error handling rules

Underlying principle

Humans make mistakes, computers don't (with one exception...)

Old-fashioned parser

<h1>Welcome to my site!
<p>Some exciting content

Parser says:

IS THAT WHAT YOU CALL HTML?!
You fail at the internetz.

New, shiny parser

<h1>Welcome to my site!
<p>Some exciting content

Parser says:

IS THAT WHAT YOU CALL HTML?!
OK then.
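
The "OK then" behaviour can be sketched as a stack of open elements that is simply emptied when the input runs out. This is a minimal illustration, assuming Python's `html.parser` as the tokenizer; `ForgivingBuilder` and its `finish` method are hypothetical names, and the real HTML5 rules cover many more cases (implied `html` and `body` elements, misnested formatting tags, and so on).

```python
from html.parser import HTMLParser


class ForgivingBuilder(HTMLParser):
    """Re-serializes markup, closing whatever the author left open."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of tag names still awaiting an end tag
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)
        self.output.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag: ignore it instead of failing
        # Close everything up to and including the matching element.
        while self.open_elements:
            closed = self.open_elements.pop()
            self.output.append(f"</{closed}>")
            if closed == tag:
                break

    def handle_data(self, data):
        self.output.append(data)

    def finish(self):
        """Flush pending text, then close every element still open."""
        self.close()
        while self.open_elements:
            self.output.append(f"</{self.open_elements.pop()}>")
        return "".join(self.output)


builder = ForgivingBuilder()
builder.feed("<h1>Welcome to my site!\n<p>Some exciting content")
result = builder.finish()
print(result)
```

No error is raised for the missing end tags; the sketch closes `<p>` and then `<h1>` at the end of the input, so every parser following the same rules ends up with the same tree.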

Inline SVG & MathML

The coolest thing about the new HTML5 parser is the ability to use SVG and MathML inline in regular HTML.

Robert O'Callahan, Mozilla

Inline SVG

<svg xmlns="http://www.w3.org/2000/svg" width="240" height="240" viewBox="0 0 240 240"> ... </svg>

Inline MathML

<math display="block"><mrow><mi>x</mi><mo>=</mo><mfrac><mrow><mo>−</mo><mi>b</mi><mo>±</mo><msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow></math>

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

Benefits

Good for browsers because...

  • Consistent behaviour between browsers
  • All other browsers have it
  • Fewer site compatibility issues

Good for web developers because...

  • Consistent behaviour between browsers
  • More forgiving of mistakes
  • Content is more findable and re-usable

Good for users because...

  • Consistent behaviour between browsers
  • Improved behaviour of screenreaders, search engines, etc.
  • Faster browsing in some cases

In summary

OMG, I love HTML5 parsers!!!

Some guy on the internet (probably)