2008-02-03
HTML 5 vs. XHTML 2.0
- Why care about the core language of the World Wide Web?
- Rivalling standards in the 1990's
- HTML 4 and CSS level 1
- XHTML 1
- XHTML 1.1 — a minor improvement
- XHTML had little impact in the web community
- XHTML 2.0 off the ground
- HTML 5 — core code as it is and should be
- Leave all appearance to CSS
- Benefits in short
- Browser manufacturers were prepared
On January 22, 2008, a working group under the World Wide Web Consortium, the internet standardization organisation, published its first public document about HTML 5, a new standard for writing web pages. Reading it makes it clear that it is going to rival the other upcoming standard, XHTML 2.0. My vote goes to HTML 5 and its XML variety, XHTML 5.
Why care about the core language of the World Wide Web?
When you edit web sites as a beginner, you usually don't care much about the underlying code; making the whole site work from a content point of view, and making all the links work properly, is the main concern. If you, honoured reader, have come no further, the rest of this page may not really appeal to you. But you should be interested. The development of standards is the concern of all contributors to the World Wide Web because it influences the way you should write your data and metadata in a broader sense.
Rivalling standards in the 1990's
Rivalry over web page encoding was the chief reason for the very foundation of the World Wide Web Consortium; it was created to settle the conflicting norms of web coding that were a consequence of the browser war of the 1990's (about 1994-1999). The underlying code, named HTML for HyperText Markup Language, was an application of SGML. SGML was an open standard and open to amendments. Netscape Navigator, which was the most popular HTML parser (web browser) in the mid-1990's, had on its own introduced a number of tags as well as, in 1995, a scripting language for minor programs named JavaScript. Once Microsoft introduced its own web browser, Internet Explorer, in 1995, even more tags were introduced, and Microsoft added its own scripting language, VBScript, as a rival to JavaScript.
Microsoft — the naughty boy in the classroom
Microsoft's behaviour during the browser war is symptomatic of the company's policies in general when it comes to standards. Microsoft lets everybody agree upon a common industry standard and, very often, also participates in the making of the standard. Once agreement is reached around the table, Microsoft runs home to develop a rivalling proprietary standard.
Examples
- Text: Microsoft WordXML vs. OpenDocXML
- Sound: WAV vs. AIFF
- Image: Bitmap Picture (BMP) vs. Tagged Image File Format (TIFF)
- Web scripting: VBScript vs. JavaScript, a part of W3C standard for DOM.
- Movie: AVI vs. MOV
- Printing: Microsoft XML Paper Specification (XPS) vs. Portable Document Format (PDF)
The funny thing about it is that Microsoft rarely develops better formats than the existing ones. An AVI movie takes up at least 10 times more space than a QuickTime movie with the same image and sound quality. AIFF covers a wider range of the sound spectrum than the clumsy WAV. Microsoft Bitmap consumes loads of space but offers few of the advantages that TIFF gives for the same hard disk consumption. WordXML is cryptic, and in parts secret because of the schemas applied in the document, so the user's power over the document suffers.
Although these inventions were proprietary in the beginning, the other browser developers adapted their browsers to include the Microsoft inventions, and Microsoft included Netscape's proprietary tags to the extent possible. The situation was getting out of control because the content providers as well as the web masters needed to adapt their pages to the different browsers. Once it was founded in 1994, the first task of the World Wide Web Consortium (below abbreviated W3C) thus became to reach consensus about the HTML code. Instead of forcing the content providers (schools, government bodies, non-governmental organisations, editing houses, non-internet industries, etc.) to adapt their pages to the browsers, the browser providers (Netscape, Microsoft, iCab, AOL, Opera, etc.) had to adapt their products to the standard. It wasn't consumer influence in the proper sense of the word, but it sure was in the interest of the consumers.
HTML 4 and CSS level 1
The further development of the standards was a pleasure to content contributors like me. In 1997, the HTML 4.0 specification was issued as a W3C Recommendation, foreseeing a separation of structure (HTML) and appearance (CSS, a standard introduced by W3C members Håkon Wium Lie and Bert Bos in a Recommendation of 17 December 1996). As early as the first versions of HTML, popularly named HTML or HTML 1, the fathers of HTML were considering style sheets, known from the typographical business, to give the pages a uniform look across the individual web site, but this was not implemented until the emergence of HTML 4.0. The principle of getting rid of all tags and attributes that specify appearance, such as <center>, <font>, <... color="foo" ...>, <... background="foo" ...>, <... align="foo" ...>, and <... alink="foo" ...>, was not clear until the W3C Recommendation of HTML 4.01 on 24 December 1999.
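As a minimal sketch of this separation, the same notice can be written first with appearance tags and then with structure in HTML and appearance in CSS. The class name notice, the text, and the colour are assumptions made up for illustration; they do not come from any specification.

```html
<!-- Old style: appearance is mixed into the markup -->
<center><font color="red">Site closed for maintenance</font></center>

<!-- Structural style: the markup only says what the text is -->
<p class="notice">Site closed for maintenance</p>

<!-- The look lives in a style sheet. In a real page this block belongs
     in the <head> or in an external file. "notice" is a hypothetical class. -->
<style type="text/css">
  .notice { text-align: center; color: red; }
</style>
```

Changing the look of every notice on the site then means editing one rule in the style sheet rather than every page.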
HTML 4.01 anticipated a merger of the coding language of the World Wide Web with another SGML variety, XML. The XML specification is much narrower than the SGML specification when it comes to syntax, but XML is as open as SGML in terms of the number of tags and attributes. Anyone can make his own XML, and many people do so.
HTML 4.01 had a strict version, called HTML 4.01 Strict, which was already quite close to XHTML, except that it still allowed unclosed tags like <img>, <meta>, <link>, and <col>. Another version, HTML 4.01 Transitional, allowed the old appearance tags and attributes known from version 3.2 and earlier, but within the new grammar, such as consistent use of apostrophes and quotation marks. The last of the three HTML 4.01 versions, HTML 4.01 Frameset, specified the use of frames in the build-up of the web page but otherwise followed the same recommendation as HTML 4.01 Transitional.
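The difference in how empty tags are written can be sketched as follows. The file names logo.png and site.css are hypothetical.

```html
<!-- HTML 4.01 Strict: empty elements are left open -->
<img src="logo.png" alt="Logo">
<link rel="stylesheet" type="text/css" href="site.css">

<!-- XHTML: every element is closed, here with a trailing slash -->
<img src="logo.png" alt="Logo" />
<link rel="stylesheet" type="text/css" href="site.css" />
```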
XHTML 1
The XHTML 1.0 Recommendation was introduced on 26 January 2000 with the subtitle A Reformulation of HTML 4 in XML 1.0. The differences from HTML 4.01 were minimal. The transition to XHTML 1.0 was not difficult, as XHTML came in the same varieties as its predecessor: XHTML 1.0 Strict, XHTML 1.0 Transitional, and XHTML 1.0 Frameset. Probably because of this, many web developers took no heed of the novelty in web specifications. When I took a course at the Danish School of Journalism in autumn 2003, the course instructor had heard of XHTML but couldn't tell how it differed from HTML, and only used it as a killer argument in discussions with the students. He and his technical assistant subsequently taught web content providing only in HTML, recommending layout tricks based on HTML 3.2, such as layout with tables, one of the biggest design mistakes and HTML abuses ever made on the web. (XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition)).
XHTML 1.1 — a minor improvement
The later XHTML™ 1.1 - Module-based XHTML - Second Edition still has a status as a working draft, but, as is the fact in web development, has already gained some acceptance, also on this web site. Its first version was ready in 2001. The main differences from XHTML 1.0 are
- the stricter use of identifying web tags, using only the id="foo" attribute;
- allowing only for one language marker of the natural language content, the xml:lang="foo" attribute; and
- introducing the <ruby> tag to write and transliterate hieroglyphical texts like Chinese, Japanese, and Ancient Egyptian.
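The three points can be sketched in one small fragment. This assumes the simple ruby markup of the W3C Ruby Annotation module; the id value and the sample word are made up for illustration.

```html
<!-- XHTML 1.1: id identifies the element, xml:lang marks its language -->
<p id="greeting" xml:lang="ja">
  <ruby>
    <rb>日本語</rb>                          <!-- base text -->
    <rp>(</rp><rt>nihongo</rt><rp>)</rp>   <!-- reading; rp is the fallback parenthesis -->
  </ruby>
</p>
```

A browser without ruby support would show the reading in parentheses after the base text instead of above it.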
XHTML had little impact in the web community
I made my own survey of the implementation of XHTML on 33 home pages in 2005, five years after the introduction of XHTML. Only one third had taken XHTML seriously, and most of the XHTML-positive results in my sample were from sites I had made myself. When cleaned of subjectivity, the result of this not-so-scientific survey shows that 19% of the home pages were in XHTML. It is understandable that it can be hard to change all pages of a web site hosting hundreds or thousands of individual documents, but could it be that hard to convert the home page, the very front page of the company's appearance on the World Wide Web?
Another thing was that not all browsers could render the many features of XHTML or, for that matter, HTML and the CSS languages. In fact, as of today (April 2008) there is no browser that gives the full user experience anticipated by HTML 4+ and CSS 2+. Browsers coming close are Gecko-based browsers (Firefox, Camino, SeaMonkey, Flock, and the now deceased Netscape, Mozilla Suite, Firebird), iCab, Apple WebCore engine browsers (Safari, Shiira, OmniWeb, Bumper Car, etc.), and, notably, Opera.
A third issue is the web development tools. Manufacturers of high-end web authoring tools like Macromedia and Adobe, who merged in 2005, soon included XHTML 1.0 among the standards supported by their programs Macromedia Dreamweaver and Adobe InDesign, but many others didn't. And the default authoring code language is still HTML 4.01 (at least in my version, Dreamweaver 2004). The Mozilla suite only did HTML 4.01, and so does its successor SeaMonkey. Even worse, corporate web sites and others using Content Management Systems quite frequently have their software developed by programmers who only have a vague idea of XHTML and who, if they know a bit more, don't support the principles in practice.
XHTML 2.0 off the ground
XHTML version 2.0 is far out of reach for the rest of the web building community. It has many bright ideas about how web pages should be, but it is pretty hard to implement. It restructures hyperlinks and meta tagging, yet it doesn't properly clean up the mix-up of appearance and structure that still weighs on the W3C like bad karma. For instance, it abandons the hyperlink tag <a> and transfers its most important property, the href="foo" attribute, to any other tag. The argument is understandable (a hyperlink should be a property of an element, not an element by itself with its own tag), but it is hard to incorporate, and it is not backwards compatible.
Example: How to make a hyperlink in an image
XHTML 1.1
<a href="http://www.thau-knudsen.dk/en/" title="Erik Thau-Knudsen's site" hreflang="en">
<img src="http://www.thau-knudsen.dk/picts/erik/erik_thau-knudsen.jpeg" alt="Self-portrait of Erik" />
</a>
XHTML 2.0
<img src="http://www.thau-knudsen.dk/picts/erik/erik_thau-knudsen.jpeg" alt="Self-portrait of Erik" href="http://www.thau-knudsen.dk/en/" title="Erik Thau-Knudsen's site" hreflang="en"/>
HTML 5 — core code as it is and should be
On January 22, 2008, a W3C working group of the most prominent representatives of web development (software manufacturers and individuals) published its first draft of HTML 5. Instead of trying to impose on the rest of the world what the World Wide Web should be, the working group based its ideas on what the web is and on how pages are actually structured, and tries to improve it from there. It is backwards compatible in the sense that old browsers can reproduce it correctly, while it stays open to XML by allowing for an XML version called XHTML 5.
It is anticipated to replace HTML 4 and XHTML 1.x. Not all the work of the XHTML 2.0 Working Group was in vain: their idea of structuring pages into sections with their own headings lives on in a tag taken over from XHTML 2.0, <section>.
Leave all appearance to CSS
Like XHTML 1.0-2.0, HTML 5 calls for strict coding and strict syntax, but it goes further by abandoning all attributes that render style. Tables no longer have attributes like cellpadding="foo", cellspacing="foo", width="foo", or the rarer frame="foo". Images no longer have attributes like height="foo" and width="foo". And web forms no longer need <input> and <textarea> attributes like size="foo" and cols="foo". All this is left to be defined in a stylesheet.
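A minimal sketch of what this means in practice, assuming standard CSS 2.1 properties. The class name prices, the table content, and the sizes are made up for illustration.

```html
<!-- Markup without any style attributes -->
<table class="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Coffee</td><td>2.50</td></tr>
</table>
<textarea name="comment"></textarea>

<!-- In a real page these rules belong in the <head> or in an external file -->
<style type="text/css">
  /* instead of width="100%" and cellspacing="2" on the table */
  table.prices { width: 100%; border-spacing: 2px; }
  /* instead of cellpadding="4" */
  table.prices th, table.prices td { padding: 4px; }
  /* instead of cols="..." and rows="..." on the textarea */
  textarea { width: 30em; height: 8em; }
</style>
```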
Other attributes being abandoned are the rarely used charset="foo" and rev="foo" in <link> and <a> tags. Some tags were abandoned altogether, like <acronym>, because many content providers confuse it with <abbr>, which is its replacement.
Novelties are dedicated tags for menus, headers, headlines, footers, and a stricter definition of the address tag, allowing it to be used only to specify the authors of the part of the page above it, usually included in the same parent tag.
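As a rough sketch, a page using the new structural tags could look like this. The element names header, nav, article, section, address, and footer are assumed from the HTML 5 draft and may still change before the specification settles; the paths and texts are invented for illustration.

```html
<body>
  <header>
    <h1>Web notes</h1>
    <nav> <!-- the site menu -->
      <ul>
        <li><a href="/en/">Home</a></li>
        <li><a href="/en/articles/">Articles</a></li>
      </ul>
    </nav>
  </header>
  <article>
    <h2>HTML 5 vs. XHTML 2.0</h2>
    <section>
      <h3>Why care?</h3>
      <p>The <abbr title="World Wide Web Consortium">W3C</abbr> sets the standards.</p>
    </section>
    <!-- names the author of the article it sits in -->
    <address>Erik Thau-Knudsen</address>
  </article>
  <footer>
    <p>Last updated 2008-02-03</p>
  </footer>
</body>
```

None of these tags says anything about how the page looks; a style sheet decides where the menu and the footer end up on screen.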
Benefits in short
The benefits of HTML 5, or XHTML 5, are so obvious that I can only recommend that all web developers endorse and implement the upcoming standard. The advantages are:
- appearance and structuring technologies are finally kept apart with a water- and fire-proof iron curtain;
- new tags are introduced to specify elements of more complex web pages;
- simplicity: coding is even cleaner;
- better scripting: DOM 2.0 improvements allow for manipulation of tags and classes;
- the status of the target="foo" attribute of hyperlinks, otherwise abandoned in XHTML 1.0-2.0, is reinforced;
- backwards compatibility with browsers developed in 2000 and later;
- forwards compatibility with non-desktop media like hand-held devices, browsers with speech, etc.;
- ease of use in existing web authoring tools.
Browser manufacturers were prepared
Given that the signal "keep appearance apart from structure" has resounded from the W3C for a decade, many HTML and XHTML parsers have already been prepared for this change. The Apple WebCore engine has already implemented much of HTML 5 forms, visible in its winter 2008 update to Safari 3.1. Opera Software ASA, producing the desktop browser Opera and the Java-based mobile phone browser Opera Mini, announced in February 2008 that it has also embraced HTML 5/XHTML 5.
Many details in the January 22 working paper, if not all, are the result of a majority vote. Had I been able to vote, I would have kept the <acronym> tag as well as introduced a <distance> tag for irony and non-witnessed narratives, but I am not a W3C member. The document is not a Recommendation (the W3C euphemism for a settled specification of a standard), nor a Candidate Recommendation. As of April 1, 2008, the document had the status of Working Draft, which means that it is on its way to gaining official status; the next phase is Candidate Recommendation.
As for this site and others under my administration, the transition to XHTML 5 will happen, but only stepwise. I hope to have turned all new pages into XHTML 5 by the end of 2008.
Erik Thau-Knudsen
Further reading
- HTML 5 differences from HTML 4 — also useful if you want your pages to meet XHTML requirements
- Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification
- W3C Technical Reports and Recommendations