SGML/HTML Resource Centre
Updated 13/3/96
Introduction
HTML stands for HyperText Markup Language. It's a simple text-based markup language for creating hypertext
documents which can be read by browsers such as Netscape and Mosaic. However, each browser manufacturer falls
prey to the temptation to build in a few 'enhancements'. No names, no pack drill,
Netscape Communications Corporation. This creates serious problems, in that
documents formatted using Netscape enhancements (colour backgrounds, blinking text, centre alignments etc.) are
no longer standard HTML documents. So what, you may ask. Who gives a flying goat's turd as long as it looks good
in Netscape?
Well, quite apart from Netscape's world domination plans, non-standard documents are bad news. A lot of people,
myself included, learn to write HTML by example, from looking at other people's pages. If you get into bad
(i.e. non-standard) habits straight away, you'll probably never learn what is and isn't standard HTML. Also,
Netscape Navigator is renowned for being an easy-going browser in terms of what it will accept as valid HTML.
In fact, you can get away with virtually any old shit. I know I have.
As HTML and its powerful parent language, SGML, become more widespread not only on the Web, but also in
companies' internal information exchange and inter-business communication, standardisation will become increasingly
important. How can you learn about the standards that exist, and how to validate your own documents to make sure
they conform? I've set up this page to provide a few links to information and tools that can help you do this.
But first of all, "Why validate your HTML?"
Learning about SGML
SGML is an extremely powerful meta-language for defining and standardising the structure of documents. It also
describes a grammar with which you can design markup languages (HTML is one such derived markup language).
Any valid HTML document is also valid SGML. The grammar of each language written using SGML is described by a
Document Type Definition (DTD) file. For example, here is the HTML DTD. It's
pretty frightening initially, but it's intended to be read by machines rather than humans, so don't panic. In
any case, a couple of days with van Herwijnen (see below) and you'll be writing your own DTDs, no worries.
Here are some more places where you can find out about SGML.
Working with SGML
Before you start working with SGML, I recommend you start playing with it. Here is what you will need to play with
SGML:
- A text editor
- A validating SGML parser (James Clark's sgmls parser is public domain and available for DOS and UNIX).
Here are some very useful notes on using sgmls to validate HTML documents. However, I strongly recommend
you get hold of sp, its successor. This has several advantages, the most notable for beginners being the
ability to automatically include an external SGML declaration file, and to use a catalog file to map public
identifiers (for example, the HTML DTD "-//IETF//DTD HTML Level 2//EN") onto physical files present on the
local system (e.g. "C:\SGML\HTML.DTD").
- A bloody good book on SGML. I recommend van Herwijnen, listed above.
- An SGML browser would be nice; there's a free
trial version of Panorama from SoftQuad.
Alternatively, you could use one of the 'friendly' SGML aids available. To be honest though, in common with most
'assistant' software that claims to make markup easy, these require you to know exactly what you're doing,
otherwise you'll get precisely nowhere with them. Do yourself a favour and learn the 'hard' way first - using
Notepad.
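To make the catalog idea from the sp item above a bit more concrete, here is a minimal Python sketch of the lookup a catalog file describes: public identifier in, local file name out. It is not how sp itself is implemented, and it assumes the simplest possible catalog, with one double-quoted PUBLIC entry per line. The names CATALOG_TEXT, parse_catalog and resolve are made up for illustration; the identifier and path are the ones quoted above.

```python
import re

# Hypothetical catalog contents. Only double-quoted PUBLIC entries are
# handled in this sketch; real catalogs allow other entry types too.
CATALOG_TEXT = r'''
PUBLIC "-//IETF//DTD HTML Level 2//EN" "C:\SGML\HTML.DTD"
'''

# One PUBLIC entry: the keyword, a public identifier, then a system identifier.
ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def parse_catalog(text):
    """Return a dict mapping public identifiers to local file names."""
    return {public_id: system_id for public_id, system_id in ENTRY.findall(text)}


def resolve(catalog, public_id):
    """Look up a public identifier; None means the catalog has no entry for it."""
    return catalog.get(public_id)


catalog = parse_catalog(CATALOG_TEXT)
print(resolve(catalog, "-//IETF//DTD HTML Level 2//EN"))
```

The point of the indirection is that your documents only ever mention the public identifier, so the same file validates on any machine whose catalog knows where its copy of the DTD lives.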
Learning about HTML
HTML is just one application of SGML. It is mostly a layout and typography-based markup language, and thus not really
in the spirit of SGML, which is basically about document structure. However it is extremely useful to know, and
ridiculously easy to learn. Here are some places you can find out more.
- Ian Graham's very good Introduction to HTML.
This has basically everything you need to know to get started, and it's nicely written too.
- HTML Quick Reference Guide. Absolutely invaluable,
not for learning HTML but for reminding you of syntax and usage.
- The HTML Reference Manual at Sandia. Not for
beginners, but contains some useful hints for writing good and standard HTML.
- How do they do that with HTML? Hints and tips on making really jazzy web pages. Very very good,
although you should take the section on Netscape extensions as 'how not to do it' information. Beware because
Netscape-specific material is jumbled in with the clean stuff; you should know the standard before playing about
with this lot.
- Netscape Tricks - I know, I know, it sounds bad, but it's not what you think. Well, mainly. If
you do happen to use Netscape, there are plenty of 'cool' hidden features and all that malarkey.
- HTML background materials at w3.org
- The Ten Commandments of HTML - some useful tips for
making homepages.
- comp.infosystems.www.authoring.html - the Usenet
newsgroup devoted to HTML. High in signal, ranging from philosophical and technical discussions about the future
of HTML, right down to 'how do I centre a table?'
- RFC 1866 - the HTML 2.0 standard straight from the horse's mouth.
- March 1995 IETF working draft on HTML 3.0 - now
out of date, but a jolly good read nonetheless, particularly for the NHTML (Netscape HTML) tainted. Remember HTML 3.0 is
not an approved standard yet. When it is, there'll be a lot more to it than those kewl Netscape extensions.
- Where is HTML going? A very thoughtful
discussion of HTML and its standardisation, from a Man Who Knows what he's talking about.
Converting other formats to HTML
Working with HTML
If you're going to write Web stuff I recommend you get hold of several browsers to test it out on. Though I hate to
admit it, for general browsing purposes, Netscape Navigator 2.0 is probably the best, with NCSA Mosaic a close
second. I wouldn't touch Microsoft Internet Explorer with a bargepole, whatever that is, but lots of other people do
so you'd better get hold of it. If you've got UNIX, try Arena. If you want a really, really basic browser for testing
purposes, try Cello (but don't be too hard on it after Netscape). And you'll need to look at your pages with a
text-only browser, too. Try Lynx or DosLynx.
Anyway, here are some handy resources for the struggling Webster.
Validating HTML documents
Automatic validation services
Several companies provide free HTML checking sites. Here are some I've tried.
Doing it yourself
Whether you validate your documents yourself, or use a public service, you should clearly mark your pages
as standard or non-standard HTML. Here's how.
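The usual SGML way of saying which DTD a page claims to follow is a DOCTYPE declaration at the very top of the file. Assuming that is the kind of marking meant here, this minimal Python sketch pulls the public identifier out of a page so you can see at a glance what it claims to be. It only reads the claim; it doesn't validate anything. The function name declared_dtd and the sample page are made up for illustration.

```python
import re

# Matches a declaration such as:
#   <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Level 2//EN">
# Keywords are matched case-insensitively; only double-quoted
# identifiers are handled in this sketch.
DOCTYPE = re.compile(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def declared_dtd(document):
    """Return the public identifier the document declares, or None if it has none."""
    match = DOCTYPE.search(document)
    return match.group(1) if match else None


# A hypothetical page that declares itself as HTML Level 2.
page = '''<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Level 2//EN">
<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><P>Hello.</P></BODY></HTML>'''

print(declared_dtd(page))
print(declared_dtd("<HTML><BODY>No declaration</BODY></HTML>"))
```

A page with no declaration comes back as None, which is a fair hint that nobody has checked it against any DTD at all.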
If you enjoyed it, why not visit the author's home page, Moose Mansions? If you didn't
enjoy it, write and tell me why not, or alternatively just **** off. Thanks.