SGML/HTML Resource Centre

Updated 13/3/96

Introduction

HTML stands for HyperText Markup Language. It's a simple text-based markup language for creating hypertext documents which can be read by browsers such as Netscape and Mosaic. However, each browser manufacturer falls prey to the temptation to build in a few 'enhancements'. No names, no pack drill, Netscape Communications Corporation. This creates serious problems, in that documents formatted using Netscape enhancements (colour backgrounds, blinking text, centre alignments etc.) are no longer standard HTML documents. So what, you may ask. Who gives a flying goat's turd as long as it looks good in Netscape?

Well, quite apart from Netscape's world domination plans, non-standard documents are bad news. A lot of people, myself included, learn to write HTML by example, from looking at other people's pages. If you get into bad (i.e. non-standard) habits straight away, you'll probably never learn what is and isn't standard HTML. Also, Netscape Navigator is renowned for being an easy-going browser in terms of what it will accept as valid HTML. In fact, you can get away with virtually any old shit. I know I have.
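To make "non-standard" a bit more concrete, here is a minimal sketch in Python that scans a page for a few of the Netscape enhancements mentioned above (blinking text, centre alignment, colour backgrounds). The lists of tags and attributes are illustrative assumptions measured against HTML 2.0, not a complete catalogue, and the names `ExtensionFinder` and `find_extensions` are made up for this example. It is no substitute for a real validator.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete sample of markup that is not in HTML 2.0.
NETSCAPE_TAGS = {"blink", "center"}
NETSCAPE_BODY_ATTRS = {"bgcolor", "background"}


class ExtensionFinder(HTMLParser):
    """Collects the non-standard tags and BODY attributes it encounters."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag in NETSCAPE_TAGS:
            self.found.append("<" + tag.upper() + ">")
        if tag == "body":
            for name, _value in attrs:
                if name in NETSCAPE_BODY_ATTRS:
                    self.found.append("BODY " + name.upper())


def find_extensions(html_text):
    """Return the extensions found in html_text, in document order."""
    finder = ExtensionFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = '<BODY BGCOLOR="#FFFFFF"><CENTER><BLINK>Hello</BLINK></CENTER></BODY>'
    for item in find_extensions(page):
        print("Non-standard:", item)
```

An empty result only means that none of the listed items turned up. The page could still be invalid in plenty of other ways.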

As HTML and its powerful parent language, SGML, become more widespread not only on the Web, but also in companies' internal information exchange and inter-business communication, standardisation will become increasingly important. How can you learn about the standards that exist, and how to validate your own documents to make sure they conform? I've set up this page to provide a few links to information and tools that can help you do this. But first of all, "Why validate your HTML?"

Learning about SGML

SGML is an extremely powerful meta-language for defining and standardising the structure of documents. It also describes a grammar with which you can design markup languages (HTML is one such derived markup language). Any valid HTML document is also valid SGML. The grammar of each language written using SGML is described by a Document Type Definition (DTD) file. For example, here is the HTML DTD. It's pretty frightening initially, but it's intended to be read by machines rather than humans, so don't panic. In any case, a couple of days with van Herwijnen (see below) and you'll be writing your own DTDs, no worries.

Here are some more places where you can find out about SGML.

Working with SGML

Before you start working with SGML, I recommend you start playing with it. Here is what you will need to play with SGML:

Alternatively, you could use one of the 'friendly' SGML aids available. To be honest though, in common with most 'assistant' software that claims to make markup easy, these require you to know exactly what you're doing; otherwise you'll get precisely nowhere with them. Do yourself a favour and learn the 'hard' way first - using Notepad.

Learning about HTML

HTML is just one application of SGML. It is mostly a layout and typography-based markup language, and thus not really in the spirit of SGML, which is basically about document structure. However it is extremely useful to know, and ridiculously easy to learn. Here are some places you can find out more.

Converting other formats to HTML

Working with HTML

If you're going to write Web stuff I recommend you get hold of several browsers to test it out on. Though I hate to admit it, for general browsing purposes, Netscape Navigator 2.0 is probably the best, with NCSA Mosaic a close second. I wouldn't touch Microsoft Internet Explorer with a bargepole, whatever that is, but lots of other people do so you'd better get hold of it. If you've got UNIX, try Arena. If you want a really, really basic browser for testing purposes, try Cello (but don't be too hard on it after Netscape). And you'll need to look at your pages with a text-only browser, too. Try Lynx or DosLynx. Anyway, here are some handy resources for the struggling Webster.

Validating HTML documents

Automatic validation services

Several companies provide free HTML checking sites. Here are some I've tried.

Doing it yourself

Whether you validate your documents yourself, or use a public service, you should clearly mark your pages as standard or non-standard HTML. Here's how.
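One common way of marking a page as standard HTML is a DOCTYPE declaration on its first line, naming the DTD the page claims to follow. For HTML 2.0 that line is `<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">`. This may or may not be the convention the link above pointed to. The Python sketch below only reports which DTD a page declares. It does not check that the page actually conforms to it, and the function name `declared_dtd` is made up for this example.

```python
import re

# Public identifier of the HTML 2.0 DTD, as given in RFC 1866.
HTML_2_0 = "-//IETF//DTD HTML 2.0//EN"

# Matches a declaration such as: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def declared_dtd(html_text):
    """Return the public identifier the page declares, or None if it has none."""
    match = DOCTYPE_RE.search(html_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<HTML></HTML>'
    dtd = declared_dtd(page)
    if dtd == HTML_2_0:
        print("Page claims to be HTML 2.0")
    elif dtd is None:
        print("Page declares no DTD")
    else:
        print("Page declares:", dtd)
```

A page carrying the declaration is only making a claim. Running it through a validator is what tells you whether the claim holds.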
If you enjoyed this page, why not visit the author's home page, Moose Mansions? If you didn't enjoy it, write and tell me why not, or alternatively just **** off. Thanks.
*Non-standard HTML document*