HTML 3.0                                         28th March 1995

INTERNET DRAFT                                   Dave Raggett, W3C
Expires in six months

HyperText Markup Language Specification Version 3.0

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working information as Internet drafts.

Internet drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet draft please check the "1id-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to the HTML working group (HTML-WG) of the Internet Engineering Task Force (IETF). Discussions of this group are archived at URL: http://www.acl.lanl.gov/HTML-WG/archives.html.

Abstract

The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with inlined graphics; and hypertext views of existing bodies of information. This specification defines the capabilities of HTML version 3.0 and adds capabilities over previous versions, such as tables, text flow around figures and math.
It is backwards compatible with HTML 2.0.

Table of Contents

 1. Introduction
    a) How to participate in refining HTML 3.0
    b) HTML 3.0 Overview
    c) Transition strategy from HTML 2.0
    d) Design Guidelines for HTML 3.0
 2. Understanding HTML and MIME
 3. Understanding HTML and SGML
 4. The Structure of HTML 3.0 Documents
 5. The HEAD Element and Related Elements
 6. The BODY Elements
    a) Banners
    b) Divisions
    c) Heading Elements
    d) Paragraphs
    e) Line Breaks
    f) Horizontal Tabs
    g) Hypertext Links
    h) Overview of Character-Level Elements
       - Information Type Elements
       - Font Style Elements
    i) The IMG (Image) Element
    j) Unordered Lists
    k) Ordered Lists
    l) Definition Lists
    m) Figures
    n) Tables
    o) Math -- missing entity names --
    p) Horizontal Rules
    q) Preformatted Text
    r) Admonishments
    s) Footnotes
    t) Block Quotes
    u) The ADDRESS Element
    v) Fill-out Forms
 7. Special Characters
 8. Security Considerations
 9. HTML 3.0 Document Type Definition
    a) The SGML Declaration
    b) The Latin-1 Character Entities -- needs work
    c) Math and Greek Entities -- under construction
    d) HTML Icon Entities
    e) The HTML 3.0 DTD
10. Terms -- needs work
11. References -- needs work
12. Acknowledgements -- needs work

Introduction to HTML 3.0

HyperText Markup Language (HTML) is a simple markup system used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.
HTML has been in use by the World-Wide Web (WWW) global information initiative since 1990. The HTML 3.0 specification provides a number of new features, and is broadly backwards compatible with HTML 2.0. It is defined as an application of International Standard ISO 8879:1986 Standard Generalized Markup Language (SGML). This specification will be proposed as the Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) called "text/html; version=3.0".

How to participate in refining HTML 3.0

The process of refining HTML 3.0 into a formal standard will be carried out by the IETF HTML working group. The World Wide Web Organization is continuing to develop a freeware testbed browser for HTML 3.0 ("Arena") to encourage people to try out the proposed features. The discussion list for HTML 3.0 is www-html, with html-wg reserved for use by the IETF working group for detailed matters relating to the formal specification. The process for developing HTML 3.0 is open, and anyone who is interested and able to contribute to this effort is welcome to join in.

--Note: make mailing list names into hypertext links to their archives and add info on how to join these lists--

HTML 3.0 Overview

HTML 3.0 builds upon HTML 2.0 and provides full backwards compatibility. Tables have been one of the most requested features, with text flow around figures and math as runners up. Traditional SGML table models, e.g. the CALS table model, are really complex. The HTML 3.0 proposal for tables uses a lightweight style of markup suitable for rendering on a very wide range of output devices, including braille and speech synthesizers.

HTML 3.0 introduces a new element: FIG for inline figures. This provides for client-side handling of hotzones while cleanly catering for non-graphical browsers. Text can be flowed around figures and you can control when to break the flow to begin a new element.

Including support for equations and formulae in HTML 3.0 adds relatively little complexity to a browser. The proposed format is strongly influenced by TeX. Like tables, the format uses a lightweight style of markup - simple enough to type in by hand, although it will in most cases be easier to use a filter from a word processing format or a direct HTML 3.0 wysiwyg editor. The level of support is compatible with most word processing software, and avoids the drawbacks of having to convert math to inline images.

The Web has acted as a huge exercise in user testing, and we have been able to glean lots of information from the ways people abuse HTML in trying to get a particular effect, as well as from explicit demand for new features. HTML 3.0, as a result, includes support for customised lists; fine positioning control with entities like &emspace;, horizontal tabs, and horizontal alignment of headers and paragraph text.

Additional features include a static banner area for corporate logos, disclaimers and customized navigation/search controls. The LINK element can be used to provide standard toolbar/menu items for navigation, such as previous and next buttons. The NOTE element is used for admonishments such as notes, cautions or warnings, and is also used for footnotes.

Forms have been extended to support graphical selection menus with client-side handling of events similar to FIG. Other new form field types include range controls, scribble on image, file upload and audio input fields. Client-side scripting of forms is envisaged with the script attribute of the FORM element. Forms and tables make for a powerful combination offering rich opportunities for laying out custom interfaces to remote information systems.
To counter the temptation to add yet more presentation features, HTML 3.0 is designed to be used (although this is not required) together with style sheets, which give rich control over document rendering and can take into account the user's preferences, the window size and other resource limitations, such as which fonts are actually available. This work will eventually lead to smart layout under the author's control, with rich magazine style layouts for full screen viewing, switching to simpler layouts when the window is shrunk.

The SGML Open consortium is promoting use of DSSSL Lite by James Clark. This is a simplified subset of DSSSL - the Document Style Semantics and Specification Language. DSSSL is an ISO standard for representing presentation semantics for SGML documents, but is much too complex in its entirety to be well suited to the World Wide Web. Håkon Lie maintains a list of pointers to work on style sheets.

Transition Strategy from HTML 2.0

The use of the MIME content type "text/html; version=3.0" is recommended to prevent existing HTML 2.0 user agents from mishandling 3.0 documents by attempting to display them. Tests have shown that the suggested content type will safely cause existing user agents to display the save to file dialog rather than incorrectly displaying the document as if it were HTML 2.0.

To make it easy for servers to distinguish 3.0 documents from 2.0 documents, it is suggested that 3.0 files are saved with the extension ".html3" (or ".ht3" for PCs). Servers can also exploit the accept headers in HTTP requests from HTML user agents to determine whether each client can or cannot support HTML 3.0. This makes it practical for information providers to start providing HTML 3.0 versions of existing documents for newer user agents, without impacting older user agents.

It is envisaged that programs will be made available for automatic down conversion of 3.0 to 2.0 documents. This conversion could be carried out in batch mode, or on the fly (with caching for greater efficiency).

Design Guidelines

The HTML 3.0 draft specification has been written to the following guidelines.

Lingua Franca for the Web

HTML is intended as a common medium for tying together information from widely different sources. It is a means to rise above the interoperability problems with existing document formats, and a means to provide a truly open interface to proprietary information systems.

Simplicity

The first version of HTML was designed to be extremely simple, both to author and to write browsers for. This has played a major role in the incredibly rapid growth of the World Wide Web. HTML 3.0 provides a clean superset of HTML 2.0, adding high value features such as tables, text flow around figures and math, while still remaining a simple document format. The pressures to adopt the complexities of traditional SGML applications, for example the Department of Defense's CALS table model or the ISO 12083 math DTD, have been resisted.

Scaleability

As time goes by, people's expectations change, and more will be demanded of HTML. One manifestation of this is the pressure to add yet more tags. HTML 3.0 introduces a means for subclassing elements in an open-ended way. This can be used to distinguish the role of a paragraph element as being a couplet in a stanza, or a mathematical term as being a tensor. This ability to make fresh distinctions can be exploited to impart distinct rendering styles or to support richer search mechanisms, without further complicating the HTML document format itself.

Scaleability is also achieved via URI based links for embedding information in other formats. Initially limited to a few image formats, inline support is expected to rapidly evolve to cover drawing formats, video, distributed virtual reality and a general means for embedding other applications.
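
As an illustration of the subclassing idea described under Scaleability, the following Python sketch collects the text of elements that carry a given subclass, roughly as a richer search mechanism might. It is only a sketch under stated assumptions: this section does not name the attribute that carries the subclass, so the code assumes an attribute called CLASS holding space-separated names. The function name find_subclass and the sample markup are hypothetical as well.

```python
from html.parser import HTMLParser


class SubclassFinder(HTMLParser):
    """Collect the text of elements whose CLASS attribute names a subclass.

    Assumption: the subclass is carried in a CLASS attribute holding
    space-separated names. Nested elements of the same name are not handled.
    """

    def __init__(self, element, subclass):
        super().__init__()
        self.element = element.lower()    # the parser reports tag names in lower case
        self.subclass = subclass.lower()
        self.matches = []
        self._text = None                 # list of text chunks while inside a match

    def flush(self):
        # Finish the element currently being captured, if any.
        if self._text is not None:
            self.matches.append(" ".join("".join(self._text).split()))
            self._text = None

    def handle_starttag(self, tag, attrs):
        if tag == self.element:
            # A new start tag implicitly ends an open element (omitted end tag).
            self.flush()
            value = dict(attrs).get("class") or ""
            if self.subclass in value.lower().split():
                self._text = []

    def handle_endtag(self, tag):
        if tag == self.element:
            self.flush()

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def find_subclass(document, element, subclass):
    """Return the text of every `element` in `document` marked with `subclass`."""
    finder = SubclassFinder(element, subclass)
    finder.feed(document)
    finder.close()   # deliver any text still buffered by the parser
    finder.flush()   # the last element may have no end tag
    return finder.matches


sample = ('<P CLASS=couplet>Two lines<P>Plain text'
          '<P CLASS="stanza couplet">More lines</P>')
couplets = find_subclass(sample, "P", "couplet")
```

The same distinction could equally drive rendering rather than search, for example by mapping each subclass name to a style; the document format itself is unchanged in either case.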
Platform Independence

HTML is designed to allow rendering on a very wide range of devices, from clunky teletypes, to terminals, DOS, Windows, Macs and high end Workstations, as well as non-visual media such as speech and braille. In this, it allows users to exploit the legacy of older equipment as well as the latest and best of new machines. HTML 3.0 provides for improved support for non-graphical clients, allowing for rich markup in place of the figures shown on graphical clients. HTML can be rendered on a wide variety of screen sizes, using a scrolling or paged model. The fonts and presentation can be adjusted to suit the resources available in the host machine and the user's preferences.

Content --not-- Presentation Markup

Information providers are used to tight control over the final appearance of documents. The need for platform independence weighs against this, but there is still a strong pressure to find appropriate means for information providers to express their intentions. The experience with proprietary document formats has shown the dangers of mixing presentation markup with content (or structural) markup. It becomes difficult to apply different presentation styles. It becomes painful to incorporate material from different sources (with different presentation styles). It becomes difficult to be truly platform independent.

As a result, HTML 3.0 is designed for use with linked style information that defines the intended presentation style for each element. Style sheets can be expressed in a platform independent fashion or used to provide more detailed control for particular classes of clients or output media.

Support for Cascaded Style Sheets

For the Web, it is valuable to allow for a cascading of style preferences. The client has certain built-in preferences; the publisher may require a particular house style, e.g. for brand distinction; the author may feel the need to override the house style for special cases; the end-user may feel strongly about certain things, e.g. large fonts for easier visibility or avoiding certain colors due to an inability to distinguish between them.

HTML 3.0 supports style sheets via the use of the LINK element to reference a style sheet with a URI. Authors can place overrides in separate style sheets or include them in the document head within the STYLE element. The effectiveness of caching mechanisms for speeding up the retrieval of style sheets is enhanced by the separation of style information into generic, commonly used style sheets and overrides specific to a particular document.

Support for Non-Visual Media

HTML 3.0 is designed to cater for the needs of the visually impaired. Markup for inline figures includes support for rich descriptions, along with hypertext links that double up as defining geometric hotzones for graphical browsers, simplifying the author's job in catering for the different groups of users.

Table markup includes provision for abbreviated row and column names for each cell, which are essential for conversion to speech or braille.

Math markup treats formulae and equations as hierarchies of expressions. This allows disambiguating pauses to be inserted in appropriate places during conversion to speech.

Support for different ways of creating HTML

HTML 3.0 has been designed to be created in a variety of different ways. It is deliberately simple enough to type in by hand. It can be authored using wysiwyg editors for HTML, or it can be generated via export filters from common word processing formats, or other SGML applications.

Understanding HTML and MIME

--I have dropped the differentiation of HTML into a sequence of conformance levels. Many people confused levels with versions. The different levels also encourage interoperability problems!
Let's encourage full conformance with HTML 2.0 or HTML 3.0 rather than perpetuating intermediate levels of support.--

HTML as an Internet Media Type

This specification (and upward compatible specifications) defines the Internet Media Type (RFC 1590) and MIME Content Type (RFC 1521) called "text/html". The type "text/html" accepts the following parameters:

Version

    To help avoid future compatibility problems, the version parameter may be used to give the version number of the specification to which the document conforms. The version number appears at the front of this document and within the public identifier for the SGML DTD. This specification defines version 3.0.

Character sets

    The charset parameter (as defined in section 7.1.1 of RFC 1521) may be used with the text/html content type to specify the encoding used to represent the HTML document as a sequence of bytes. Normally, text/* media types specify a default of US-ASCII for the charset parameter. However, for text/html, if the byte stream contains data that is not in the 7-bit US-ASCII set, the HTML interpreting agent should assume a default charset of ISO-8859-1.

    When an HTML document is encoded using US-ASCII, the mechanisms of numeric character references and character entity references may be used to encode additional characters from ISO-8859-1. Character entity references are needed for symbols such as math and Greek characters from other unspecified character sets.

    Other values for the charset parameter are not defined in this specification, but may be specified in future versions of HTML. It is envisioned that HTML will use the charset parameter to allow support for non-Latin characters such as Arabic, Hebrew, Cyrillic and Japanese, rather than relying on any SGML mechanism for doing so.

--What about Unicode and its assorted encodings?
This section would benefit from an explanation of the issues underlying support for multiple character sets and the problems arising from bidirectionality.--

Understanding HTML and SGML

HTML is an application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language (SGML). SGML is a system for defining structured document types, and markup languages to represent instances of those document types. The SGML declaration for HTML is given in SGML Declaration for HTML. It is implicit among WWW implementations.

In the event of any apparent conflict between HTML and SGML standards, the SGML standard is definitive.

Every SGML document has three parts:

SGML declaration

    Binds SGML processing quantities and syntax token names to specific values. For example, the SGML declaration in the HTML DTD specifies that the string that opens an end tag is </.

Start tags are delimited by < and >, and end tags are delimited by </ and >. For example:

    <H1>This is a Heading</H1>
    <P>This is a paragraph.

Some elements appear as just a start tag. For example, to create a line break, you use <BR>. Additionally, the end tags of some other elements (e.g. P, LI, DT, DD) can be omitted as the position of the end tag is clearly implied by the context.

The content of an element is a sequence of characters and nested elements. Some elements, such as anchors, cannot be nested. Anchors and character highlighting may be put inside other constructs. The content model for a tag defines the syntax permitted for the content.

Note: The SGML declaration for HTML specifies SHORTTAG YES, which means that there are other valid syntaxes for tags, such as NET tags, <EM/.../; and empty end tags, </>. Until support for these idioms is widely deployed, their use is strongly discouraged.

Names

The element name immediately follows the tag open delimiter. An element name consists of a letter followed by letters, digits, periods, or hyphens, up to a total length of 72 characters. Names are not case sensitive. For example, H1 is equivalent to h1. This limit of 72 characters is set by the NAMELEN parameter in the SGML declaration for HTML 3.0.

Attributes

In a start tag, white space and attributes are allowed between the element name and the closing delimiter. An attribute typically consists of an attribute name, an equal sign, and a value (although some attributes may be just a value). White space is allowed around the equal sign.

The value of the attribute may be either:

    1. A string literal, delimited by single quotes or double quotes
    2. A name token (a sequence of letters, digits, periods, or hyphens)

In this example, a is the element name, href is the attribute name, and http://host/dir/file.html is the attribute value:

    <A HREF="http://host/dir/file.html">

Some implementations consider any occurrence of the > character to signal the end of a tag.
For compatibility with such implementations, when > appears in an attribute value, you may want to represent it with an entity or numeric character reference, such as:

    a &gt; b

To put quotes inside of quotes, you can use single quotes if the outer quotes are double or vice versa, as in:

    "First 'real' example"

Alternatively, you can use the character representation &quot; as in:

    "First &quot;real&quot; example"

The length of an attribute value (after replacing entity and numeric character references) is limited to 1024 characters. This number is defined by the LITLEN parameter in the SGML declaration for HTML 3.0.

Note: Some implementations allow any character except space or > in a name token. Attribute values must be quoted only if they don't satisfy the syntax for a name token.

Attributes with a declared value of NAME (e.g. ISMAP, COMPACT) may be written using a minimized syntax. The markup: