
Web site archiving - an approach to recording every materially different response produced by a website

Kent Fitch, Project Computing Pty Ltd. Email: Kent.Fitch@ProjectComputing.com


Keywords

web site archiving, web site harvesting, web server filters, managing electronic records


Abstract

With the critical role played by web servers in corporate communication and the recognition that information published on a web site has the same legal status as its paper equivalent, knowing exactly what has been delivered to viewers of a web site is as much a necessity as keeping file copies of official paper correspondence.

Traditional records management, change control and versioning systems can, in principle, track updates to content. In practice, however, web responses are increasingly generated dynamically: pages are constructed on the fly from a combination of sources, including databases, feeds, script output and static content, using dynamically selected templates, stylesheets and output filters, and often with per-user "personalisation". Furthermore, the content types being generated are steadily expanding from HTML text and images into audio, video and applications.

Under such circumstances, it becomes extremely difficult to state with confidence exactly what a site looked like on a given date, exactly what responses have been generated, and how and when those responses changed.

This paper discusses an approach to capturing and archiving all materially distinct responses produced by a web site, regardless of their content type and how they are produced. This approach does not remove the need for traditional records management practices but rather augments them by archiving the end results of changes to content and content generation systems. It also discusses the applicability of this approach to the capturing of web sites by harvesters.
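
As a rough illustration of what recording only "materially distinct" responses might involve, the following minimal Python sketch archives a response body only when no materially identical body has already been stored for the same URL. It is not the implementation described in the paper: the `ResponseArchive` class, the in-memory storage, the use of SHA-256 and the `IMMATERIAL` rule (which treats a generation-timestamp HTML comment as immaterial) are all assumptions made for this example.

```python
import hashlib
import re
from datetime import datetime, timezone

# Hypothetical rule for "immaterial" content: an HTML comment carrying a
# generation timestamp. A real site would need its own rules.
IMMATERIAL = re.compile(rb"<!--\s*generated:.*?-->", re.DOTALL)


class ResponseArchive:
    """In-memory archive that keeps each materially distinct response once per URL."""

    def __init__(self):
        # url -> {digest of material content: (first_seen, original body)}
        self._versions = {}

    def record(self, url, body, now=None):
        """Archive body unless a materially identical response is already stored.

        Returns True when a new version was archived, False otherwise.
        """
        now = now or datetime.now(timezone.utc)
        # Hash the body with immaterial parts removed, so that responses
        # differing only in those parts share a digest.
        digest = hashlib.sha256(IMMATERIAL.sub(b"", body)).hexdigest()
        seen = self._versions.setdefault(url, {})
        if digest in seen:
            return False
        seen[digest] = (now, body)
        return True

    def versions(self, url):
        """Return (first_seen, body) pairs for url, in order of first capture."""
        return list(self._versions.get(url, {}).values())


archive = ResponseArchive()
archive.record("/home", b"<p>Hello</p><!-- generated: 10:00 -->")
archive.record("/home", b"<p>Hello</p><!-- generated: 10:05 -->")  # not stored again
archive.record("/home", b"<p>Hello, world</p><!-- generated: 10:06 -->")
```

Because the digest is computed over bytes rather than parsed markup, the same comparison would apply to images, audio or any other content type, with only the rule for immaterial content needing to change.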





AusWeb 2003. The Ninth Australian World Wide Web Conference, Hyatt Sanctuary Cove, Gold Coast, 5th to 9th July 2003. Contact: Norsearch Conference Services, +61 2 66 20 3932 (from outside Australia), (02) 6620 3932 (from inside Australia), Fax (02) 6622 1954.