old.diglib.org
Report of a meeting of the DLF on preservation reformatting practices
30 May 2001
revised 30 July 2001
D Greenstein
Present: Sherry Byrne (Chicago), Robin Dale (RLG), Daniel Greenstein (DLF), Anne Kenney (Cornell/CLIR), Carla Montori (Michigan), Ron Murray (Library of Congress), Chris Rutolo (Virginia), Judy Thomas (Virginia), John Price Wilkin (Michigan), Eileen Fenton (JSTOR)
Apologies: Janet Gertz (Columbia), Stephen Chapman and Jan Merrill Oldham (Harvard), Paul Conway (Yale), Howard Besser (CDL)
The report recommends a minimum benchmark for digitized printed texts, referred to here as preservation digital masters. The recommendation, its importance, rationale, and implications are set out under the following headings, along with an indication of next steps for their review:
  1. What is a preservation digital master
  2. Why is it important to build consensus around a preservation digital master
  3. Rationale behind recommendations pertaining to a preservation digital master
  4. Implementation issues
  5. Additional research required
  6. Next steps
  7. Appendix. Draft list of structural metadata elements that should be required for preservation digital masters
1. What is a preservation digital master
A preservation digital master is a digital facsimile that is a faithful rendering of a printed text (including texts with illustrations and rare and early printed texts).
A preservation digital master must include digital page images.
The page images of a preservation digital master will meet or exceed the following minimum characteristics:
Printed texts (may include simple line drawings and descreened halftones): 600 dpi, 1-bit TIFF image using ITU-T T.6 compression (may be dithered up from a 400 dpi optical 1-bit image)
Illustrated texts, black and white: 400 dpi, 8-bit TIFF (uncompressed or using lossless compression)
Illustrated texts, color: 400 dpi, 24-bit TIFF (uncompressed or using lossless compression) for color illustrations
Rare and early printed texts: 400 dpi, 8- or 24-bit TIFF (uncompressed or using lossless compression)
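These minimums lend themselves to an automated check when images are ingested. The sketch below is illustrative only. It assumes each image's resolution, bit depth, format and compression have already been read from its header into a record, and the category keys (`printed`, `illustrated_bw`, and so on) and compression labels are hypothetical names, not terms from the report. It reads "meet or exceed" as floors on resolution and bit depth, accepts uncompressed or lossless compression, and treats ITU-T T.6 as applicable only to 1-bit images.

```python
from dataclasses import dataclass

# Minimum (dpi, bit depth) per material category, from the benchmark list.
# The category keys are hypothetical labels, not terms used in the report.
MINIMUMS = {
    "printed": (600, 1),
    "illustrated_bw": (400, 8),
    "illustrated_color": (400, 24),
    "rare_early": (400, 8),  # 8-bit or 24-bit allowed; 8 is the floor
}

# Compression labels treated as acceptable (uncompressed or lossless).
# "itu-t6" (CCITT Group 4) is only meaningful for 1-bit images.
LOSSLESS = {"none", "lzw", "deflate", "itu-t6"}


@dataclass(frozen=True)
class PageImage:
    """Technical properties of one page image (hypothetical record layout)."""
    material: str
    dpi: int
    bit_depth: int
    file_format: str
    compression: str


def shortfalls(img: PageImage) -> list[str]:
    """List the ways an image falls below the benchmark; empty means it passes."""
    min_dpi, min_bits = MINIMUMS[img.material]
    problems = []
    if img.file_format.upper() not in ("TIF", "TIFF"):
        problems.append(f"format {img.file_format} is not TIFF")
    if img.dpi < min_dpi:
        problems.append(f"{img.dpi} dpi is below {min_dpi} dpi")
    if img.bit_depth < min_bits:
        problems.append(f"{img.bit_depth}-bit is below {min_bits}-bit")
    if img.compression not in LOSSLESS:
        problems.append(f"compression {img.compression!r} is not lossless")
    elif img.compression == "itu-t6" and img.bit_depth != 1:
        problems.append("ITU-T T.6 compression applies only to 1-bit images")
    return problems
```

For example, `shortfalls(PageImage("illustrated_color", 300, 24, "TIFF", "jpeg"))` reports both the low resolution and the lossy compression, while a 600 dpi, 1-bit, T.6-compressed TIFF of a printed text returns an empty list.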
Preservation digital masters must have descriptive, structural and administrative metadata, and the metadata must be made available in well-documented formats. Structural metadata must include page-level information, e.g., as required by page-turning and related application software. A minimum list of structural metadata elements is recommended in the appendix.
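Page turning depends on mapping between the order in which page images were captured and the page numbers printed on the pages. A minimal sketch, assuming a hypothetical page-level record with a capture sequence number, an image file name, an optional printed page label and a blank-page flag (none of these field or function names come from the report):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageEntry:
    """One page-level structural record (hypothetical field names)."""
    sequence: int              # position of the image in capture order, from 1
    image_file: str            # hypothetical page-image file name
    page_label: Optional[str]  # printed page number such as "iv" or "12"; None if unnumbered
    blank: bool = False        # True for blank pages


def turn(pages: list[PageEntry], current: int, step: int) -> PageEntry:
    """Move `step` images forward (backward if negative) from sequence `current`,
    stopping at the first or last image. Raises StopIteration if `current` is absent."""
    ordered = sorted(pages, key=lambda p: p.sequence)
    index = next(i for i, p in enumerate(ordered) if p.sequence == current)
    return ordered[min(max(index + step, 0), len(ordered) - 1)]


def go_to_label(pages: list[PageEntry], label: str) -> Optional[PageEntry]:
    """Return the image carrying a printed page number, or None if there is none."""
    return next((p for p in pages if p.page_label == label), None)


# Example: an unnumbered title page, a blank verso, then printed pages 1 and 2.
pages = [
    PageEntry(1, "00000001.tif", None),
    PageEntry(2, "00000002.tif", None, blank=True),
    PageEntry(3, "00000003.tif", "1"),
    PageEntry(4, "00000004.tif", "2"),
]
```

Navigation uses the sequence number rather than the printed label because labels can be missing, repeated or restarted (for instance, roman-numbered front matter followed by arabic numbering); the label is used only to jump to a cited page.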
Preservation digital masters may include machine-readable text in one of the following forms:
uncorrected OCR;
corrected OCR that is less than 99.995% accurate; or
corrected text (keyboarded or OCR) that is at least 99.995% accurate.
In addition, they may include:
encoded text (at any level, e.g., as specified in TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices, Version 1.0, July 30, 1999).
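The 99.995% threshold is easier to apply as an error count: it allows at most 5 errors per 100,000 characters, so a text meets it when errors multiplied by 100,000 is no more than 5 multiplied by the number of characters. The sketch below assumes accuracy is measured as the share of characters that are correct (the report does not say how accuracy is measured); the function names and tier labels are illustrative.

```python
def max_errors(characters: int) -> int:
    """Largest whole number of character errors allowed at 99.995% accuracy."""
    return 5 * characters // 100_000


def text_tier(corrected: bool, errors: int, characters: int) -> str:
    """Assign machine-readable text to one of the three accuracy tiers.

    `corrected` is True for corrected OCR or keyboarded text. Integer
    arithmetic avoids floating-point error right at the threshold:
    (characters - errors) / characters >= 0.99995
        <=>  errors * 100_000 <= 5 * characters
    """
    if not corrected:
        return "uncorrected OCR"
    if characters <= 0 or errors < 0:
        raise ValueError("characters must be positive and errors non-negative")
    if errors * 100_000 <= 5 * characters:
        return "corrected, at or above 99.995%"
    return "corrected, below 99.995%"
```

At this threshold a 250,000-character book may contain at most 12 character errors, and a single 2,000-character page may contain none.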
2. Why is it important to build consensus around preservation digital masters
By agreeing to a minimum level benchmark for a preservation digital master, libraries and other organizations can reduce the risk involved in the production and maintenance of digitized texts while inspiring confidence in and encouraging their use.
Because the community will regard a preservation digital master as a digital object able to meet anticipated current and future needs, an organization creating one can invest in digitization secure in the knowledge that it will not be forced to re-digitize the object at some future date, even as production techniques improve.
Users, meanwhile, will develop confidence in preservation digital masters because these masters have a minimum level of well-known and consistent properties and will support a wide variety of uses (including uses not possible with printed texts).
As access to printed texts shifts increasingly to preservation digital masters and their derivatives, collection managers may begin to investigate alternative means for responsibly and non-redundantly preserving the printed texts (or artifacts) from which they are produced, for example, by establishing a network of specialist print repositories.
In particular, by building consensus around the characteristics of preservation digital masters, libraries and other organizations that produce and support access to printed texts will be able more effectively to:
It is important also to be specific about what consensus about preservation digital masters will not, should not, and is not intended to do.
3. Rationale behind recommended benchmarks
3.1. Book illustrations
3.2. Rare and early printed materials
4. Implementation issues
Which benchmark levels are selected and applied in a digitization project, and indeed whether the project digitizes at or above the benchmark level at all, will be determined locally with respect to a number of factors:
How libraries characterize early and rare printed materials will be a local decision based on local collections and collection expertise. The Rare Books and Manuscripts Section of ACRL has issued guidelines on the selection of general collection materials for transfer to special collections; these provide useful criteria for determining what constitutes an early or rare printed item:
Books may possess intellectual value, artifactual value, or both. Items with artifactual value include finely printed or bound books, those containing plates, valuable maps, or manuscripts, annotations, drawings or other original art work, including tipped-in photographs, or those published prior to a certain date (e.g., before 1800). Other categories on which there is wide, but not always general agreement, include:

a. fine bindings;
b. early publishers' bindings;
c. extra-illustrated volumes;
d. books with significant provenance;
e. books with decorated endpapers;
f. fine printing;
g. printing on vellum or highly unusual paper;
h. volumes or portfolios containing unbound plates;
i. books with valuable maps or plates;
j. books by local authors of particular note;
k. material requiring security (e.g., books in unusual formats, erotica or materials that are difficult to replace);
l. novels with dust jackets containing important information (e.g., text, illustrative design, and prices).
The rarity and importance of individual books are not always self-evident. Some books, for example, were produced in circumstances which virtually guarantee their rarity (e.g., Confederate imprints). Factors affecting importance and rarity can include the following:
  1. desirability to collectors and the antiquarian book trade;
  2. intrinsic or extrinsic evidence of censorship or repression;
  3. seminal nature or importance to a particular field of study or genre of literature;
  4. restricted or limited publication;
  5. the cost of acquisition.
5. Recommended additional research
The following additional research needs to be conducted:
6. Next steps
Appendix. Draft list of structural metadata elements that should be required for preservation digital masters
M = Mandatory; MA = Mandatory if applicable; O = Optional
All Materials
Relationship to Other Resources (MA)
Metadata Locations (M)
Start image/page (M)
End image/page (M)
Monographs
Title page (M)
Copyright page (M)
Table of contents (M)
List of illustrations (O)
List of tables (O)
Beginning segments (e.g., foreword, preface, acknowledgements) (O)
End segments (e.g., epilogue, afterword, conclusion) (O)
Chapters/parts (O)
Notes (O)
Bibliography (O)
Index (M)
Colophon (O)
Errata (O)
Page numbers (M)
Blank page (M)

 
Serials
Entire Publication
Volume (M)
Issue (M)
Supplements (M)
Table of contents (M)
Index (at issue and volume level) (M)
Corrections and retractions (O)
Serial front matter (M)
Serial part (O)
Serial section (O)
Name index (if separate from other index) (O)
Subject index (if separate from other index) (O)
Errata (O)
Page numbers (M)
Blank page (M)
Articles
Article title (O)
Author (O)
Abstract (O)
Date (O)
Tables/figures (O)
Errata (O)
Page numbers (M)
Blank page (M)
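The M/MA/O levels can be turned into a simple completeness check on a structural metadata record. This is a sketch under stated assumptions: element names are copied from the All Materials and Monographs lists, the Serials and Articles lists could be encoded the same way, and the caller decides which MA elements apply to a given item. The sketch reads M literally as "must be recorded"; elements such as Index or Blank page can only be recorded when an item has them, so a production check would also need a way to record their absence. The function and constant names are hypothetical.

```python
from collections.abc import Mapping, Set
from enum import Enum


class Obligation(Enum):
    """Obligation levels used in the appendix."""
    MANDATORY = "M"
    MANDATORY_IF_APPLICABLE = "MA"
    OPTIONAL = "O"


M = Obligation.MANDATORY
MA = Obligation.MANDATORY_IF_APPLICABLE
OPT = Obligation.OPTIONAL  # the appendix's "O"

# Elements listed under All Materials.
ALL_MATERIALS = {
    "Relationship to Other Resources": MA,
    "Metadata Locations": M,
    "Start image/page": M,
    "End image/page": M,
}

# Elements listed under Monographs.
MONOGRAPHS = {
    "Title page": M,
    "Copyright page": M,
    "Table of contents": M,
    "List of illustrations": OPT,
    "List of tables": OPT,
    "Beginning segments": OPT,
    "End segments": OPT,
    "Chapters/parts": OPT,
    "Notes": OPT,
    "Bibliography": OPT,
    "Index": M,
    "Colophon": OPT,
    "Errata": OPT,
    "Page numbers": M,
    "Blank page": M,
}


def missing_elements(
    profile: Mapping[str, Obligation],
    recorded: Set[str],
    applicable: Set[str] = frozenset(),
) -> list[str]:
    """Return required elements absent from `recorded`, in list order.

    M elements are always required; MA elements only when named in
    `applicable`; optional elements are never required.
    """
    elements = {**ALL_MATERIALS, **profile}
    return [
        name
        for name, level in elements.items()
        if (level is M or (level is MA and name in applicable))
        and name not in recorded
    ]
```

A monograph record lacking an index entry, for instance, yields `["Index"]`.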
 

© 2000 Council on Library and Information Resources