Report of a meeting of the DLF on preservation reformatting practices
30 May 2001
revised 30 July 2001
Present: Sherry Byrne (Chicago), Robin Dale (RLG), Daniel Greenstein (DLF), Anne Kenney (Cornell/CLIR), Carla Montori (Michigan), Ron Murray (Library of Congress), Chris Rutolo (Virginia), Judy Thomas (Virginia), John Price Wilkin (Michigan), Eileen Fenton (JSTOR)
Apologies: Janet Gertz (Columbia), Stephen Chapman and Jan Merrill Oldham (Harvard), Paul Conway (Yale), Howard Besser (CDL)
The report recommends a minimum benchmark for digitized printed texts, defined here as preservation digital masters. The recommendation, its importance, rationale, and implications are set out under the following heads, alongside an indication of next steps for their review:
- What is a preservation digital master
- Why is it important to build consensus around a preservation digital master
- Rationale behind recommendations pertaining to a preservation digital master
- Implementation issues
- Additional research required
- Next steps
- Appendix. Draft list of structural metadata elements that should be required for preservation digital masters
1. What is a preservation digital master
A preservation digital master is a digital facsimile that is a faithful rendering of a printed text (including texts with illustrations and rare and early printed texts).
A preservation digital master must include digital page images.
The page images of a preservation digital master will meet or exceed the following minimum level characteristics
Preservation digital masters must have descriptive, structural and administrative metadata, and the metadata must be made available in well-documented formats. Structural metadata must include page level information e.g. as required for page turning and related application software. A minimum list of structural metadata elements is recommended in the appendix.
Preservation digital masters may include machine-readable text as follows:
corrected OCR that is below 99.995% accurate,
corrected text (keyboarded or OCR) that is at or above 99.995% accurate
As well as:
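To give a sense of scale for the 99.995% accuracy threshold cited above, the following is a minimal sketch of the arithmetic. It assumes accuracy is measured per character, which this report does not specify; the function names and the page and character counts in the example are hypothetical.

```python
# 99.995% accurate leaves 0.005% in error, i.e. 5 errors per 100,000 units.
# Assumption: the unit is a character; the report does not say how
# accuracy is measured.
ERRORS_PER_100K = 5


def max_errors(total_chars: int) -> int:
    """Largest error count that still meets the 99.995% threshold."""
    return (ERRORS_PER_100K * total_chars) // 100_000


def meets_threshold(total_chars: int, error_chars: int) -> bool:
    """True if the text is at or above 99.995% accurate.

    Integer arithmetic is used so that the comparison is exact.
    """
    if total_chars <= 0 or not 0 <= error_chars <= total_chars:
        raise ValueError("counts must satisfy 0 <= errors <= total, total > 0")
    return error_chars * 100_000 <= ERRORS_PER_100K * total_chars


if __name__ == "__main__":
    # Hypothetical book: 300 pages at 2,000 characters per page.
    total = 300 * 2_000
    print(max_errors(total))           # errors tolerated across the whole book
    print(meets_threshold(total, 31))  # one error too many
```

Under these assumptions, the threshold tolerates one error in every 20,000 characters, which is why it is associated here with corrected or keyboarded text.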
2. Why is it important to build consensus around preservation digital masters
By agreeing to a minimum level benchmark for a preservation digital master, libraries and other organizations can reduce the risk involved in the production and maintenance of digitized texts while inspiring confidence in and encouraging their use.
Because a preservation digital master will be considered by the community as a digital object that is able to meet anticipated current and future needs, an organization creating the preservation digital master can invest in digitization secure in the knowledge that it will not be forced to re-digitize the object at some future date even as production techniques improve.
Users, meanwhile, will develop confidence in preservation digital masters because they have a minimum level of well-known and consistent properties, and they will support a wide variety of uses (including uses not possible with printed texts).
As access to printed texts shifts increasingly to preservation digital masters and their derivatives, collection managers may begin to investigate alternative means for responsibly and non-redundantly preserving the printed texts (or artifacts) from which they are produced; for example, establishing a network of specialist print repositories.
In particular, by building consensus around the characteristics of preservation digital masters, libraries and other organizations that produce and support access to printed texts will be able more effectively to:
- Write contracts with vendors who offer digitization services and compare vendors' pricing structures - the preservation digital master will be the base level production quality that can be required of vendors and form a baseline for price comparison
- Commit to making preservation digital masters accessible over the longer term - preservation digital masters will be invested with an intrinsic value that makes them worth maintaining
- Level up digitization efforts to a point where digital objects are known to have a certain quality capable of supporting production of various derivatives and thus various uses, users, and user needs
- Instill confidence in users who will know that preservation digital masters support their needs, enable projection and detailed review of anomalies that may exist in the source text, and enable print reproduction of a quality equivalent to or better than that achieved by photocopying directly from the source text
- Create objects with optimal and well known processability
- Define and narrow preservation options, e.g., as may be required to migrate the preservation digital masters through changing technical regimes
- Motivate investment in digitization as a strategy for managing collections of printed texts
- Supply guidance to funding and other agencies that invest in digitization or otherwise exercise some strategic influence over the networked scholarly information landscape
It is important also to be specific about what consensus about preservation digital masters will not, should not, and is not intended to do.
- It is not intended to promote or to define methods for creating digital replacement copies for the source documents. Rather, we see consensus around benchmark preservation digital masters as an essential step that will allow us as a community to implement the means of responsibly and effectively managing, preserving, and conserving our printed heritage.
- It is not intended as an absolute statement of best practice - one that assumes digitization methods won't improve. Methods will continue to improve and our understanding of best practice will improve with them. That is why preservation digital masters are defined as digital objects with a certain number of minimum level characteristics.
- It is not intended to diminish the importance of legacy collections that were made at lower than recommended levels, to encourage their poor management, or to force their re-scanning. The recommendation is prospective. It suggests that from this date forward, preservation digital masters will meet or exceed the minimum characteristics that are documented here.
3. Rationale behind recommended benchmarks
3.1. Book illustrations
- In considering benchmarks for book illustrations, the illustrations were considered as parts of books rather than as unique objects
- Book illustrations require different benchmarks than printed text. In particular, they require greater bit depth because
- the print processes used with illustrations offer fine granular detail that can be lost when captured bitonally
- color cannot be captured with bitonal images
3.2. Rare and early printed materials
- Rare and early printed materials require different benchmarks than circulating printed texts because
- The preservation digital masters will be used by scholars and others whose research requires detailed information about the printed material as a physical artifact. As such, the preservation digital masters will require greater bit depth
- The printing used in their production is often very fine and includes a great deal of variation
- Rare books often contain marginalia or annotations that are best captured tonally.
- Given the nature of rare and early printed materials and the use to which their preservation digital masters may be put, a case can be made for setting a benchmark minimum level resolution at 600 dpi. At the same time, it is recognized that 600 dpi may have higher costs with little to no appreciable gain in information capture, so institutions may prefer to digitize at lower resolutions, e.g., 400 dpi. For these reasons, the benchmark includes both 400 and 600 dpi.
4. Implementation issues
Which benchmark levels are selected and applied in a digitization project, and indeed whether that project actually digitizes at or above the benchmark level, will be determined locally with respect to a number of factors:
- extent and nature of illustrations in the source material being digitized
- intended use of the preservation digital master
- scale and cost of digitization effort
How libraries characterize early and rare printed materials will be a local decision based on local collections and collection expertise. The Rare Books and Manuscripts Section of ACRL has issued guidelines on the selection of general collection materials for transfer to special collections that provide useful criteria for determining what constitutes an early or rare printed item:
Books may possess intellectual value, artifactual value, or both. Items with artifactual value include finely printed or bound books, those containing plates, valuable maps, or manuscripts, annotations, drawings or other original art work, including tipped-in photographs, or those published prior to a certain date (e.g., before 1800). Other categories on which there is wide, but not always general agreement, include:
a. fine bindings;
b. early publishers' bindings;
c. extra-illustrated volumes;
d. books with significant provenance;
e. books with decorated endpapers;
f. fine printing;
g. printing on vellum or highly unusual paper;
h. volumes or portfolios containing unbound plates;
i. books with valuable maps or plates;
j. books by local authors of particular note;
k. material requiring security (e.g., books in unusual formats, erotica or materials that are difficult to replace);
l. novels with dust jackets containing important information (e.g., text, illustrative design, and prices).
The rarity and importance of individual books are not always self-evident. Some books, for example, were produced in circumstances which virtually guarantee their rarity (e.g., Confederate imprints). Factors affecting importance and rarity can include the following:
- desirability to collectors and the antiquarian book trade;
- intrinsic or extrinsic evidence of censorship or repression;
- seminal nature or importance to a particular field of study or genre of literature;
- restricted or limited publication;
- the cost of acquisition.
5. Recommended additional research
The following additional research needs to be conducted:
- Formal cost-benefit comparison of 1-, 8-, and 24-bit digitized text images rendered at 200, 300, 400, and 600 dpi. The research will formally compare the processability, image quality, and derivative creation capacity of digital objects made at different levels. It would also compare the costs of creating objects at different levels, e.g., in terms of the additional overhead required for initial conversion, storage, image handling and manipulation, etc.
- Investigation of JPEG 2000 and how it may figure in these recommendations
- Limited survey of what libraries would do differently with respect to collection management and digital library development if a benchmark for preservation digital masters were agreed
- Investigation into quality assessment methods for images that would ensure consistent output measures of quality to complement the imprecise input requirements (e.g., resolution and bit depth).
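As a rough illustration of the storage dimension of the cost comparison proposed above, the sketch below computes uncompressed image sizes for each combination of bit depth and resolution named in the first research item. It is not a substitute for that research: it ignores compression, file format overhead, and capture and handling costs. The 6 x 9 inch page size and the function names are assumptions made for this example only.

```python
# Bit depths and resolutions named in the proposed comparison.
BIT_DEPTHS = (1, 8, 24)
RESOLUTIONS_DPI = (200, 300, 400, 600)

# Assumption: a 6 x 9 inch page; the report does not specify a page size.
PAGE_WIDTH_IN = 6
PAGE_HEIGHT_IN = 9


def uncompressed_bytes(width_in, height_in, dpi, bit_depth):
    """Raw size in bytes of one page image, before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bit_depth + 7) // 8


def print_grid():
    """Print a size table in megabytes (1 MB = 1,000,000 bytes)."""
    print("dpi   " + "".join(f"{d:>2}-bit MB ".rjust(12) for d in BIT_DEPTHS))
    for dpi in RESOLUTIONS_DPI:
        sizes = (
            uncompressed_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi, depth)
            for depth in BIT_DEPTHS
        )
        print(f"{dpi:<6}" + "".join(f"{s / 1_000_000:>12.2f}" for s in sizes))


if __name__ == "__main__":
    print_grid()
```

Because pixel count grows with the square of resolution, moving from 400 to 600 dpi multiplies the uncompressed size by 2.25 at any bit depth, and moving from 1-bit to 8-bit or 24-bit capture multiplies it by 8 or 24 respectively. Compressed sizes will differ, often substantially, which is part of what the proposed comparison would need to measure.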
6. Next steps
- Circulate draft report to participants for initial review and comment
- Post agreed report to the DLF website and seek review by DLF members with a view to recommending DLF's organizational endorsement
- Ask the DLF image quality group (convened by Stephen Chapman) to recommend concrete steps for implementing additional research, particularly with respect to JPEG 2000 and to image assessment methods.
- Launch a more public review of recommendations in September at NEH meeting, via ARL, and through various digital preservation lists. This more public review can be initiated without the recommendations having an endorsement from the DLF
Appendix. Draft list of structural metadata elements that should be required for preservation digital masters
M = Mandatory; MA = Mandatory if applicable; O = Optional
Relationship to Other Resources (MA)
Metadata Locations (M)
Start image/page (M)
End image/page (M)
Title page (M)
Copyright page (M)
Table of contents (M)
List of illustrations (O)
List of tables (O)
Beginning segments (e.g., foreword, preface, acknowledgements) (O)
End segments (e.g., epilogue, afterword, conclusion, etc.) (O)
Page numbers (M)
Blank page (M)
Table of contents (M)
Index (at issue and volume level) (M)
Corrections and retractions (O)
Serial front matter (M)
Serial part (O)
Serial section (O)
Name index (if separate from other index) (O)
Subject index (if separate from other index) (O)
Page numbers (M)
Blank page (M)
Article title (O)
Page numbers (M)
Blank page (M)
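The M / MA / O levels above lend themselves to a simple completeness check. The following is a minimal sketch under stated assumptions: it uses only a subset of the elements listed in this appendix, and the dictionary layout, the function name, and the idea of passing in the set of applicable MA elements are all hypothetical rather than part of the recommendation.

```python
# Requirement levels as defined in the appendix legend.
M, MA, O = "M", "MA", "O"

# A subset of the draft elements. The data layout is an assumption.
REQUIREMENTS = {
    "Relationship to Other Resources": MA,
    "Metadata Locations": M,
    "Start image/page": M,
    "End image/page": M,
    "Title page": M,
    "Copyright page": M,
    "Table of contents": M,
    "List of illustrations": O,
    "List of tables": O,
    "Page numbers": M,
    "Blank page": M,
}


def missing_required(present, applicable=()):
    """Return required elements that a metadata record does not carry.

    present:    names of the elements recorded for the object
    applicable: names of MA elements that apply to this object
    """
    present = set(present)
    applicable = set(applicable)
    missing = []
    for element, level in REQUIREMENTS.items():
        required = level == M or (level == MA and element in applicable)
        if required and element not in present:
            missing.append(element)
    return missing


if __name__ == "__main__":
    record = {"Metadata Locations", "Start image/page", "End image/page",
              "Title page", "Table of contents", "Page numbers"}
    # Lists the mandatory elements this hypothetical record lacks.
    print(missing_required(record))
```

Optional elements never appear in the result, and an MA element is reported only when the caller states that it applies to the object in question.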