
Planning Digital Projects for Historical Collections

I. What does a digital project involve?

1. Creating computer facsimiles or images of materials in historical collections
Essentially, a digital project converts printed, manuscript, and pictorial information into electronic images for use in computer-based applications. This is done in a number of ways and with varying levels of precision. The most basic device used is a scanner, which creates an electronic image of a document or picture in much the same way as a photocopy machine does. But instead of being printed on paper, the copy image is viewable on a computer monitor. There are also digital cameras that record images as computer records rather than on photographic film. Cameras can create higher-resolution images than scanners, which is often desired for pictorial material. In addition to these reproductive techniques, a document can be rekeyed (retyped) to create a computer text file.

2. Organizing materials and providing finding aids
Once a picture, document, or collection of materials is digitized, it is described by a finding aid or metadata. Like a conventional library catalog entry, this metadata provides both administrative control functions for the library and finding aids for the user. And like a bibliography, the metadata can both provide the user with a listing of materials relevant to a search and retrieve digital images from that listing. One enormous advantage of rekeying and encoding the texts of historic documents and manuscripts in the computer is that keyword searches can then link researchers to actual content within them rather than just refer them to a title, subject heading, or finding aid.
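
As a rough illustration of how rekeyed text makes content itself searchable, the sketch below builds a simple word index over transcriptions and looks up documents containing every word of a query. It is only a minimal sketch: the document identifiers and sample sentences are hypothetical, and a production system would rely on the search software chosen for the project rather than on code like this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcriptions, invented for illustration only.
documents = {
    "letter-001": "Arrived at Albany by steamboat this morning.",
    "diary-014": "The sloop left Newburgh at dawn.",
}
index = build_index(documents)
matches = search(index, "steamboat Albany")
```

Here the query finds the letter through words in its text, even though no title or subject heading mentions a steamboat.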

3. Designing a presentation
Once a historical collection is digitized and cataloged, the information needs to be presented in a coherent, meaningful fashion so the user can navigate the collection and retrieve information efficiently and effectively. The design of most search and retrieval software is based on that of conventional finding aids, such as indexes and catalogs. In addition, digitizing collection descriptions and actual text creates a database providing far greater analytical power than existing analog and digital catalogs. Furthermore, the graphic capabilities of current computer software support the creation of navigational tools, exhibits, and interactive educational modules.

4. Delivering a product
Once a digital project is completed, it can be delivered to users in a number of ways. It may be made available to users on a terminal at your library or published on the World Wide Web. Your digital product could provide an extensive catalog of a collection and include many images and encoded texts or it could present a selective exhibit of the riches within a collection to entice researchers to your library for more services. You can also publish a catalog or educational program on a computer disk or CD-ROM and circulate or market it accordingly.

II. Why undertake a digital project?

1. To increase and broaden access to collections by identifiable constituencies
Your historical collections are likely the most underutilized area in your library. This is largely because these rare and fragile materials need to be locked up in (ideally) a climate-controlled location; require intensive staff effort to produce finding aids; and demand close supervision to ensure careful handling. Also, in the case of smaller libraries, the limited demand for historical collections leads to restrictions in the hours they are made available. Digitizing changes the traditional practice and pattern of access to historical collections. Once in a computer environment, materials can be made accessible on an unrestricted basis, 24 hours a day, if they are published on a website. With the addition of electronic cataloging and searchable text functions, finding materials will be much easier and more of the desired information will be produced. Making your collections available with fewer physical and conceptual barriers to access should lead to greater use. Delivering materials directly to users will enhance their experience. And the incorporation of engaging navigational tools and interpretive guides into your delivery system should entice new and specialized users, particularly schoolchildren and their teachers, into the library.

2. To improve control over collections and the information contained in them
With their benefits of increased use, easier access, and enhanced protection of valuable documents, digital projects can provide incentives to improve existing levels of management in historical collections. The success of a digital project is directly related to the detail and accuracy with which materials are described, classified, and cataloged. Thus, a digital project requires a significant investment in the review, assessment, and organization of collections. However, once this process is completed, your historical collections will be under more effective control in terms of both management and information. Accession records will be complete, condition assessments up to date, and retrieval systems more efficient. Collection records and cataloging will be completed, expanded, and computerized. Generally, the time and funds expended on a digital project that are not offset by a savings in operation costs can be justified by the expansion of the public accessibility of library resources and the resulting improvement in service. Once access is easier, you will find that in the course of their work, researchers and scholars will help catalog, transcribe, and interpret your materials, thereby expanding the scope of your project.

3. To provide an alternative format for fragile materials
One of the biggest concerns governing user access to historical collections is the threat that exposure and handling pose to rare and fragile materials. Unarguably, providing a surrogate is a solution, and a digital copy provides the user with a visually accurate facsimile, which is a more desirable alternative than a typescript. While technology, standards, and practices are still evolving, it is anticipated that digital images will play a significant role in the long-term preservation of historical collections. But with all the unknowns that currently exist, a digital project is better justified for reasons of improving access than it is for preserving collections or their information. Libraries undertaking digital projects as preservation actions must be prepared to update electronic records and software until a long-term solution emerges.

4. To provide a stimulating, innovative environment for users
The novelty of and fascination with interactive computer tools and the World Wide Web contribute to the acceptance and use of digital projects. The world is rapidly becoming oriented around "computer-assisted" information systems, and the literate and education-based users of library information are leading this trend. Alternatively, libraries are excellent providers of information, and these electronic communication systems are starved for data, especially data that are well organized and clearly presented. And the heavy visual components of many historical collections lend themselves to this medium. But perhaps what should make the digitizing of historical collections a priority is that computer and web applications are so attractive to young people, and this technology offers an opportunity to engage children in the fascinations of these collections and to begin to reinterpret history in more relevant contexts.

5. To link local information to global systems
Library catalogs are already linked to a number of global information systems through a series of local, regional, and worldwide networks. Updating catalog data about historical collections and bringing them online and into global networks will contribute to a broader awareness and greater appreciation of these valuable materials and the information about local and state history they contain.

III. How to plan for digital projects

1. Survey and evaluate the intellectual or interpretive value of collections

Before selecting an archive or assembling materials for a digital project, you should conduct a broader survey of your library's historical collections. Take a complete and concentrated look at all your collections and determine what kinds of information are represented in each of them. Also, evaluate their intellectual content in terms of their usefulness to researchers and their applicability to electronic applications. These descriptions and prioritizations will help support selections for digital projects. A number of factors should be considered, such as the range of subjects contained in each collection, the types of materials represented, the research topics addressed, the historical period covered, the geographical area involved, and the people associated. Of course, those collections that are already renowned for their informational or historical value and in demand by users will find their way to the top of the list (and you should use them as models for the assessment of other collections). However, careful consideration should be given to seemingly less valuable and more fragmentary collections to make sure that they are represented in this framework. This is important to the integrity of the selection process and prepares for any unexpected opportunity to plan a project including a lower-ranked collection.

Finally, to prevent duplication of effort, make sure to find out if other institutions have already digitized, or are planning to digitize, the same or similar materials. This knowledge may affect the priorities and design of your project, provide a model to follow, or lead to a collaborative effort. Queries to relevant web search engines, electronic mailing lists, and professional associations can assist in this procedure.

2. Quantify the size of collections

It is important to establish a record of the volume of materials in your collections. This inventory will allow you to estimate the size of archival groupings and accumulated sets of material types, as well as tally the components of special groupings. Some of the material types you are likely to consider for digital projects are drawings, photographs, prints, maps, manuscripts, and printed texts (books, journals, broadsides, and ephemera). The size of collections and their components plays a crucial role in planning digital projects because the fundamental integrity of any cataloging project relies on the confidence that the material included represents a complete set of information at some definable level (which ranges from all that you have to all that there is).

Since there will always be funding and time constraints on digital projects, the scope of a project will always be planned around how much digitization the budget allows for.

The physical size of materials is also an important piece of information for planning digital projects because of the variety of sizes that different scanning tools will accommodate. The size of the original determines the quality of the digital image, particularly if there is a need for intermediate reductive copies, which reduce the sharpness and fidelity of the original. If nothing else, a distinction needs to be made between materials conforming to standard scanner or camera formats and those that are oversized and will require special treatment. Of course, the more complex the process and the more sophisticated the needed technology, the more costly the digital product.
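
The arithmetic behind scoping a project to a budget can be sketched as follows. This is an illustration only: the per-item costs, item counts, and budget are placeholder figures invented for the example, not estimates from any actual project, and the only distinction modeled is the one drawn above between standard-format and oversized materials.

```python
def estimate_cost(standard_items, oversized_items, cost_standard, cost_oversized):
    """Total cost when oversized items carry a higher per-item cost."""
    return standard_items * cost_standard + oversized_items * cost_oversized


def items_within_budget(budget, cost_per_item):
    """Number of whole items that can be digitized for a given budget."""
    return int(budget // cost_per_item)


# Placeholder figures, assumed purely for illustration.
total = estimate_cost(
    standard_items=100,
    oversized_items=10,
    cost_standard=2.0,
    cost_oversized=15.0,
)
affordable = items_within_budget(budget=1000, cost_per_item=3)
```

Running the same calculation against the counts from your inventory shows quickly whether a complete collection fits the budget or whether the project must be phased.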

3. Assess the suitability of materials for digitization

Some historical material lends itself to digitization and online access, and other material does not. Since not all material will be digitized (at least, not all at once), some initial choices need to be made that will prioritize some material types, subject areas, and collections over others. You will inevitably start planning digital projects for your best and most desirable collections or materials, which are already well known to you. But you must consider the remainder of your collections in some critical way and determine the nature and extent of their content and their usefulness to researchers. (This is why it is important to have made a survey of your historical collections prior to planning a digital project.)

Another factor in this assessment involves determining the extent to which an object or its information value will be represented through digitization. Some practical advantage needs to be identified. It may be to provide access to fragile documents that cannot be viewed any other way (as long as they are readable both in reality and in facsimile). It may be to allow for searching important texts in a way not available before. It may be to assemble texts or views from a number of libraries in a single source. It may be to create a digital archive that will contribute images and data for other research and educational applications.

Conversely, digitization could result in a collection of images without direct associations to user needs, research goals, educational applications, preservation plans, or organizational priorities. Such materials and/or collections should be identified, but ranked low for project planning until more reasons emerge.

In considering a collection's suitability for digitization, decisions need to be made as to the effort required to create a valid facsimile of the material types it contains. In many cases, simply reproducing the historical material as a digital image will be sufficient. But historical materials often require some form of transcription to improve readability or make the text searchable. Currently, there are two options for digitizing text: having the computer convert the text through a process called Optical Character Recognition (OCR) or entering the text manually by retyping it (i.e., "key entry"). The document usually dictates the method you choose. All handwritten materials and many printed texts published prior to the twentieth century cannot be accurately reproduced with current OCR technology and will have to be rekeyed. Obviously, this is very time-consuming, and the library must decide just how accessible documents need to be.
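
The choice among an image alone, OCR, and key entry can be expressed as a small decision rule. The sketch below is a simplification under stated assumptions: it treats every printed text from before 1900 as needing key entry, whereas the guidance above says only that many such texts do, and the function and parameter names are hypothetical.

```python
# Assumed cutoff: a rough stand-in for "prior to the twentieth century".
OCR_CUTOFF_YEAR = 1900


def capture_method(handwritten, publication_year, needs_searchable_text):
    """Suggest how to capture a document, following the rule of thumb above."""
    if not needs_searchable_text:
        # A digital image alone is often sufficient.
        return "image only"
    if handwritten or publication_year < OCR_CUTOFF_YEAR:
        # Simplification: all early printed texts are sent to key entry.
        return "key entry"
    return "OCR"


letter = capture_method(handwritten=True, publication_year=1842, needs_searchable_text=True)
report = capture_method(handwritten=False, publication_year=1905, needs_searchable_text=True)
view = capture_method(handwritten=False, publication_year=1850, needs_searchable_text=False)
```

In practice a sample of pages from each collection should be tested before committing to a method, since print quality varies widely within any period.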

The content of a historical document may not rely on the text alone, but may include illustrations and other graphic images, at times on the same page. Each material type presents its own special conditions for digitization. How is the information on the verso of a photograph retained? Is the original or a surrogate (such as microfilm or slides) more appropriate for the capture? How faithful does color representation or detail need to be? Also, bound items will be very difficult to scan if they do not open at least 90 degrees or have texts or images too close to the binding. Images of large maps and plans will need to be significantly reduced in size to be viewable on computer screens and thus will lose much of their spatial context. Existing microfilm may not have sufficient clarity or resolution for digital reproduction to adequately capture its data. For photographic collections, existing original negatives or transparencies are usually preferable to prints since they usually hold more information. (Exceptions include fine art photographs where the vintage print represents the photographer's interpretation of the image.) These and other factors will have roles to play in determining the suitability of a collection for a digital project.

4. Consider the physical condition of the materials
So many historical materials are old and fragile that the physical condition of materials is a very important consideration when planning a digital project. You need to evaluate how much damage will be caused by the handling required to digitize these materials, and whether it is worth the risk. However, digitization should not be considered an alternative to responsible collection care; that is, a digital image is not an appropriate replacement for a deteriorating original. If a valuable object or collection needs conservation, that work can be incorporated into a digital project and serve other needs of your historical collections.

IV. How to select collections and materials for a digital project

1. Selecting a project
A digital project can be approached in two ways. The ideal project occurs when an entire collection can be selected and funds are available and staff can be allocated to undertake it. This is why having a good understanding of your library's historical collections is important. Even if support for a project requires that it must be phased in over an extended period of time, it is still advantageous to work with complete collections. It is sometimes beneficial to consider collaborations with other libraries to make the digital collection more comprehensive in range and detail of content and to provide users with more complete access. More than likely, however, digital projects will be planned to conform to a more limited budget. But this reality should not discourage you from initiating a project; slow progress is better than none, particularly with such neglected material. And as long as the resulting collection is comprehensive at some modified level, the information system will be useful to researchers and can be expected to expand with data digitized in future projects.

2. Selecting materials: developing selection criteria
It is imperative that a digital project be planned around selection criteria that thoroughly characterize the historical materials and their information value. Good selection criteria address all the predictable dimensions of the materials and the applications in which they will be put to use.

Digital projects planned for archival groupings or accumulated sets of materials, genres, or categories should provide a comprehensive record of the collection at one level or another. Otherwise, users cannot rely on the accuracy of responses to their queries. A collection may be all of a certain benefactor's gift; all of a certain material type; everything relevant to a subject, individual, geographical area, or time period; or a combination of any or all of these. The data set can be as large or as small as a comprehensive definition will allow, and it can range from a part of a small collection in a single library to a combination of collections from many libraries. Usually, the scale of the digital project will depend on the funding or staff allocated to it. Therefore, it is important to review your collections and outline a digitizing strategy before you plan a particular digital project.

Selection criteria for digital projects planned around a compilation of materials from different historical collections (and, perhaps, from different libraries) are more complicated. Ensuring a comprehensive record within the scope of the project theme is as fundamental in this case as with archival groupings and accumulated sets, but other factors will apply. For example, in The New York Public Library's digital compilation project "Travels Along the Hudson," five additional sets of criteria were created to direct selections.
Selections based on theme:
Materials associated with the themes of transportation, commerce, and travel on the Hudson River.
Selections based on regional geography:
Materials representing the geographic extent of the Hudson from its source in Lake Tear of the Clouds in the Adirondack Mountains to its confluence with the Atlantic.
Selections based on historic period:
Materials produced during the nineteenth century, but more particularly from the beginning of national transportation and commercial development along the Hudson (ca. 1785) to its commemoration with the Hudson-Fulton Celebration (1909).
Selections based on subject headings:
"Core" subject headings associated with broad patterns of American history served as guides for making selections. There are a number of standardized listings to follow, and they are crucial to linking information more globally, both within the project and with other systems. These headings include subjects such as Architecture, Art, Commerce, Community History, Engineering, Industry, Landscape Architecture, Literature, Maritime History, Military History, Social History, Transportation, and Travel/Recreation.
Selections based on material types:
Materials representing the broad spectrum of media associated with the themes, such as drawings, manuscripts, maps, photographs, print views, printed music, printed texts (books, journals, broadsides, and ephemera), and three-dimensional objects.
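
One way to apply criteria like the five above consistently is to record them as data and test each candidate item against them. The sketch below is illustrative only and is not the method used for "Travels Along the Hudson": the record fields and sample items are hypothetical, and it assumes an item must satisfy all five criteria at once, which the description above does not state.

```python
from dataclasses import dataclass, field

# Values paraphrased from the five sets of criteria listed above.
THEMES = {"transportation", "commerce", "travel"}
PERIOD = (1785, 1909)
CORE_SUBJECTS = {
    "Architecture", "Art", "Commerce", "Community History", "Engineering",
    "Industry", "Landscape Architecture", "Literature", "Maritime History",
    "Military History", "Social History", "Transportation", "Travel/Recreation",
}
MATERIAL_TYPES = {
    "drawing", "manuscript", "map", "photograph", "print view",
    "printed music", "printed text", "three-dimensional object",
}


@dataclass
class CandidateItem:
    """Hypothetical survey record for one item under consideration."""
    title: str
    year: int
    material_type: str
    themes: set = field(default_factory=set)
    subjects: set = field(default_factory=set)
    in_hudson_region: bool = False


def meets_criteria(item):
    """Assumption: an item qualifies only if every criterion is satisfied."""
    return (
        bool(item.themes & THEMES)
        and item.in_hudson_region
        and PERIOD[0] <= item.year <= PERIOD[1]
        and bool(item.subjects & CORE_SUBJECTS)
        and item.material_type in MATERIAL_TYPES
    )


# Invented sample items, for illustration only.
candidates = [
    CandidateItem("Steamboat landing view", 1838, "print view",
                  {"travel"}, {"Transportation"}, True),
    CandidateItem("Bridge construction photograph", 1925, "photograph",
                  {"transportation"}, {"Engineering"}, True),
]
selected = [item.title for item in candidates if meets_criteria(item)]
```

Keeping the criteria as named sets makes it easy to revise them as new collections are surveyed, without changing the test itself.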

V. How to organize information
Digital projects are all about organizing information in systematic and hierarchical ways. If nothing else, digital projects with historical materials are worthwhile for what they contribute to upgrading the low level of cataloging that currently exists.

While the organization of digitized material follows standard database formats, it also operates in an information system customized for the particular intellectual content it contains and the unique uses to which it will be applied. Most standard catalog data function in generalized or global reference systems and need to be augmented with data sets that represent the more specific subject and geographical orientation of the historical materials. In the case of the "Travels" project, historical materials were cataloged more intensively. Hudson River views used as illustrations in books and periodicals were cataloged separately because of their importance outside of the context of the publication. New sets of core data were created to focus the materials on the themes of transportation and travel, the nineteenth-century time frame, and Hudson River locales. In addition, wider latitude was used in recording personal names, place names, and subjects in already existing catalog fields. In the end, it was determined that the more information that was entered into a catalog, including unclassified data recorded in memo fields, the better the opportunities were going to be for locating information about regional and local history. Some concessions were also made. The best level of cataloging that could be expected for manuscript material was at the collection level. Rather than defer digitizing the record of this material until item-level cataloging was possible, an inventory containing existing descriptions and finding aids was created.

Once pictorial and text materials are digitized, these graphic displays are linked to the catalog to associate each image to the larger collection of materials and to link users with additional thematic and contextual data. Because the historical record begins but does not end with your project, both the catalog and the image set invite links and additions from other sources in future stages. The selection criteria can be applied on a case-by-case basis as new collections are surveyed and cataloged. Ideally, the bibliography can support unlimited source material. Any additions to the digital archive will be integrated effectively through the classification system established by the selection criteria.

VI. How to deliver materials effectively
Goals for the delivery of materials, whether on the web or at computer workstations on-site, should be established early in the process. Digitized projects bring together specialized collections and research tools for a diverse audience, and they should be responsive to the needs of individuals, communities, and institutions. Some critical goals for web-based presentation include presenting heterogeneous resources in a coherent way, bringing together diverse collections, and integrating access to digital and print resources. As digitized collections mature, goals for web presentation should create an online research environment that will support both novice and expert users, enable users to tailor collections to their own needs and interests, and support collaboration with colleagues.

The functionality of the final product, like keyword searching, field searching, browsing, selecting, and annotating, should support these overall goals and be reflected in the project's digitization and the bibliographic components. For instance, texts must be encoded in SGML to be searchable, individual images must be accompanied by some descriptive data if they are to be easily identified, and all materials must be related to broad subject categories if the collection is to be "browsable." Preserving and presenting the true context of the materials are critical in delivering digitized materials on the web. This needs to be a defining principle of the project and be addressed in each phase of the project.

A key goal for "Travels" was to present collections on the web that are distributed throughout the participating institutions and to provide access to content that is diverse in original format and in the level of detail of descriptive information. In the face of great diversity of content and description, a coherent approach to displaying collections, navigating the site, and indexing and presenting search results was planned as materials were selected. The interface for "Travels" also drew together different technical architectures. Images and texts are on different servers; the images are delivered via a database application while the texts use an SGML-to-HTML application.

Planning a digitization project involves bringing together print and digital items in a coherent way. A user looking for an item in the library catalog should be able to identify it without regard to whether it is available in its original physical form or as a digital or microfilm reproduction. Early planning should address the presentation of intellectual descriptions of originals and reproductions in a fully integrated way through the website, the library catalog, and archival finding aids. Support for printing, requesting permissions, and ordering copies online will also afford more flexible use of digitized collections.


S. Ruddy 08/12/99