Editorial Methods for Creation of the Digital Schomburg Editions

A Note on the Texts:

By Thomas P. Lukas

The Digital Schomburg Editions: An Interlinked Collection

The Digital Schomburg African American Women Writers of the Nineteenth Century series is a contextualized collection of fully searchable electronic editions. In addition to the value of these works as individual electronic texts, their search capabilities and their interlinked arrangement with other collections allow them to be read within a historical context of multiple media. The reader may choose either to focus on a single work or to cross-read or search the whole collection. To complement such reading with a visual inquiry, the reader can also simultaneously access images and data within the Digital Schomburg Images of African Americans From the Nineteenth Century structured database. Each of these options extends the process of reading to a larger context, and thus complicates and refines the perspective the reader can gain on the cultural narratives offered by these materials. Such a hyperbook environment also extends access and utility to the rare volumes stored at the Schomburg Center for Research in Black Culture, from which the Digital Schomburg Editions were derived. Additionally, these editions have been designed with the needs of a variety of readers in mind--not simply those who have an interest in research. While the design of this collection is sophisticated, it was created with a simple, welcoming user interface as the first priority.

From Rare Book to Searchable Text: Creation of the Digital Schomburg Edition

The electronic versions have been prepared by a "double-keying" process to ensure accurate transcription. Through this process the direct manual entry of textual characters is duplicated, and in many cases triplicated, and the resulting versions are then digitally cross-referenced for erroneous variants. Because many of these titles have been written in dialogue, double- or triple-keyed entry offers a higher degree of accuracy than OCR (Optical Character Recognition) or scanning and electronic spelling correction: these methods would have rejected important authorial intentions.
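
The cross-referencing step can be illustrated with a short sketch. The Python below is not the tool used to prepare these editions. It assumes two keyings held as plain strings, uses an invented sample line, and relies on the standard library's difflib to list the words on which the two keyings disagree, so that a human editor can check each variant against the print source.

```python
from difflib import SequenceMatcher


def find_keying_variants(first_keying: str, second_keying: str) -> list[tuple[int, str, str]]:
    """Return (word_index, first_reading, second_reading) for every disagreement.

    word_index is the position of the variant in the first keying.
    """
    a = first_keying.split()
    b = second_keying.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            variants.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Two hypothetical keyings of the same invented line of text.
keying_a = "The ev'ning sun sank slowly"
keying_b = "The evening sun sank slowly"

variants = find_keying_variants(keying_a, keying_b)
for position, first, second in variants:
    print(f"word {position}: {first!r} vs {second!r}")
```

A spelling corrector would silently normalize one of these readings. Comparing independent keyings only reports the disagreement and leaves the decision to the editor.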

After transcription, the electronic texts have been encoded with SGML (Standard Generalized Markup Language) and are presented over the World Wide Web in a cross-searchable collection.

The interlinking of the individual works presents them in their significant context as a collection, because the content of one work informs the reading of another. This arrangement allows readers in various geographical locations and from various backgrounds to search a single work or the entire collection by keyword or formal attributes.

Textual Relationships:

A Comparison of the Print Source to the Electronic Version

Most readers have seen books that use a blank page to separate chapters. In the unspoken language of traditional print culture, the blank page operates as a transitional cue that narrative events, characters, or scenarios will change in the next chapter, poem, or group of poems.

Readers of physical books rely on this type of visual or physical cue for navigation throughout the book. The electronic text, however, lacking pages, must offer other visual or textual signposts to delineate structural divisions. Therefore, in the conversion to the digital version, the editor must add these signposts.

Not a single character of text has been deleted or excluded from the source documents in their conversion to the Digital Schomburg Edition. For some works, however, minor additions to the text, borrowed from print technology, have been made. Although electronic text has no "pages," editors have inserted page numbers that match the page numbers of the printed work and have added decorative horizontal rules to delineate page breaks. At times, editors have also added section headers not offered in the print source. For example, in a few print sources, sections of the book containing several poems were divided into subsections. In the physical volume, Roman numeral headings delineated the subsections. Problems in direct transcription arose from this arrangement because the overall chapter section carries a title, but the Roman numeral for the first subsection heading is omitted. Readers, intuitively grasping the structure of the book, understood that the first subsection begins after the title, as implied by the subsequent subsections marked with Roman numerals II, III, IV, and so on. When creating the Digital Schomburg Edition, however, editors decided to add the Roman numeral I.

Archiving of Images and Representation of the Physical Book

The New York Public Library has made every possible effort to represent the original print artifacts from which these digital editions have been derived. In addition to the representation of original documentary structure described by SGML markup, images of bindings and title pages, when available, accompany the Digital Schomburg Editions. Images found in these editions were directly captured using a Kontron 3072 ProgRes digital camera. High-resolution files are stored on CD-R media in uncompressed archival TIFF format. Web-accessible files were resampled, resized, sharpened, and saved in JPEG and GIF formats for Internet distribution.

About SGML Encoding and Web Presentation:

Structure, Form, and Content

The New York Public Library Digital Schomburg Editions have been tagged according to a literary usage of TeiLite SGML. SGML tagging allows users to perform electronic searches, and allows the editors to exert stylistic control over issues such as typography and layout. Users cannot see the SGML tags in the Web-accessible form of these editions; however, the electronic files from which these editions are generated at the NYPL server contain items enclosed in angle brackets, such as <title>The Hazely Family</title>. These tags express to the computer the structural divisions of literary documents, such as the title page, chapter, embedded letter, poem, couplet, and stanza. Issues of content, such as publishing dates, addressees, and signatories of letters, are similarly expressed. These tags allow the computer to differentiate between one character of text and another. Without SGML encoding, these editions would present text alone, rather than an informed interpretation of the empirically discernible hierarchical and content-related characteristics of the physical book.
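
To make the idea of structural tagging concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Library's encoding or search software. The fragment is invented, the element names (<div>, <head>, <p>, <opener>, <salute>, <closer>, <signed>, <lg>, <l>) are assumed to follow common TEI Lite usage, and the fragment is written as well-formed XML so that Python's standard XML parser can stand in for a true SGML parser.

```python
import xml.etree.ElementTree as ET

# An invented, well-formed fragment; none of this text comes from the editions.
SAMPLE = """
<text>
  <div type="chapter">
    <head>Chapter I</head>
    <p>An invented sentence of prose.</p>
    <div type="letter">
      <opener><salute>Dear Friend,</salute></opener>
      <p>An invented letter body.</p>
      <closer><signed>A. Writer</signed></closer>
    </div>
  </div>
  <div type="poem">
    <lg type="stanza">
      <l>An invented first line of verse,</l>
      <l>An invented second line of verse.</l>
    </lg>
  </div>
</text>
"""

root = ET.fromstring(SAMPLE.strip())

# Structural search: every line of verse, wherever it occurs.
verse_lines = [line.text for line in root.iter("l")]

# Content search: every signatory of an embedded letter.
signatories = [signed.text for signed in root.iter("signed")]

# Divisions can be filtered by their type attribute.
letters = [div for div in root.iter("div") if div.get("type") == "letter"]

print(verse_lines)
print(signatories)
print(len(letters))
```

Because the tags name what each run of characters is, a search for verse does not have to guess from line lengths or indentation.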

These texts have been encoded in accordance with the TeiLite DTD (Document Type Definition) of SGML and have been published with Inso DynaWeb electronic publishing software. Through the SGML markup, the electronic version represents documentary form as well as some content. While the editors have made no attempt to create a facsimile volume in the digital editions, we have recorded and made accessible formal structures of the text such as letters and verse, and the names of authors and publishers. Images of original illustrations, frontispieces, bindings, and title pages, when available, appear along with the electronic text.

The purposes of such sophisticated document preparation are several and complementary. Encoding the formal attributes of the physical volume preserves the original object in a descriptive form. Researchers can search the collection for structural units such as verse or letters. Scholars and students can download and manipulate the electronic text with SGML or other markup languages in order to emphasize or expose selected characteristics of the literary work in the form of an electronic critical edition. Perhaps most attractively of all, these out-of-print volumes can be distributed worldwide to buttress their presence as significant literary works on the World Wide Web.

The DynaWeb SGML publishing software that the Library has used to mount these texts in World Wide Web-accessible format converts the server-stored SGML electronic text files into an HTML document each time the user retrieves them. Thus what looks like simple HTML-encoded text is accompanied by the richness of content definition and collection-wide contextualization that SGML can offer.
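
The general idea of generating HTML from structural markup at retrieval time can be sketched as follows. This is not how DynaWeb works internally, which the present note does not describe. The tag mapping, the class names, and the function render_html are all hypothetical, and the input is assumed to be a well-formed fragment that an XML parser can read.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from structural elements to (HTML tag, CSS class).
TAG_MAP = {
    "head": ("h2", "chapter-head"),
    "p": ("p", "prose"),
    "l": ("div", "verse-line"),
}


def render_html(source: str) -> str:
    """Convert a structurally tagged fragment into flat HTML, one line per element."""
    root = ET.fromstring(source)
    parts = []
    for element in root.iter():
        mapping = TAG_MAP.get(element.tag)
        if mapping and element.text:
            html_tag, css_class = mapping
            text = escape(element.text.strip())
            parts.append(f'<{html_tag} class="{css_class}">{text}</{html_tag}>')
    return "\n".join(parts)


# An invented fragment standing in for a server-stored file.
FRAGMENT = (
    "<text><head>Chapter I</head>"
    "<p>An invented sentence of prose.</p>"
    "<l>An invented line of verse.</l></text>"
)

print(render_html(FRAGMENT))
```

The stored file keeps its structural tags. Only the copy sent to the reader is reduced to presentation markup, so the presentation can change without the source being re-encoded.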

The TEI Header:

An Electronic Birth Certificate

The TEI (Text Encoding Initiative) header, which operates like an "electronic birth certificate" of the text, constitutes an important part of the Digital Schomburg Editions. Identification and creation issues, such as the transcription of the text to electronic format, the print source, and minor changes made by encoding staff and editors, are all recorded in this section for the future life of the text. The header also provides various date and keyword search terms that are helpful to researchers. From this portion of the electronic text our cataloguers create the CATNYP record. The TEI header that we use contains four sections:

1. The File Description <fileDesc> contains a full bibliographical description of the computer file, including the title, author, creator of the electronic version, publisher of the electronic version, and size of the completed file in KB. Within this section is information about the printed source.

2. The Encoding Description <encodingDesc> describes the standards under which the text was adapted to electronic publication.

3. The Text Profile Description <profileDesc> conveys non-bibliographic aspects of the text, such as the languages used. Within this section a <keywords> tag earmarks important search terms such as "fiction" or "non-fiction," or "drama," "prose," "poetry," or "verse."

4. The Revision History Description <revisionDesc> allows present and future encoders to provide a history of changes made during the development of the electronic text and subsequent corrections. This field specifically records changes to the text that have been made to accommodate the Digital Schomburg Edition.
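
The four sections listed above can be pictured with a skeletal header. The sketch below is an illustration only, not a header taken from these editions. Every value in it is a placeholder, the elements nested inside the four sections are assumptions based on common TEI Lite usage, and the header is written as well-formed XML so that Python's standard parser can read it.

```python
import xml.etree.ElementTree as ET

# A skeletal header with placeholder values throughout.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Placeholder Title: an electronic version</title>
      <author>Placeholder Author</author>
    </titleStmt>
    <extent>placeholder size in KB</extent>
    <publicationStmt>
      <publisher>Placeholder publisher of the electronic version</publisher>
    </publicationStmt>
    <sourceDesc>
      <bibl>Placeholder description of the printed source</bibl>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <p>Placeholder statement of the encoding standards applied.</p>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <keywords>
        <term>fiction</term>
        <term>prose</term>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change>
      <date>placeholder date</date>
      <item>Placeholder note on a change made for the electronic edition.</item>
    </change>
  </revisionDesc>
</teiHeader>
"""

header = ET.fromstring(HEADER.strip())

# The four top-level sections, in document order.
sections = [child.tag for child in header]

# Search terms recorded in the text profile.
keywords = [term.text for term in header.iter("term")]

# Each recorded revision as a (date, description) pair.
revisions = [(c.findtext("date"), c.findtext("item")) for c in header.iter("change")]

print(sections)
print(keywords)
```

A cataloguer or a search index can read fields such as the title, source, and keywords from the header without touching the body of the text.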

