A Note on the Texts:
Editorial Methods for Creation of the Digital Schomburg Editions
By Thomas P. Lukas
Schomburg Editions: An Interlinked Collection
The Digital Schomburg African American Women Writers of the Nineteenth Century series is a
contextualized collection of fully searchable electronic editions. In addition to the value of these
works as individual electronic texts, their search capabilities and their interlinked arrangement
with other collections allow them to be read within a historical context of multiple media. The
reader may choose either to focus on a single work or to cross-read or search the whole
collection. To complement such reading with a visual inquiry, the reader can also simultaneously
access images and data within the Digital Schomburg Images of African Americans From the
Nineteenth Century structured database. Each of these options extends the process of reading to
a larger context, and thus complicates and refines the potential perspective the reader can gain on
the cultural narratives offered by these materials. Such a hyperbook environment also extends
access and utility to the rare volumes stored at the Schomburg Center for Research in Black
Culture from which the Digital Schomburg Editions were derived. Additionally, these editions
have been designed with the needs of a variety of readers in mind--not simply those who have an
interest in research. While the design of this collection is sophisticated, it was created with a
simple, welcoming user interface as first priority.
From Rare Book to Searchable Text:
Creation of the Digital Schomburg Edition
The electronic versions have been prepared by a "double-keying" process, in order to assure
accurate transcription. Through this process the direct manual entry of textual characters is
duplicated, and in many cases triplicated, and then digitally cross-referenced for erroneous
variants. Because many of these titles have been written in dialogue, double- or triple-keyed
entry offers a higher degree of accuracy than OCR (Optical Character Recognition) or scanning
and electronic spelling correction: these methods would have rejected important authorial
After transcription, the electronic texts have been encoded with SGML (Standard Generalized
Markup Language) and are presented over the World Wide Web in a cross-searchable collection.
The interlinking of the individual works within the collection presents them within their
significant context as a collection because the content of one work informs the reading of
another. This arrangement allows readers of various geographical locations and backgrounds to
search a single work or the entire collection by keyword or formal attributes.
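The cross-referencing step of double-keying can be illustrated with a minimal sketch. It assumes
that two independent transcriptions of the same passage are available as plain strings and that
they are compared word by word; the function name find_variants and the sample sentences are
hypothetical and are not drawn from the Library's actual tooling or from any Schomburg title.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_variants(first_keying: str, second_keying: str) -> List[Tuple[int, str, str]]:
    """Return (word_index, first_reading, second_reading) for each disagreement.

    word_index is the position of the variant in the first keying.
    """
    a = first_keying.split()
    b = second_keying.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Any replace, insert, or delete marks a place a proofreader must check.
            variants.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Hypothetical sample passage keyed twice by different typists.
keying_one = "The sun rose over the quiet village"
keying_two = "The sun rose over the quite village"

for position, first, second in find_variants(keying_one, keying_two):
    print(f"word {position}: {first!r} vs {second!r}")
```

Both readings in this sample are correctly spelled words, so a spelling checker alone would not
flag the discrepancy; only the comparison of the two keyings exposes it for review against the
print source.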
A Comparison of the Print Source to the Electronic Version
Most readers have seen books that use a blank page to separate chapters. In the unspoken
language of traditional print culture, the blank page operates as a transitional cue that narrative
events, characters, or scenarios will change in the next chapter, poem, or group of poems.
Readers of physical books rely on this type of visual or physical cue for navigation through
the book. The electronic text, however, lacking pages, must offer other visual or textual
signposts to delineate structural divisions. Therefore, in conversion to the digital version, these
signposts must be added by the editor.
Not a single character of text has been deleted or excluded from the source documents in
conversion to the Digital Schomburg Edition. For some works, however, minor additions to the
text, borrowed from print technology, have been made. Although electronic text has
no "pages," editors have inserted page numbers that match the page number of the printed work
and have added decorative horizontal rules to delineate page breaks. At times, editors have also
added section headers not offered in the print source. For example, in a few print sources,
sections of the book containing several poems were divided into subsections. In the physical
volume, Roman numeral headings delineated the subsections. Problems in direct transcription
arose from this arrangement because the overall chapter section carries a title, but the Roman
numeral for the first subsection heading is omitted. Readers, intuitively grasping the structure of the
book, understood that the first subsection begins after the title, as implied by the subsequent
subsections marked in Roman numerals II, III, IV, and so on. When creating the Digital
Schomburg Edition, however, editors decided to add the Roman numeral I.
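The editorial rule just described can be stated compactly in code. The following is only a
sketch, assuming the subsection headings of one section are listed in order with None standing
for a subsection that carries no heading in print; the function name supply_first_numeral is
hypothetical.

```python
from typing import List, Optional


def supply_first_numeral(headings: List[Optional[str]]) -> List[Optional[str]]:
    """Add the implied Roman numeral I when the print source omits it.

    The numeral is supplied only when the first heading is missing and the
    second heading is "II", which is the case described in the text.
    """
    result = list(headings)
    if len(result) >= 2 and result[0] is None and result[1] == "II":
        result[0] = "I"
    return result


# Headings as found in a hypothetical print source, then as encoded.
print(supply_first_numeral([None, "II", "III", "IV"]))
```

Limiting the rule to sections whose second heading is "II" keeps the editor from adding a numeral
where the print source gives no evidence of a numbered sequence.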
Archiving of Images and Representation of the
The New York Public Library has made every possible effort to represent the original print
artifacts from which these digital editions have been derived. In addition to the representation of
original documentary structure described by SGML markup, images of bindings and title pages,
when available, accompany the Digital Schomburg Editions. Images found in these editions
were directly captured using a Kontron 3072 ProgRes digital camera. High resolution files are
stored on CD-R media in uncompressed archival TIFF format. Web accessible files were
resampled, resized, sharpened and saved into JPEG and GIF formats for Internet distribution.
About SGML Encoding and Web
Structure, Form, and Content
The New York Public Library Digital Schomburg Editions have been tagged according to a
literary usage of TeiLite SGML. SGML tagging allows users to perform electronic searches, and
the editors to exert stylistic control over issues such as typography and layout. Users cannot see
the SGML tags in the Web-accessible form of these editions; however, the electronic files from
which these editions are generated at the NYPL server contain items enclosed in angle brackets,
such as <title>The Hazely Family</title>. These tags express to the computer the structural
divisions of literary documents, such as the title page, chapter, embedded letter, poem, couplet
and stanza. Issues of content, such as publishing dates, addressees and signatories of letters, are
similarly expressed. These tags allow the computer to differentiate between one character of text
and another. Without SGML encoding, these editions would present text alone, rather than an
informed interpretation of the empirically discernable hierarchical and content-related
characteristics of the physical book.
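A short sketch may show how structural tags make a search for "verse only" or "prose only"
possible. It assumes a fragment that is well formed enough to be read by an XML parser (SGML
proper allows constructs an XML parser would reject), and it uses the TEI Lite element names
div1, head, p, lg and l. The sample text and the function name search_within are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical encoded fragment: one chapter with a paragraph and a stanza.
SAMPLE = """
<text>
  <body>
    <div1 type="chapter">
      <head>Chapter I</head>
      <p>A paragraph of prose.</p>
      <lg type="stanza">
        <l>First line of verse,</l>
        <l>Second line of verse.</l>
      </lg>
    </div1>
  </body>
</text>
"""


def search_within(document: str, tag: str, keyword: str) -> List[str]:
    """Return the text of every <tag> element that contains the keyword."""
    root = ET.fromstring(document.strip())
    hits = []
    for element in root.iter(tag):
        text = " ".join("".join(element.itertext()).split())
        if keyword.lower() in text.lower():
            hits.append(text)
    return hits


# The same keyword gives different results depending on the structural unit.
print(search_within(SAMPLE, "l", "verse"))
print(search_within(SAMPLE, "p", "verse"))
```

Without the tags, a search for "verse" could only report that the word occurs somewhere in the
file; with them, it can report that the word occurs in two lines of a stanza and in no paragraph.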
These texts have been encoded in accordance with the TeiLite DTD (Document Type Definition) for
SGML and have been published with Inso DynaWeb electronic publishing software. Through the
SGML markup, the electronic version represents documentary form as well as some content.
While the editors have made no attempt to create a facsimile volume in the digital editions, we
have recorded and made accessible formal structures of the text such as letters and verse, and the
names of authors and publishers. Images of original illustrations, frontispieces, bindings and title
pages, when available, appear along with the electronic text.
The purposes of such sophisticated document preparation are several and complementary.
Encoding the formal attributes of the physical volume preserves the original object in a
descriptive form. Researchers can search the collection for structural units such as verse or
letters. Scholars and students can download and manipulate the electronic text with SGML or
other markup languages in order to emphasize or expose selected characteristics of the literary
work in the form of an electronic critical edition. Perhaps most attractively of all, these
out-of-print volumes can be distributed worldwide to buttress their presence as significant
on the World Wide Web.
The DynaWeb SGML publishing software that the Library has used to mount these texts in World
Wide Web-accessible format converts the server-stored SGML electronic text files into an
HTML document each time that the user retrieves them. Thus what looks like simple HTML-encoded
text is accompanied by the richness of content definition and collection-wide
contextualization that SGML can offer.
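The general idea of generating HTML from structurally encoded text at retrieval time can be
sketched as follows. This is not a description of how DynaWeb works internally; the mapping
table TAG_MAP, the function render_html and the sample fragment are all hypothetical, and the
fragment is assumed to be well formed enough for an XML parser.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from structural element names to HTML presentation.
TAG_MAP = {"head": "h2", "p": "p", "l": "div"}

SAMPLE = """
<div1 type="chapter">
  <head>Chapter I</head>
  <p>Prose &amp; more prose.</p>
  <lg type="stanza">
    <l>A line of verse.</l>
  </lg>
</div1>
"""


def render_html(source: str) -> str:
    """Convert the mapped elements of an encoded fragment into HTML lines."""
    root = ET.fromstring(source.strip())
    parts = []
    for element in root.iter():
        html_tag = TAG_MAP.get(element.tag)
        if html_tag is None:
            continue  # container elements such as div1 and lg produce no markup here
        text = (element.text or "").strip()
        parts.append(f"<{html_tag}>{escape(text)}</{html_tag}>")
    return "\n".join(parts)


print(render_html(SAMPLE))
```

Because the stored file keeps the structural names and only the delivered page uses HTML, the
presentation can be changed by editing the mapping without touching the encoded text.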
An Electronic Birth Certificate
The TEI (Text Encoding Initiative) header, which operates like an "electronic birth certificate" of
the text, constitutes an important part of the Digital Schomburg Editions. Identification and creation
issues, such as the transcription of the text to electronic format, the print source, and minor
changes made by encoding staff and editors, are all recorded in this section for the future life of
the text. The header also provides various date and keyword search terms which are helpful to
researchers. From this portion of the electronic text our cataloguers create the CATNYP record.
The TEI header that we use contains four sections:
1. The File Description <fileDesc> contains a full bibliographical description of the
computer file, including the title, author, creator of the electronic version, publisher of the
electronic version, and size of the completed file in KB. Within this section is information about
the printed source.
2. The Encoding Description <encodingDesc> describes the standards under which the
text was adapted to electronic publication.
3. The Text Profile Description <profileDesc> conveys non-bibliographic aspects of the
text, such as the languages used. Within this section a <keywords> tag earmarks important
search terms such as "fiction" or "non-fiction," or "drama," "prose," "poetry," or "verse."
4. The Revision History Description <revisionDesc> allows present and future
encoders to provide a history of changes made during the development of the electronic text and
subsequent corrections. This field specifically records changes to the text which have been made
to accommodate the Digital Schomburg Edition.
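The four sections of the header can be illustrated with a small sketch that reads an abbreviated
header and reports what it finds. The header below is hypothetical: its title, author, file
size and revision note are invented placeholders, it omits elements that a complete header would
need in order to validate against the DTD, and it is written so that an XML parser can read it.
The function name summarize_header is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical header; every value is a placeholder.
HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Hypothetical Title</title>
      <author>Hypothetical Author</author>
    </titleStmt>
    <extent>250 KB</extent>
    <sourceDesc><p>Description of the printed source.</p></sourceDesc>
  </fileDesc>
  <encodingDesc><p>Standards applied during encoding.</p></encodingDesc>
  <profileDesc>
    <textClass>
      <keywords><term>fiction</term><term>prose</term></keywords>
    </textClass>
  </profileDesc>
  <revisionDesc><change>Placeholder note on an editorial change.</change></revisionDesc>
</teiHeader>
"""


def summarize_header(header: str) -> dict:
    """Collect the section names, title, file size and keywords of a header."""
    root = ET.fromstring(header.strip())
    return {
        "sections": [child.tag for child in root],
        "title": root.findtext("fileDesc/titleStmt/title"),
        "extent": root.findtext("fileDesc/extent"),
        "keywords": [term.text for term in root.iter("term")],
    }


summary = summarize_header(HEADER)
for field, value in summary.items():
    print(f"{field}: {value}")
```

A catalogue record could be drafted from the same fields, since the title, author and keywords
are already separated from the body of the text.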