About the WeRelate Transcript of the Genealogical Dictionary of the First Settlers of New England



Origin and Copyright

The original Genealogical Dictionary of the First Settlers of New England was published in four volumes between 1860 and 1862. Each volume ended with its own additions and corrections. Further additions and corrections appeared in the New England Historical and Genealogical Register (NEHGR), Vol. XXVII, No. 2 (April 1873), pp. 135-139. All of that content is long out of copyright.

The content presented here, however, comes from a corrected electronic transcript prepared by Dr. Robert A. Kraft and Benjamin Dunning in 1994. Dr. Kraft is the Berg Professor of Religious Studies, Emeritus, in the School of Arts and Sciences at the University of Pennsylvania. The transcript appears here by his very kind permission. The copyright page of the transcript reads as follows:

"The electronic version has been adapted under the direction of Robert Kraft (assisted by Benjamin Dunning) from materials supplied by Automated Archives, 1160 South State, Suite 250, Orem UT 84058 in the following ways: missing lines have been added wherever they could be located (vol. 2 could not easily be checked since line format was not replicated; the corrections found in vols 1-4 have been integrated into the text; page numbers have been represented between double brackets; hyphens have been resolved, and some abbreviated names. NOTE that letter by letter verification has NOT yet been attempted. Copyright for the new electronic version by Robert Kraft, July 1994."

Creation of the WeRelate Version

Dr. Kraft's transcript exists as four separate ASCII files, one for each volume of the original work. For presentation on WeRelate wiki pages, the content has been divided back into pages corresponding to those of the original work. This partitioning has several benefits:

  • The web and original publication page paradigms are consistent, simplifying comparison.
  • The page field of a source citation can present both a useful link to the transcript and a proper page reference to the actual published work.
  • The standard talk (discussion) page behind every MediaWiki page provides a convenient place to discuss issues with particular pages of the transcription, such as errors discovered after the April 1873 corrections were published in NEHGR.
  • Pages of the transcript are tagged with the places they refer to, the surnames of the people described, and the year range of events. Searches on those attributes therefore stand a fair chance of returning the specific relevant pages of the transcript.
  • The contents page, which appears on the root page of the transcript, is not itself part of the transcript. It was generated by a program that searches all the individual transcript pages and records each appearance of the section start template.
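The program that builds the contents page is not described beyond the bullet above. As a rough illustration only, the sketch below assumes each transcript page's wikitext is available keyed by (volume, page), and that a section start is written {{Savagetranscriptsection|SURNAME...}} with the surname as the first parameter. That parameter layout, the function names, and the output line format are hypothetical; only the savagepg template comes from this page.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical: assumes the surname is the first positional parameter of the
# section start template. MediaWiki ignores case on the first letter of a
# template name, hence [Ss].
SECTION_RE = re.compile(r"\{\{\s*[Ss]avagetranscriptsection\s*\|\s*([^|}]+)")


def build_contents(pages: Dict[Tuple[int, int], str]) -> List[Tuple[str, int, int]]:
    """Return (surname, volume, page) for every section start, in page order."""
    entries = []
    for vol, page in sorted(pages):  # numeric (volume, page) order
        for match in SECTION_RE.finditer(pages[(vol, page)]):
            entries.append((match.group(1).strip(), vol, page))
    return entries


def render_contents(entries: List[Tuple[str, int, int]]) -> str:
    """Render one wikitext bullet per section, linked with {{savagepg|vol|page}}."""
    return "\n".join(
        f"* {surname} {{{{savagepg|{vol}|{page}}}}}" for surname, vol, page in entries
    )
```

For example, a page (1, 16) whose text contains {{Savagetranscriptsection|ADAMS}} yields the line "* ADAMS {{savagepg|1|16}}". Sorting on the (volume, page) tuple rather than on page titles keeps page 3 ahead of page 16.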

Transcript Creation Defects

Most of the initial work (partitioning Kraft's transcript into pages, creating active links for "See xxxxx" references, and assigning initial values for places, surnames, and the date range) was performed by a simple text-processing program. While quite effective, the program was not without flaws:

  • Surname sections were recognized by hallmarks that were typical, but not universal. Sections that were not recognized must be added to the index by hand, with corresponding changes to the page headers and insertion of the proper section start at the appropriate location in the transcript.
  • A bug in the processing of surname sections sometimes doubles the surname, so that it appears as LASTLAST.
  • The '£' character (among other special characters) did not translate correctly and appears as '�'.
  • The '£' character can be entered on most Windows keyboards by holding down the Alt key and typing 0163 on the numeric keypad (usually on the right side of the keyboard).
  • The "See surname" idiom was replaced with an active link only when it appeared immediately after a section name. When the phrase appears within the running text, no replacement occurred; use the {{savagepage|vol|page|See surname}} template to create the link there.
  • Only about 60 common early New England place names, and a few English ones, were known to the program. Other places are not included in the initial place list for a transcript page.
  • Discrepancies in the transcription, such as spelling Billerica as Bi11erica (the Arabic numeral '1' in place of the lower-case letter 'l'), would not be recognized as Billerica, Massachusetts. Similarly, dates such as l777 (instead of 1777) would not be recognized as dates and would not contribute to the date range for a page. Related defects appear to stem from optical character recognition; for example, the letter 'g' is sometimes rendered as the digit '9'. Although errors of this type are essentially "baked in" to the Kraft transcript, they are still defects in the faithful recreation of the text that Kraft intended.
  • Savage uses the asterisk ('*') to designate members of the Massachusetts General Court. Occasionally an asterisk falls at the start of a line of the transcription, where MediaWiki inadvertently interprets it as the start of a bulleted list item.

It is always appropriate to modify the transcript appearance in order to resolve defects of these types.
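Most of these defects are mechanical, so candidate spots can be flagged automatically before hand-editing. The following is a hypothetical heuristic scanner, not part of the original processing program; its patterns and category names are assumptions tuned to the examples above (LASTLAST, '�', Bi11erica, l777, a leading asterisk) and will produce some false positives. A leading asterisk is flagged because the transcript source should contain no wiki formatting other than <br>. The 'g' read as '9' case is not covered, since a stray 9 is hard to distinguish from a genuine digit.

```python
import re
from typing import List, Tuple

# Illustrative heuristics only; each finding still needs human judgment.
CHECKS = [
    # An all-capitals word repeated back to back, e.g. ADAMSADAMS.
    ("doubled surname", re.compile(r"\b([A-Z]{2,})\1\b")),
    # U+FFFD, the replacement character left where '£' failed to translate.
    ("mis-encoded character", re.compile("\ufffd")),
    # Letters followed by the digit 1 inside the same word, e.g. Bi11erica.
    ("digit 1 for letter l", re.compile(r"\b[A-Za-z]+1[A-Za-z1]*\b")),
    # A year whose leading 1 is a lower-case l or capital I, e.g. l777.
    ("letter for digit in year", re.compile(r"\b[lI]\d{3}\b")),
]


def scan_page(wikitext: str) -> List[Tuple[int, str, str]]:
    """Return (line number, defect kind, matched text) for each suspect spot."""
    findings = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        if line.startswith("*"):
            # MediaWiki would render this line as a bulleted list item.
            findings.append((lineno, "leading asterisk", line))
        for kind, pattern in CHECKS:
            for match in pattern.finditer(line):
                findings.append((lineno, kind, match.group(0)))
    return findings
```

A correctly spelled "Billerica" or a plain "1777" passes untouched; only the variants described in the list above are reported.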

Common Practices

  • The simplest and most important use of the transcript is as a linkable target for page citations, and a template makes this easy. For example, Thomas Adams is discussed in volume 1, page 16 of Savage, so the page field of the Savage citation for Thomas Adams contains the template {{savagepg|1|16}}, which displays as 1:16.
  • The transcript is meant to be annotated with links to any WeRelate Person page to which it unambiguously refers (for example, see volume 1, page 16).
  • The transcript narrative also embeds mention of important reference works upon which Savage relied. When recognized, these should be turned into active links to appropriate WeRelate "Source" pages.
  • Normal wiki practice is to link a term (or person) only the first time it appears in a given context. We depart from that rule for Savage, since readers unfamiliar with Savage can easily be confused about "who" is meant in the narrative. Instead, link every unambiguous reference.
  • At present, other terms - such as place names in the body of the transcript - should not be linked (so as to allow linked names of individuals to stand out from the page clearly).
  • In general, the appearance of the transcript should not depart from the original Kraft ASCII files or, more precisely, from what we can reasonably infer Dr. Kraft intended. Faithfully maintain the line breaks, punctuation, capitalization, and retained abbreviations of the original text (as modified by the additions and errata). Repair of typographical, transcription, or WeRelate pre-processing errors (see above) is highly desirable.
  • The content of Savage consists of surname-specific sections. In the WeRelate transcript, each section is designated by Template:Savagetranscriptsection, which provides formatting specific to that purpose. Each section in turn consists of sketches, each associated with the family of one particular individual. In the original Savage, the only visual indicator of a new sketch was the given name in all capitals. To support software analysis, and for readability, Template:Savagetranscriptsketch was created to mark the beginning of each new sketch explicitly, both in the raw content and visually in the WeRelate presentation.
  • References to wars, battles, particular immigration voyages, and other events that have a WeRelate Category can be linked to that category with the Category template. The template takes three parameters: category, link text, and originating page. The page is always four digits of the form <v><ppp>: the volume number followed by the page number, zero-padded to three digits when it is less than 100. The template prepends the string "^Savage" to the page number, so that transcript pages sort to the end of any particular category.
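The four-digit page code and the resulting sort key follow directly from the rule in the last bullet. This is a sketch of the rule as described, not the Category template's own code; the function names are hypothetical.

```python
def savage_page_code(volume: int, page: int) -> str:
    """Return the four-digit <v><ppp> code, e.g. volume 1, page 16 -> "1016"."""
    if not 1 <= volume <= 4:
        raise ValueError("Savage has four volumes")
    if not 1 <= page <= 999:
        raise ValueError("page number must fit in three digits")
    # Volume digit, then the page zero-padded to three digits.
    return f"{volume}{page:03d}"


def category_sort_key(volume: int, page: int) -> str:
    """Sort key as described for the Category template: "^Savage" + <v><ppp>."""
    return "^Savage" + savage_page_code(volume, page)
```

For instance, category_sort_key(4, 299) gives "^Savage4299". Zero-padding keeps the codes a fixed width, so ordinary string sorting agrees with (volume, page) order: "1016" sorts before "1100", whereas an unpadded "116" would sort after it. In plain code-point order '^' (U+005E) also follows the capital letters A-Z, which is presumably why the prefix places transcript pages after sort keys that begin with a capital letter.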

When Savage Got it Wrong

  • Defective but properly transcribed content should not be corrected or discussed on the transcript narrative pages themselves. Instead, use Template:Savagetranscriptdefect to designate the erroneous content, together with an index identifying which defect it is on that page. The template displays the erroneous content with a strike-through and places an active link at its end to the corresponding section of the talk page, named "Defect number".
  • Discussion and supporting research explaining WHY the designated content is in error should be summarized in the appropriate section of the talk page. See examples of this practice on pages 58 and 227 of volume 4. Note that on page 227, two separate passages of narrative share the same defect index. This is appropriate if there is indeed ONE error underlying both passages. Separate defects on a single page should be assigned simple increasing integer labels.

Survey of the Nuts and Bolts

The only direct wiki formatting operation that should be found in the transcript source is the line break, <br>. All the other formatting operations are controlled by the following templates:

Supporting templates, used with transcript content other than in the transcript source proper, include:

Where We Stand

The public phase of this project began in February 2012, when 2343 pages of body content extracted from the Kraft files (497, 577, 596, and 673 pages for volumes 1 to 4, respectively) were uploaded. Some statistics to date:

  • Of 22584 sketches in Savage, the subjects of 2919 are linked to a corresponding Person page
  • Of 2343 pages in the transcript, 1473 contain a link to at least one Person page.
  • The 14575 links from Transcript to Person pages refer to 9632 unique Person pages.
  • Of 2601 Person pages that reference Savage, 2102 link specific Transcript pages via the savagepg template.
  • 262 "See also" links (annotation from one Transcript page to another, where Savage wrote "see also" or similar).
  • 138 locations indicated with the defect template, where Savage is believed to have been in error.
  • 78 links to 14 different categories, for example Mayflower Passengers and Salem Witch Trials.
  • The most often linked person is Theophilus Eaton, linked from 22 locations.
  • The most extensively annotated transcript page is volume 4, page 299, with 88 links.
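Figures like these can be gathered by tallying Person links across the transcript pages. The sketch below is illustrative only and is not the procedure actually used to produce the statistics above. It assumes page wikitext keyed by (volume, page) and Person links written as [[Person:Title|label]] or [[Person:Title]]; the function and result key names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Assumption: Person links are written [[Person:Title|label]] or [[Person:Title]].
PERSON_LINK_RE = re.compile(r"\[\[\s*Person:([^|\]]+)")


def link_statistics(pages: Dict[Tuple[int, int], str]) -> dict:
    """Tally Person links across transcript pages keyed by (volume, page)."""
    per_page = {
        key: [title.strip() for title in PERSON_LINK_RE.findall(text)]
        for key, text in pages.items()
    }
    targets = Counter(title for links in per_page.values() for title in links)
    return {
        "total_links": sum(targets.values()),
        "unique_persons": len(targets),
        "pages_with_links": sum(1 for links in per_page.values() if links),
        # (title, count) for the most frequently linked Person page.
        "most_linked": targets.most_common(1)[0] if targets else None,
        # (volume, page) with the most Person links.
        "most_annotated_page": max(per_page, key=lambda k: len(per_page[k]), default=None),
    }
```

Counting only Person links keeps this comparable to the first few statistics; "See also" links, defect templates, and category links would each need their own pattern.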