Parameters & coding

HTML Corpus Description

The HTML corpus is accessed through the file corpus.html, which provides an index to the texts by Short Title.

Each text is contained in a separate HTML file. The header gives the Short Title, the Short Short Title, and the Cameron number, an alphanumeric code consisting of one letter followed by a number or numbers that identifies the specific text. Full bibliographic material, encoding details, and other miscellaneous information are also found in the header. Each citation starts with a citation number and a text reference identifier.

The format of the Text Reference Identifier varies from text to text, and the user must consult the header of the text to determine which system is being used.

Latin or Greek included in the Old English texts, and Latin glossed by Old English, are rendered in italics. Words which are fragmentary in the manuscript or emended by the editor of the text are enclosed by '< >'. These brackets may also indicate that there is a problem with the manuscript in the space adjacent to the word. Editorial punctuation has usually been adopted; for most texts it follows modern norms. Text that is originally in runic script is enclosed in double slashes '//'.
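The bracket and double-slash conventions above can be picked out mechanically. The following is a minimal sketch only, not part of the corpus distribution. It assumes that a citation has already been extracted from the HTML as plain text (so the brackets are literal '<' and '>' rather than entities) and that a runic passage is delimited by '//' at both ends. The function name and the sample string are hypothetical placeholders, not corpus data.

```python
import re

# Assumption: brackets appear as literal '<' and '>' in the extracted text.
EMENDED = re.compile(r"<([^<>]*)>")
# Assumption: a runic passage opens and closes with '//'.
RUNIC = re.compile(r"//(.+?)//")


def find_marked_spans(citation: str) -> dict:
    """Return the bracketed and double-slashed spans found in one citation."""
    return {
        "emended_or_fragmentary": [s.strip() for s in EMENDED.findall(citation)],
        "runic": [s.strip() for s in RUNIC.findall(citation)],
    }


# Placeholder words only; this is not a real citation from the corpus.
sample = "alpha <beta> gamma //delta// epsilon"
spans = find_marked_spans(sample)
print(spans)
```

Because '< >' may signal either a damaged or emended word or a problem in the adjacent manuscript space, the sketch reports the spans without trying to decide which case applies.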

The special characters have been optimized for viewing at 11-point size in Firefox.

XML Corpus Description

The XML corpus is approximately 49 megabytes in size and conforms to the TEI-P5 Guidelines in C.M. Sperberg-McQueen and Lou Burnard, eds., TEI P5: Guidelines for Electronic Text Encoding and Interchange (TEI Consortium 2008), available at http://www.tei-c.org/Guidelines/P5/
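Since the files follow TEI P5, their elements should sit in the TEI namespace (http://www.tei-c.org/ns/1.0), and a namespace-aware parser is needed to find them. The sketch below illustrates this with Python's standard library. It is not a description of how the corpus files are actually laid out: the inline sample document is a hypothetical placeholder built from generic TEI header elements, and the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the "tei" prefix is a local choice for the queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical placeholder document, not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Placeholder Short Title</title></titleStmt>
      <publicationStmt><p>Placeholder</p></publicationStmt>
      <sourceDesc><p>Placeholder</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>placeholder text</p></body></text>
</TEI>"""


def header_title(xml_text):
    """Return the first title in the TEI header, or None if there is none."""
    root = ET.fromstring(xml_text)
    node = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()


print(header_title(SAMPLE))
```

For a file of roughly 49 megabytes, reading the whole tree into memory is usually workable, though an incremental parser such as ET.iterparse may be preferable on constrained machines.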

The texts are ordered by Cameron number, an alphanumeric code that consists of one letter followed by a number or numbers, identifying the specific text.
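An ordering by Cameron number can be reproduced with a sort key that separates the letter from the numbers. The sketch below is illustrative only and rests on two assumptions not stated above: that multiple numbers are separated by periods, and that numbers compare numerically rather than as strings. The function name and the sample codes are hypothetical and are not claimed to identify real texts.

```python
import re

# Assumption: one letter, then one or more numbers separated by periods.
CAMERON = re.compile(r"^([A-Za-z])(\d+(?:\.\d+)*)$")


def cameron_key(code: str):
    """Turn a code such as 'B1.10' into a sortable (letter, numbers) tuple."""
    match = CAMERON.match(code.strip())
    if match is None:
        raise ValueError(f"not in the assumed Cameron number format: {code!r}")
    letter, numbers = match.groups()
    return (letter.upper(), tuple(int(n) for n in numbers.split(".")))


# Invented codes used only to show the ordering.
codes = ["B1.10", "A2", "B1.2", "A1.1"]
ordered = sorted(codes, key=cameron_key)
print(ordered)
```

Comparing the numbers as integers places B1.2 before B1.10, which a plain string sort would reverse.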