Coding conventions
The corpus is available as a single text file, orford.txt [1.6 MB], or as three linked HTML files, orford1.htm, orford2.htm and orford3.htm [approx. 800-910 KB each]. (An XML version may be on the way.) The text version is coded in a similar way to the Helsinki Corpus, whereas the HTML version aims for greater readability, following the general convention that red signifies conjectural, crossed-out or illegible text, while blue indicates editorial material.
meaning | text version | HTML version |
writer's name | <A XXXX> | author XXXX |
date (year) | <O nnnn> | nnnn |
new page | <P> | new page |
new page with indication to turn from foot of previous page | <P Turn over/turn/Please to turn> | new page Turn over |
new page with word repeated at foot of previous page | <P xxxx> | new page xxxx |
underline | (_xxxx_) | xxxx |
superscript | yy=xxxx=yy | yyxxxxyy |
subscript | yy=xxxx=yy (+ note indicating subscript) | yyxxxxyy |
interlineation | yy^xxxx^yy | yy^xxxx^yy |
deviant word division e.g. "a fore" for "afore" | a%fore | a_fore |
deviant word joining e.g. "Iam" for "I am" | I %am | I %am |
abbreviation indicated by author | ~ | ~ |
conjectural reading | {xxxx} | xxxx |
crossing out or rubbing out | [^ "XXXX" crossed/rubbed out^] | XXXX |
crossing out or rubbing out with some part uncertain | [^ "YYXXXXYY" crossed/rubbed out?^] or [^ "YY{XXXX}YY" crossed/rubbed out^] | YY{XXXX}YY or YYXXXXYY |
illegible (number of asterisks indicates approx number of letters) | {**} | {**} |
illegible (see below) | {*...} | {*...} |
our comment | [^xxxx^] | [xxxx] |
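As an illustration of how the writer, date and page codes in the text version might be picked out by a script, here is a minimal Python sketch. It assumes the codes appear inline exactly as shown in the table above; the function name `find_codes`, the labels it returns and the sample string (including the year 1790) are invented for this example and are not part of the corpus.

```python
import re

# Matches <A ...>, <O ...> and <P> / <P ...> as listed in the table.
CODE = re.compile(r"<([AOP])(?:\s+([^>]*))?>")

# Hypothetical labels for the three code letters.
LABELS = {"A": "author", "O": "year", "P": "page"}


def find_codes(text):
    """Return (label, value) pairs for each code, in order of appearance.

    A bare <P> yields an empty value; <P xxxx> yields the catchword or
    turn-over note written at the foot of the previous page.
    """
    return [(LABELS[m.group(1)], (m.group(2) or "").strip())
            for m in CODE.finditer(text)]


# Invented sample line, not taken from orford.txt.
sample = "<A XXXX> <O 1790> Dear Sir <P> more <P Turn over> end"
for label, value in find_codes(sample):
    print(label, repr(value))
```

Whether a real letter carries exactly one `<A>` and one `<O>` code is not stated here, so the sketch simply reports every code it finds.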
For illegible text, the number of asterisks estimates the number of letters where possible; otherwise {*...} is used. If there is a specific cause for the illegibility, it is given in a comment.
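The two illegibility markers can be told apart mechanically. The following Python sketch, written only from the description above, returns the position of each marker together with the estimated number of letters, or `None` where the text version uses {*...}. The function name `illegible_spans` and the sample sentence are assumptions made for illustration.

```python
import re

# {**}   -> asterisks only, one per (approximately) illegible letter
# {*...} -> illegible stretch of unknown length
ILLEGIBLE = re.compile(r"\{(\*\.\.\.|\*+)\}")


def illegible_spans(text):
    """Return (offset, estimated_letters) for each illegible marker.

    estimated_letters is None when the length could not be estimated.
    Conjectural readings such as {xxxx} are not matched, because they
    contain letters rather than asterisks.
    """
    spans = []
    for m in ILLEGIBLE.finditer(text):
        marker = m.group(1)
        length = None if marker.endswith("...") else len(marker)
        spans.append((m.start(), length))
    return spans


# Invented sample, not taken from the corpus.
print(illegible_spans("I hope {**} will {*...} be {well}"))
```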
The HTML version does not distinguish crossing out from rubbing out, and for readability it ignores any uncertainty in the words crossed or rubbed out. In such cases the text version preserves more detailed information. The comment [corrected] indicates a correction made in the original letter by the author. It is often difficult to determine whether a letter-form is upper or lower case, though the letter itself is not in doubt: we have not marked such readings as tentative and have merely tried to be reasonably consistent. You will therefore need to consult the original documents if capitalisation is of particular importance to you.
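Because the text version keeps the fuller record, a plain reading text can in principle be derived from it. The Python sketch below is one possible approach based solely on the codes in the table: it drops editorial comments (and with them any crossed-out or rubbed-out words), unwraps conjectural readings, underlining, superscripts and interlineations, and removes the % word-division marker. The function name `reading_text`, the regular expressions and the sample line are all assumptions; real letters may contain cases (for example a literal = in accounts) that this does not handle.

```python
import re

COMMENT = re.compile(r"\[\^.*?\^\]", re.DOTALL)   # [^ ... ^], incl. crossings out
CONJECTURAL = re.compile(r"\{([^{}*]+)\}")        # {xxxx}, but not {**} or {*...}
UNDERLINE = re.compile(r"\(_(.*?)_\)")            # (_xxxx_)
SUPERSCRIPT = re.compile(r"=([^=\s]+)=")          # yy=xxxx=yy
INTERLINEATION = re.compile(r"\^([^^]+)\^")       # yy^xxxx^yy


def reading_text(coded):
    """Reduce a coded line to a plain reading text (illustrative only)."""
    # Comments go first, since they also use the caret character.
    text = COMMENT.sub("", coded)
    text = CONJECTURAL.sub(r"\1", text)
    text = UNDERLINE.sub(r"\1", text)
    text = SUPERSCRIPT.sub(r"\1", text)
    text = INTERLINEATION.sub(r"\1", text)
    # "a%fore" -> "afore" and "I %am" -> "I am": dropping % gives the
    # standard word division in both directions.
    text = text.replace("%", "")
    # Tidy the gaps left where comments were removed.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


# Invented sample, not taken from the corpus.
sample = 'I %am y=r= {humble} servant [^ "XXXX" crossed out^] (_very_) a%fore ^now^ {**}'
print(reading_text(sample))
```

Illegible markers are deliberately left in place, so that gaps in the reading text remain visible.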
Hyphenation at the end of a line has not been preserved: the whole of the word has been put at the end of the first line, unless there is a reasonable chance that the hyphen belongs to the word form. Otherwise, lineation has been preserved, except in some cases where the text runs parallel. The remainder of the layout has largely been ignored.
Accounts or calculations – the terms are used interchangeably – are sometimes omitted, but if so, this is noted in the text. London postmarks are not generally noted.