Coding conventions

The corpus is available as a single text file, orford.txt [1.6 Mb], or as three linked HTML files, orford1.htm, orford2.htm and orford3.htm [approx. 800-910 Kb each]. (An XML version may be on the way.) The text version is coded in a similar way to the Helsinki Corpus, whereas the HTML version aims for greater readability, following the general convention that red signifies conjectural, crossed-out or illegible text, while blue indicates editorial material.

meaning | text version | HTML version
writer's name | <A XXXX> | author XXXX
date (year) | <O nnnn> | nnnn
new page | <P> | new page
new page with indication to turn from foot of previous page | <P    Turn over/turn/Please to turn> | new page   Turn over
new page with word repeated at foot of previous page | <P    xxxx> | new page   xxxx
underline | (_xxxx_) | xxxx
superscript | yy=xxxx=yy | yyxxxxyy
subscript | yy=xxxx=yy (+ note indicating subscript) | yyxxxxyy
interlineation | yy^xxxx^yy | yy^xxxx^yy
deviant word division, e.g. "a fore" for "afore" | a%fore | a_fore
deviant word joining, e.g. "Iam" for "I am" | I %am | I %am
abbreviation indicated by author | ~ | ~
conjectural reading | {xxxx} | xxxx
crossing out or rubbing out | [^ "XXXX" crossed/rubbed out^] | XXXX
crossing out or rubbing out with some part uncertain | [^ "YYXXXXYY" crossed/rubbed out?^] or [^ "YY{XXXX}YY" crossed/rubbed out^] | YY{XXXX}YY or YYXXXXYY
illegible (number of asterisks indicates approx. number of letters) | {**} | {**}
illegible (see below) | {*...} | {*...}
our comment | [^xxxx^] | [xxxx]
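
As a rough illustration of how the text-version codes might be picked out by a program, here is a minimal Python sketch. It covers only four of the codes listed above (writer's name, year, new page and our comments). The function name scan_codes and the sample string are invented for the example; the sketch also assumes that the year is always four digits and that each code sits on a single line, neither of which has been checked against the corpus.

```python
import re

# Patterns for a few of the text-version codes (assumptions, not a full grammar).
AUTHOR = re.compile(r"<A ([^>]+)>")        # writer's name: <A XXXX>
YEAR = re.compile(r"<O (\d{4})>")          # date (year): <O nnnn>, assumed four digits
PAGE = re.compile(r"<P(?:\s+([^>]*))?>")   # new page, with optional turn-over note or catchword
COMMENT = re.compile(r"\[\^(.*?)\^\]")     # editorial comment: [^xxxx^]


def scan_codes(text):
    """Collect the coded items found in a stretch of text-version markup."""
    return {
        "authors": AUTHOR.findall(text),
        "years": [int(y) for y in YEAR.findall(text)],
        # An empty string stands for a plain <P> with nothing after it.
        "page_breaks": [m.group(1) or "" for m in PAGE.finditer(text)],
        "comments": [c.strip() for c in COMMENT.findall(text)],
    }


# Invented sample line, not taken from the corpus.
sample = "<A XXXX> <O 1800> Dear Sir <P    Turn over> I hope [^corrected^] you are well <P>"
result = scan_codes(sample)
print(result)
```

Note that the COMMENT pattern also matches the crossed/rubbed out notes, since these share the [^ ... ^] brackets; anything that needs to tell them apart would have to inspect the comment text.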

For illegible text, the number of asterisks estimates the number of letters where possible; otherwise {*...} is used. If there is a specific cause for the illegibility, it is given in a comment.
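
The asterisk convention lends itself to a simple count. The following Python sketch, with the hypothetical helper illegible_spans and an invented sample string, reports each illegible stretch with its estimated length, using None where the open-ended form {*...} gives no estimate. Conjectural readings such as {xxxx} are deliberately not matched, because the pattern requires asterisks inside the braces.

```python
import re

# {**}, {***}, ... give an estimated letter count; {*...} gives none.
ILLEGIBLE = re.compile(r"\{(\*+)(\.\.\.)?\}")


def illegible_spans(text):
    """Return (offset, estimated_letters) pairs; None means no estimate."""
    spans = []
    for m in ILLEGIBLE.finditer(text):
        stars, open_ended = m.group(1), m.group(2)
        spans.append((m.start(), None if open_ended else len(stars)))
    return spans


# Invented sample, not taken from the corpus.
spans = illegible_spans("a {**} b {*...} c {word}")
print(spans)
```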

The HTML version does not distinguish crossing out from rubbing out, and for readability it ignores any uncertainty in the words crossed or rubbed out. In such cases the text version preserves more detailed information. The comment [corrected] indicates a correction made in the original letter by the author. It is often difficult to determine whether a letter-form is upper or lower case, though the letter itself is not in doubt; we have not marked such readings as tentative and have merely tried to be reasonably consistent. You will therefore need to consult the original documents if capitalisation is of particular importance to you.
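
The extra detail that the text version keeps for deletions could be recovered along the following lines. This Python sketch rests on an assumption: that the "crossed/rubbed out" shown in the table stands for the literal wording "crossed out" or "rubbed out" in the file. The helper name deletions and the sample string are invented for illustration.

```python
import re

# Assumed literal forms: [^ "text" crossed out^] or [^ "text" rubbed out^],
# with an optional ? before the closing ^] to flag uncertainty.
DELETION = re.compile(r'\[\^\s*"([^"]*)"\s+(crossed|rubbed) out(\?)?\s*\^\]')


def deletions(text):
    """Return (deleted_text, kind, uncertain) for each deletion note."""
    found = []
    for m in DELETION.finditer(text):
        deleted, kind, query = m.group(1), m.group(2), m.group(3)
        # Uncertainty is marked either by a trailing ? or by {braces} in the reading.
        uncertain = bool(query) or "{" in deleted
        found.append((deleted, kind, uncertain))
    return found


# Invented sample, not taken from the corpus.
sample = 'I [^ "shal" crossed out^] shall [^ "com{e}" rubbed out^] go [^ "now" crossed out?^]'
notes = deletions(sample)
print(notes)
```

The HTML version would show all three of these deletions in the same way, which is why the text version is the better source when the kind of deletion or its certainty matters.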

Hyphenation at the end of a line has not been preserved and the whole of the word has been put at the end of the first line, unless there is a reasonable chance that the hyphen belongs to the word form. Otherwise, lineation has been preserved, except in some cases where the text runs parallel. The remainder of the layout has largely been ignored.

Accounts or calculations – the terms are used interchangeably – are sometimes omitted, but if so, this is noted in the text. London postmarks are not generally noted.