Compilation principles

The Corpus of Early English Correspondence was originally compiled to test the applicability of sociolinguistic methods to historical data. The compilation principles of the corpus are in accordance with this aim, as judgement samples were selected on extralinguistic criteria to cover a broad range of the language. The result can be described as a balanced corpus with as full a social and quantitative coverage as can be recovered from edited collections of early English letters.

For social coverage, letters were systematically selected from men and women, young and old, and different social strata. One of the biggest difficulties in attaining full social coverage was the low level of literacy among Renaissance women and the lower ranks, so the corpus includes fewer letters written by them than by gentlemen. Similarly, full regional coverage was not a feasible goal, as most speakers of rural dialects belonged to the illiterate majority. Thus, the sampling focused on four regions: London, East Anglia, the North, and the Court, which here means the royal family, courtiers, diplomats and high administrative officers, who often lived in Westminster and travelled widely. Attention was also paid to the contents of the letters, such as news, love, family matters or business, and to the relationships between the writers, such as family, non-family, family servant or close friend.

Quantitative coverage was also a high priority, as one letter does not usually contain enough occurrences of morphosyntactic variables for statistical analysis. Therefore, a minimum of ten letters was chosen from each informant whenever possible, but there was no fixed quota of text per writer. The scarcity of material by women and ranks below the gentry was compensated for by including more than ten letters from women informants if available, or by including even single letters from representatives of these groups.

Finally, the quality of the editions as well as the authorship of the letters were important compilation principles. Only original-spelling editions were used, although modernized punctuation and capitalization as well as expanded abbreviations were accepted. The best editions systematically state the editorial principles, but the older ones in particular do not always contain this information. During the compilation process, the corpus team checked many of the editions against the original manuscripts and corrected them accordingly. Autograph letters were chosen whenever possible, but drafts or copies written by the sender were also regarded as a good source. These quality standards were lowered only if this was the only way to guarantee some representativeness for a particular period or informant group.

More information on the compilation principles can be found in:

Nevalainen, Terttu & Helena Raumolin-Brunberg. 1996. The Corpus of Early English Correspondence. In Sociolinguistics and Language History. Studies based on the Corpus of Early English Correspondence, ed. by Terttu Nevalainen & Helena Raumolin-Brunberg, 39-54. Amsterdam & Atlanta: Rodopi.

Raumolin-Brunberg, Helena & Terttu Nevalainen. 2007. Historical sociolinguistics: The Corpus of Early English Correspondence. In Creating and Digitizing Language Corpora: Diachronic Databases. Volume 2, ed. by Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl, 148-171. Houndmills: Palgrave Macmillan.