Helsinki Corpus of English Texts

The Helsinki Corpus of English Texts is a structured, multi-genre diachronic corpus comprising text samples from Old, Middle and Early Modern English, organized by period. Each sample is preceded by a list of parameter codes giving information on the text and its author. The Corpus is particularly useful for the study of change in linguistic features in long diachrony. It can be used as a diagnostic corpus that gives general information on the occurrence of forms, structures and lexemes in different periods of English. This information can be supplemented by evidence from more specialized and focused historical corpora.


Project leader: Matti Rissanen, University of Helsinki
Project secretary: Merja Kytö, Uppsala University
Time of compilation: 1984–1991
Size: 1,572,800 words
Language: English (Old, Middle, Early Modern)
Number of texts/samples: c. 450
Period: c. 730–1710
Released: 1991
Funding: The University of Helsinki; The Academy of Finland

Reference lines and copyright

The Helsinki Corpus of English Texts (1991). Department of Modern Languages, University of Helsinki. Compiled by Matti Rissanen (Project leader), Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English).

TEI XML edition:

Helsinki Corpus TEI XML Edition. 2011. First edition. Designed by Alpo Honkapohja, Samuli Kaislaniemi, Henri Kauhanen, Matti Kilpiö, Ville Marttila, Terttu Nevalainen, Arja Nurmi, Matti Rissanen and Jukka Tyrkkö. Implemented by Henri Kauhanen and Ville Marttila. Based on The Helsinki Corpus of English Texts (1991). Helsinki: The Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki.


Kytö, Merja (comp.), Manual to the Diachronic Part of The Helsinki Corpus of English Texts: Coding Conventions and Lists of Source Texts (3rd ed. 1996).

Marttila, Ville. 2011. Helsinki Corpus TEI XML Edition Documentation. Helsinki: VARIENG.


Project leader: Matti Rissanen
Project secretary: Merja Kytö
Old English: Leena Kahlas-Tarkka, Matti Kilpiö
Middle English: Saara Nevanlinna, Päivi Pahta, Kirsti Peitsara, Irma Taavitsainen
Early Modern English: Terttu Nevalainen, Helena Raumolin-Brunberg

Other team members

Old English: Ilkka Mönkkönen, Aune Österman
Middle English: Inkeri Blomstedt, Juha Hannula, Mailis Järviö, Leena Koskinen, Tesma Outakoski
Early Modern English: Ritva Tiusanen

Student assistants

Kirsi Heikkonen, Juhani Klemola, Asta Kuusinen, Tuula Lehtonen, Tom Löfström, Arja Nurmi, Minna Palander, Tiina Selki, Päivi Öhman.

File format

The coding system is based on the set of ASCII codes (96 printable characters). The names of the 242 files follow MS-DOS conventions, which limit each name to eight characters. Each file name begins with the character C (for 'Corpus'), followed by O (for 'Old English'), M (for 'Middle English') or E (for 'Early Modern English'). In the Old and Middle English sections of the Corpus, the file names by and large reflect the names of authors or texts. In the Early Modern English section, the file names are based on the systematic coverage of different text types.
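The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of the Corpus distribution: the function name period_from_filename and the sample names COSAMPLE, CMSAMPLE and CESAMPLE are hypothetical, and the handling of an optional extension after a dot is an assumption based on general MS-DOS usage rather than on the Corpus manual.

```python
# Second character of the file name -> section of the Corpus,
# following the convention described in the text above.
PERIODS = {
    "O": "Old English",
    "M": "Middle English",
    "E": "Early Modern English",
}


def period_from_filename(name: str) -> str:
    """Return the Corpus section implied by a file name.

    Hypothetical helper: it checks only the rules stated in the text
    (at most eight characters, first character C, second character
    O, M or E). Anything after a dot is treated as an extension and
    ignored, which is an assumption.
    """
    stem = name.split(".", 1)[0].upper()
    if not 2 <= len(stem) <= 8:
        raise ValueError(f"{name!r}: expected a name of 2 to 8 characters")
    if stem[0] != "C":
        raise ValueError(f"{name!r}: expected the name to begin with C")
    if stem[1] not in PERIODS:
        raise ValueError(f"{name!r}: second character must be O, M or E")
    return PERIODS[stem[1]]


if __name__ == "__main__":
    # Made-up names for illustration; they are not actual Corpus files.
    for sample in ("COSAMPLE", "CMSAMPLE", "CESAMPLE"):
        print(sample, "->", period_from_filename(sample))
```

A lookup of this kind may be convenient when sorting the files by period before a search, since the period is recoverable from the file name alone without reading the parameter codes inside each file.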



The Oxford Text Archive


Associated projects

The Helsinki Corpus of Older Scots (Anneli Meurman-Solin; also see the article in ICAME 19)

The Corpus of Early American English (Merja Kytö; in preparation)

The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English

The Penn-Helsinki Parsed Corpus of Middle English (PPCME)

The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)