The Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2)

The Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2) is a syntactically annotated corpus of prose text samples. Its syntactic annotation (parsing) permits searching not only for words and word sequences but also for syntactic structure. The corpus is designed for students and scholars of the history of English, especially the historical syntax of the language. It is part of an ongoing larger project at the University of Pennsylvania and the University of York to produce syntactically annotated corpora for all stages of the history of English.

Project leader: Anthony Kroch
Time of compilation: 1990–2000
Size: c. 1.2 million words (1,155,965)
Language: Middle English
Number of texts/samples: 55
Period: 1150–1500
Released: 2000
Funding: National Science Foundation, with supplementary support from the University of Pennsylvania Research Foundation
Project home page:

Reference lines and copyright

Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition.


Santorini, Beatrice. 2005. Annotation manual for the PPCME2, PPCEME, and PCEEC.


Professor Anthony Kroch (University of Pennsylvania) and Dr Ann Taylor (University of York)

File format

Each text in the corpus comes in three different formats: text (.txt), part-of-speech (POS) tagged (.pos) and parsed (.psd). In addition, there is a file with philological and bibliographical information about each text.
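The parsed (.psd) files can be processed outside CorpusSearch with a small script. The sketch below is a minimal illustration, assuming the files use Penn-Treebank-style labeled bracketing in which each tree is wrapped in an unlabeled outer pair of parentheses and no word or label contains a literal parenthesis. The function names (`parse_all`, `leaves`) and the sample tree are hypothetical and invented for illustration; they are not taken from the corpus or its documentation.

```python
import re
from typing import List, Tuple, Union

# A tree is either a leaf word (str) or a (label, children) pair.
# Assumption: Penn-style labeled bracketing, as described in the lead-in.
Tree = Union[str, Tuple[str, list]]

# Tokens are "(", ")" or any run of characters without whitespace/parentheses.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def tokenize(text: str) -> List[str]:
    """Split labeled bracketing into '(', ')' and atom tokens."""
    return TOKEN_RE.findall(text)


def parse(tokens: List[str], pos: int = 0) -> Tuple[Tree, int]:
    """Parse one bracketed constituent starting at tokens[pos] == '('.

    Returns the parsed tree and the index just past its closing ')'.
    """
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
    pos += 1
    label = ""
    # An unlabeled wrapper such as "( (IP-MAT ...) )" has no label atom.
    if tokens[pos] not in ("(", ")"):
        label = tokens[pos]
        pos += 1
    children: list = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1


def parse_all(text: str) -> List[Tree]:
    """Parse every top-level tree in a string (e.g. a whole file's contents)."""
    tokens = tokenize(text)
    trees: List[Tree] = []
    pos = 0
    while pos < len(tokens):
        tree, pos = parse(tokens, pos)
        trees.append(tree)
    return trees


def leaves(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, tag) pairs from preterminal nodes, left to right."""
    label, children = tree
    # A preterminal is a node whose only child is a word string.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    out: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, tuple):
            out.extend(leaves(child))
    return out


if __name__ == "__main__":
    # Hypothetical tree, invented for illustration (not a corpus sentence).
    sample = "( (IP-MAT (NP-SBJ (PRO he)) (VBD cam)) )"
    for t in parse_all(sample):
        print(leaves(t))
```

Keeping each node as a plain `(label, children)` tuple avoids any dependency on a third-party tree library. For real searches over the corpus, the distributed CorpusSearch 2 program remains the intended tool.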


The Penn Corpora are distributed with the search program CorpusSearch 2, written by Beth Randall and released as open-source software.


A CD-ROM may be ordered using the corpus order form.

Please note that the first edition of PPCME is no longer being distributed.

Associated projects

Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)
York-Helsinki Parsed Corpus of Old English Poetry
York-Toronto-Helsinki Parsed Corpus of Old English Prose
Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
Parsed Corpus of Early English Correspondence (PCEEC)
Penn Parsed Corpus of Modern British English (1700–1914)