The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)

The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME) is a syntactically annotated corpus of prose text samples. Its syntactic annotation (parsing) permits searching not only for words and word sequences but also for syntactic structure. The corpus is designed for students and scholars of the history of English, especially its historical syntax. It is part of an ongoing larger project at the University of Pennsylvania and the University of York to produce syntactically annotated corpora for all stages of the history of English.

Project leader: Anthony Kroch
Time of compilation: 1999–2004
Size: c. 1.8 million words (1,794,010)
Language: Early Modern English
Number of texts/samples: 229
Period: 1500–1710
Released: 2004
Funding: National Endowment for the Humanities and the National Science Foundation
Project home page:

Reference lines and copyright

Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.


Santorini, Beatrice. 2005. Annotation manual for the PPCME2, PPCEME, and PCEEC.


Professor Anthony Kroch and Dr Beatrice Santorini (University of Pennsylvania)

File format

Each text in the corpus comes in three different formats: text (.txt), part-of-speech (POS) tagged (.pos) and parsed (.psd).
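The three formats can be thought of as successively richer views of the same text. The Python sketch below illustrates this by deriving a POS-tagged line and a plain-text line from one parsed sentence. It assumes Penn-style labeled bracketing in the .psd file, with each word appearing as a (TAG word) leaf, and a word/TAG convention for the .pos view. The sample sentence, its ID value, and the choice to skip ID and CODE nodes are illustrative assumptions rather than a description of the exact file layout, and empty categories are not handled.

```python
import re
from typing import List, Tuple, Union

# A tree is either a leaf string (a word) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")

# Assumed non-textual node labels (sentence IDs, editorial codes).
SKIP_LABELS = {"ID", "CODE"}

# Constructed example sentence, not taken from the corpus.
SAMPLE = "( (IP-MAT (NP-SBJ (PRO I)) (VBD saw) (NP-OB1 (D the) (N king)) (. .)) (ID EXAMPLE,1.1))"


def parse_bracketing(text: str) -> Tree:
    """Parse one labeled bracketing such as '(NP (D the) (N king))'."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            leaf = tokens[pos]
            pos += 1
            return leaf
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]  # node label, e.g. NP-SBJ
            pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()


def tagged_words(tree: Tree) -> List[Tuple[str, str]]:
    """Collect (word, tag) pairs from preterminal nodes, left to right."""
    if isinstance(tree, str):
        return []
    label, children = tree
    if label in SKIP_LABELS:
        return []
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]  # preterminal: (TAG word)
    pairs: List[Tuple[str, str]] = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


def to_pos_line(tree: Tree) -> str:
    """Render the sentence as word/TAG tokens (the assumed .pos view)."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged_words(tree))


def to_text_line(tree: Tree) -> str:
    """Render the sentence as plain words (the .txt view)."""
    return " ".join(word for word, _ in tagged_words(tree))


if __name__ == "__main__":
    tree = parse_bracketing(SAMPLE)
    print(to_pos_line(tree))   # I/PRO saw/VBD the/D king/N ./.
    print(to_text_line(tree))  # I saw the king .
```

Only the parsed view contains the constituent structure (NP-SBJ, NP-OB1, IP-MAT); the other two views can be recovered from it, but not the reverse.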


The Penn Corpora are distributed with the search program CorpusSearch 2, written by Beth Randall and released as open-source software.
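CorpusSearch queries are built from structural relations between node labels; for example, iDoms ("immediately dominates") asks whether a node has a given label among its direct children. The Python sketch below is a toy re-implementation of that single relation over one Penn-style bracketed sentence, meant only to show what a structural search does. It is not CorpusSearch and does not reproduce its query language. The function names, the constructed sample sentence, and the use of `fnmatch` wildcards as a stand-in for CorpusSearch label patterns are all assumptions.

```python
import fnmatch
import re
from typing import Iterator, List, Tuple, Union

# A tree is either a leaf string (a word) or a (label, children) pair.
Tree = Union[str, Tuple[str, list]]

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")

# Constructed example with an embedded that-clause, not taken from the corpus.
SAMPLE = ("( (IP-MAT (NP-SBJ (PRO he)) (VBD said) (CP-THT (C that) "
          "(IP-SUB (NP-SBJ (PRO she)) (VBD came)))) (ID EXAMPLE,2.1))")


def parse(text: str) -> Tree:
    """Stack-based parser for a single well-formed labeled bracketing."""
    tokens = TOKEN_RE.findall(text)
    stack: List[Tuple[str, list]] = [("", [])]  # sentinel holding the result
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = ""
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                label = tokens[i + 1]
                i += 1  # the label token is consumed here
            stack.append((label, []))
        elif tok == ")":
            node = stack.pop()
            stack[-1][1].append(node)
        else:
            stack[-1][1].append(tok)  # a word
        i += 1
    return stack[0][1][0]


def nodes(tree: Tree) -> Iterator[Tuple[str, list]]:
    """Yield every non-leaf node in pre-order."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def idoms(tree: Tree, parent_pat: str, child_pat: str) -> List[Tuple[str, str]]:
    """Return (parent, child) label pairs where a node matching parent_pat
    has a direct child matching child_pat."""
    pairs: List[Tuple[str, str]] = []
    for label, children in nodes(tree):
        if not fnmatch.fnmatchcase(label, parent_pat):
            continue
        for child in children:
            if not isinstance(child, str) and fnmatch.fnmatchcase(child[0], child_pat):
                pairs.append((label, child[0]))
    return pairs


if __name__ == "__main__":
    tree = parse(SAMPLE)
    # Clauses (IP-MAT, IP-SUB, ...) that directly contain a subject NP.
    print(idoms(tree, "IP*", "NP-SBJ*"))
```

Note that the search finds the subject of the embedded clause as well as the main-clause subject, which a plain word-sequence search could not distinguish. `fnmatch` also gives `?` and `[...]` special meaning, so its patterns only approximate CorpusSearch's own label matching.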


A CD-ROM may be ordered with the corpus order form.

Associated projects

Penn-Helsinki Parsed Corpus of Middle English, 2nd edition (PPCME2)
York-Helsinki Parsed Corpus of Old English Poetry
York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)
Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
Parsed Corpus of Early English Correspondence (PCEEC)
Penn Parsed Corpus of Modern British English (1700–1914)