The Penn Parsed Corpus of Modern British English (PPCMBE)

The Penn Parsed Corpus of Modern British English (PPCMBE), consisting of just under one million words, is part of a larger ongoing project at the University of Pennsylvania and the University of York to produce syntactically annotated corpora for all stages of the history of English. Its genre composition has been kept as close as possible to that of the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME).

Project leader: Anthony Kroch
Size: c. 1 million words (948,895)
Language: Modern British English
Number of texts/samples: 101
Period: 1700–1914
Released: 2010
Funding: The National Science Foundation
Project home page: http://www.ling.upenn.edu/hist-corpora/

Reference lines and copyright

Kroch, Anthony, Beatrice Santorini and Ariel Diertani. 2010. Penn Parsed Corpus of Modern British English. http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html

Manual

Santorini, Beatrice. 2010. Annotation manual for the Penn Historical Corpora and the PCEEC. Release 2. http://www.ling.upenn.edu/hist-corpora/annotation/index.htm

Compilers

Professor Anthony Kroch, Dr Beatrice Santorini and Ariel Diertani (University of Pennsylvania)

File format

Each text in the corpus is provided in three formats: text (.txt), part-of-speech (POS) tagged (.pos) and parsed (.psd).
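
For readers who want to inspect the parsed files outside a dedicated search tool, the following is a minimal sketch in Python of reading one labelled-bracket tree and listing its tag and word pairs. It assumes that the .psd files use Penn Treebank-style labelled bracketing. The sample sentence and the function names (tokenize, parse, leaves) are invented for illustration and are not taken from the corpus or its documentation. Real files may contain additional nodes, such as identifiers, which this sketch would simply report as further pairs.

```python
import re

def tokenize(text):
    # Split into "(", ")", and runs of characters that are neither
    # whitespace nor parentheses (labels, tags and words).
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return items

    return node()

def leaves(tree):
    """Return (tag, word) pairs for the terminal nodes, in text order."""
    if len(tree) == 2 and isinstance(tree[0], str) and isinstance(tree[1], str):
        return [(tree[0], tree[1])]
    pairs = []
    for child in tree:
        if isinstance(child, list):
            pairs.extend(leaves(child))
    return pairs

# Hypothetical sample in the assumed bracketing style (not a corpus excerpt).
SAMPLE = "(IP-MAT (NP-SBJ (PRO I)) (VBD saw) (NP-OB1 (D the) (N house)) (. .))"

tree = parse(tokenize(SAMPLE))
pairs = leaves(tree)
for tag, word in pairs:
    print(tag, word)
```

The nested-list representation is chosen only for brevity. For structural queries over the whole corpus, the search program distributed with it remains the intended tool.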

Software

The Penn Corpora are distributed with CorpusSearch 2, a search program written by Beth Randall and released as open-source software.

Availability

The corpus may be purchased on CD-ROM using the corpus order form.

Associated projects

Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)
Penn-Helsinki Parsed Corpus of Middle English, 2nd edition (PPCME2)
York-Helsinki Parsed Corpus of Old English Poetry
York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)
Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
Parsed Corpus of Early English Correspondence (PCEEC)