Basic structure

(Source: the PPCME2 corpus description.)

The PPCME2 text samples are based largely on the Middle English section of the Diachronic Part of the Helsinki Corpus of English Texts, with certain additions and deletions, but the samples are considerably larger. For the earliest Helsinki time period, all texts are sampled exhaustively. For each later Helsinki time period, two texts are expanded to 50,000 words, and the remaining texts are represented by the Helsinki Corpus sample.

The main Helsinki time periods are M1–M4, each covering between 70 and 100 years:

  • Period M1 (1150–1250)
  • Period M2 (1250–1350)
  • Period M3 (1350–1420)
  • Period M4 (1420–1500).

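In addition, texts originally written in one period but whose earliest manuscript dates from a later period are given two-digit period designations: the first digit marks the composition period and the second the manuscript period (for example, M23). Texts whose composition date is unknown are designated MX followed by the manuscript period (MX1, MX4). Table 1 below lists all Helsinki periods as they appear in the corpus file names.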
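The designation scheme can be illustrated with a minimal Python sketch. The function name decode_period and the PERIOD_YEARS mapping are hypothetical helpers introduced here for illustration; they are not part of the PPCME2 distribution. The date ranges are those given for periods M1–M4 above, and the treatment of X as "composition date unknown" follows Table 1.

```python
# Date ranges for the four main Helsinki periods, as listed in this section.
PERIOD_YEARS = {
    "1": (1150, 1250),
    "2": (1250, 1350),
    "3": (1350, 1420),
    "4": (1420, 1500),
}


def decode_period(designation):
    """Return (composition_years, manuscript_years) for a period designation.

    Hypothetical helper. composition_years is None when the composition
    date is unknown (designations such as MX1 or MX4).
    """
    if not designation.startswith("M") or len(designation) not in (2, 3):
        raise ValueError(f"not a period designation: {designation!r}")

    body = designation[1:]
    if len(body) == 1:
        # Single digit: composition and manuscript fall in the same period.
        composition_code = manuscript_code = body
    else:
        # Two characters: composition period first, manuscript period second.
        composition_code, manuscript_code = body

    if manuscript_code not in PERIOD_YEARS:
        raise ValueError(f"unknown manuscript period in {designation!r}")
    if composition_code != "X" and composition_code not in PERIOD_YEARS:
        raise ValueError(f"unknown composition period in {designation!r}")

    composition = None if composition_code == "X" else PERIOD_YEARS[composition_code]
    return composition, PERIOD_YEARS[manuscript_code]
```

Under these assumptions, decode_period("M23") yields a composition range of 1250–1350 and a manuscript range of 1350–1420, while decode_period("MX1") yields None for the composition range.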

For lists of texts by date, dialect, date and dialect, genre and name, see text classification at the PPCME2 website.

Word counts

Table 1. Helsinki Corpus time periods and the number of words by period in the PPCME2.

Period designation  Composition date  Manuscript date  Word count
MX1                 unknown           1150–1250            62,596
M1                  1150–1250         1150–1250           195,494
M2                  1250–1350         1250–1350            93,999
M23                 1250–1350         1350–1420            17,013
M24                 1250–1350         1420–1500            35,591
M3                  1350–1420         1350–1420           385,994
M34                 1350–1420         1420–1500            99,994
MX4                 unknown           1420–1500             5,168
M4                  1420–1500         1420–1500           260,116
Total                                                   1,155,965
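
The figures in Table 1 can be cross-checked with a short Python sketch. The word counts are copied from the table; the names WORD_COUNTS, REPORTED_TOTAL and share are illustrative assumptions, not part of any PPCME2 tooling.

```python
# Word counts per period designation, copied from Table 1.
WORD_COUNTS = {
    "MX1": 62_596,
    "M1": 195_494,
    "M2": 93_999,
    "M23": 17_013,
    "M24": 35_591,
    "M3": 385_994,
    "M34": 99_994,
    "MX4": 5_168,
    "M4": 260_116,
}

# Total as reported in the last row of Table 1.
REPORTED_TOTAL = 1_155_965

total = sum(WORD_COUNTS.values())


def share(period):
    """Fraction of the corpus (by word count) that falls in the given period."""
    return WORD_COUNTS[period] / total


if __name__ == "__main__":
    print("sum matches reported total:", total == REPORTED_TOTAL)
    for period, count in WORD_COUNTS.items():
        # One row per designation: name, word count, share of the corpus.
        print(f"{period:<4}{count:>10,}{share(period):>8.1%}")
```

The per-period counts sum to the reported total of 1,155,965 words. M3 is the largest period, at roughly a third of the corpus.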


Figure 1. Number of words by Helsinki Corpus time period in the PPCME2.