Basic structure
(Source: the PPCME2 corpus description.)
The PPCME2 text samples are based largely on the Middle English section of the Diachronic Part of the Helsinki Corpus of English Texts, with certain additions and deletions, but the samples are considerably larger. For the earliest Helsinki time period, all texts are sampled exhaustively. For later Helsinki time periods, two texts per period were expanded to 50,000 words, and the remaining texts are represented by the Helsinki Corpus sample.
The main Helsinki time periods are M1–M4, each covering between 70 and 100 years:
- Period M1 (1150–1250)
- Period M2 (1250–1350)
- Period M3 (1350–1420)
- Period M4 (1420–1500).
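In addition, texts originally written in one period but whose earliest manuscript dates from a later period are given two-digit period designations: the first digit marks the composition period and the second the manuscript period (M23, for example, is an M2 composition in an M3 manuscript). Texts of unknown composition date carry an X in place of the first digit. Table 1 below lists all Helsinki periods as they appear in the corpus file names.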
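Read against Table 1, the designations appear to follow a simple scheme: the first character after `M` gives the composition period, the second gives the manuscript period, and `X` stands for an unknown composition date. The sketch below decodes a designation on that assumption. The function name `decode_period` and the `PERIOD_YEARS` mapping are illustrative and not part of any PPCME2 tooling.

```python
# Year ranges of the four main Helsinki periods, as listed above.
PERIOD_YEARS = {
    "1": (1150, 1250),
    "2": (1250, 1350),
    "3": (1350, 1420),
    "4": (1420, 1500),
}


def decode_period(designation):
    """Split a period designation into (composition_years, manuscript_years).

    Assumed scheme (inferred from Table 1, not an official specification):
    "M3"  -> composed and copied in period 3
    "M23" -> composed in period 2, earliest manuscript from period 3
    "MX1" -> composition date unknown (None), manuscript from period 1
    """
    code = designation.strip().upper()
    if not code.startswith("M") or len(code) not in (2, 3):
        raise ValueError(f"not a period designation: {designation!r}")

    digits = code[1:]
    if len(digits) == 1:
        # One digit: composition and manuscript periods coincide.
        comp = ms = digits
    else:
        comp, ms = digits

    if ms not in PERIOD_YEARS:
        raise ValueError(f"unknown manuscript period in {designation!r}")
    if comp != "X" and comp not in PERIOD_YEARS:
        raise ValueError(f"unknown composition period in {designation!r}")

    composition = None if comp == "X" else PERIOD_YEARS[comp]
    return composition, PERIOD_YEARS[ms]


if __name__ == "__main__":
    for name in ("M1", "M23", "MX4"):
        print(name, decode_period(name))
```

Returning `None` for an unknown composition date keeps the two halves of the result the same type otherwise, so callers can sort or group by manuscript period without special cases.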
For lists of texts by date, dialect, date and dialect, genre and name, see text classification at the PPCME2 website.
Word counts
Table 1. Helsinki Corpus time periods and the number of words by period in the PPCME2.
| Period designation | Composition date | Manuscript date | Word count |
|---|---|---|---|
| MX1 | unknown | 1150–1250 | 62,596 |
| M1 | 1150–1250 | 1150–1250 | 195,494 |
| M2 | 1250–1350 | 1250–1350 | 93,999 |
| M23 | 1250–1350 | 1350–1420 | 17,013 |
| M24 | 1250–1350 | 1420–1500 | 35,591 |
| M3 | 1350–1420 | 1350–1420 | 385,994 |
| M34 | 1350–1420 | 1420–1500 | 99,994 |
| MX4 | unknown | 1420–1500 | 5,168 |
| M4 | 1420–1500 | 1420–1500 | 260,116 |
| Total | | | 1,155,965 |
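The figures in Table 1 can be checked and turned into proportions with a few lines of Python. This is a minimal sketch: the word counts are copied from the table, while the names `WORD_COUNTS`, `total_words` and `shares` are illustrative choices rather than anything defined by the corpus.

```python
# Word counts per period designation, copied from Table 1.
WORD_COUNTS = {
    "MX1": 62596,
    "M1": 195494,
    "M2": 93999,
    "M23": 17013,
    "M24": 35591,
    "M3": 385994,
    "M34": 99994,
    "MX4": 5168,
    "M4": 260116,
}

# Total as reported in the last row of Table 1.
REPORTED_TOTAL = 1155965


def total_words(counts):
    """Sum the word counts over all period designations."""
    return sum(counts.values())


def shares(counts):
    """Return each period's fraction of the overall word count."""
    total = total_words(counts)
    return {period: n / total for period, n in counts.items()}


if __name__ == "__main__":
    # The per-period counts should add up to the reported total.
    assert total_words(WORD_COUNTS) == REPORTED_TOTAL
    for period, share in shares(WORD_COUNTS).items():
        print(f"{period:<4} {WORD_COUNTS[period]:>9,} {share:6.1%}")
```

The per-period counts sum exactly to the reported total, and M3 is the largest period in the table.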
Figure 1. Number of words by Helsinki Corpus time period in the PPCME2.