Basic structure
(Source: the PPCEME corpus description.)
The PPCEME is divided into three subcorpora.
- The Helsinki directories, consisting of roughly 573,000 words, contain the Helsinki Corpus in parsed, POS-tagged, and unannotated form.
- The Penn1 directories, consisting of roughly 615,000 words, contain a first supplement to the Helsinki Corpus. As far as possible, we have used material by the same authors and from the same editions as the material in the Helsinki Corpus. Where necessary (that is, where the Helsinki Corpus already contains an exhaustive sample of a text), we have added new material.
- The Penn2 directories, consisting of roughly 606,000 words, contain a second supplement to the Helsinki Corpus. Again, we have tried to use material by the same authors and from the same editions as the material in the Helsinki Corpus. However, the Penn2 directories contain more new material than the Penn1 directories.
Word counts
Table 1. Word count summary by time period and subcorpus.
| Period | Helsinki | Penn 1 | Penn 2 | Total |
|---|---|---|---|---|
| E1 1500-1569 | 196754 | 194018 | 185423 | 576195 |
| E2 1570-1639 | 196742 | 223064 | 232993 | 652799 |
| E3 1640-1710 | 179477 | 197908 | 187631 | 565016 |
| Total | 572973 | 614990 | 606047 | 1794010 |
Figure 1. Word count summary by time period and subcorpus.
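The row and column totals in Table 1 can be re-derived from the nine per-period, per-subcorpus counts. The following is a minimal sketch of that arithmetic. The figures are copied from Table 1, and the variable names (`counts`, `period_totals`, `subcorpus_totals`, `grand_total`) are illustrative assumptions, not part of the corpus distribution.

```python
# Word counts copied from Table 1; the variable names are illustrative only.
counts = {
    "E1 1500-1569": {"Helsinki": 196754, "Penn 1": 194018, "Penn 2": 185423},
    "E2 1570-1639": {"Helsinki": 196742, "Penn 1": 223064, "Penn 2": 232993},
    "E3 1640-1710": {"Helsinki": 179477, "Penn 1": 197908, "Penn 2": 187631},
}

# Row totals: one sum per time period.
period_totals = {period: sum(row.values()) for period, row in counts.items()}

# Column totals: one sum per subcorpus.
subcorpus_totals = {
    name: sum(row[name] for row in counts.values())
    for name in ("Helsinki", "Penn 1", "Penn 2")
}

# Grand total over all periods and subcorpora.
grand_total = sum(period_totals.values())

print(period_totals)
print(subcorpus_totals)
print(grand_total)  # 1794010
```

The computed sums agree with the Total row and Total column of Table 1, and the subcorpus totals match the rounded figures given in the list of subcorpora (roughly 573,000, 615,000, and 606,000 words).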
Table 2. Word count summary by text genre.
| Text genre | Number of words | Percentage |
|---|---|---|
| Bible | 134275 | 7.7% |
| Travelogue | 125337 | 7.2% |
| Diary, private | 123106 | 7.0% |
| Drama, comedy | 120428 | 6.9% |
| Letters, private | 116915 | 6.7% |
| Fiction | 116494 | 6.7% |
| Law | 115863 | 6.6% |
| Educational treatise | 113032 | 6.5% |
| Handbook, other | 112419 | 6.4% |
| History | 108706 | 6.2% |
| Proceedings, trials | 105090 | 6.0% |
| Sermon | 97400 | 5.6% |
| Philosophy | 85107 | 4.9% |
| Science, other | 79050 | 4.5% |
| Letters, non-private | 59868 | 3.4% |
| Biography, other | 52755 | 3.0% |
| Science, medicine | 41786 | 2.4% |
| Biography, autobiography | 41379 | 2.4% |
| Total | 1749010 | 100.0% |
Figure 2. Word count summary by text genre.
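The percentages in Table 2 follow from dividing each genre's word count by the total of the listed genres. The following is a minimal sketch of that calculation. The counts are copied from Table 2, the names `genre_counts`, `total`, and `percentages` are illustrative assumptions, and rounding to one decimal place is assumed to match the table's presentation.

```python
# Word counts copied from Table 2; the variable names are illustrative only.
genre_counts = {
    "Bible": 134275,
    "Travelogue": 125337,
    "Diary, private": 123106,
    "Drama, comedy": 120428,
    "Letters, private": 116915,
    "Fiction": 116494,
    "Law": 115863,
    "Educational treatise": 113032,
    "Handbook, other": 112419,
    "History": 108706,
    "Proceedings, trials": 105090,
    "Sermon": 97400,
    "Philosophy": 85107,
    "Science, other": 79050,
    "Letters, non-private": 59868,
    "Biography, other": 52755,
    "Science, medicine": 41786,
    "Biography, autobiography": 41379,
}

# Total over the genres listed in Table 2.
total = sum(genre_counts.values())

# Share of each genre in percent, rounded to one decimal place.
percentages = {
    genre: round(100 * count / total, 1)
    for genre, count in genre_counts.items()
}

print(total)
for genre, pct in percentages.items():
    print(f"{genre}: {pct}%")
```

Because each share is rounded independently, the rounded percentages need not add up to exactly 100.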