Basic structure of EMMA

Early-Modern Multiloquent Authors (EMMA) is a corpus of the writings of 50 carefully selected authors from 5 generations. The following graph gives an overview of the corpus distribution.

[Figure: The EMMA Corpus]

Selection criteria included:

  • prolific writers: min. 500,000 words/author
  • career & distribution: long career with sufficient material across career stages
  • London-based elite
  • social network information: connections within and across generations
    • religion: Church of England vs Non-Conformists; Quakers
    • politics: Royalists vs Parliamentarians
    • professions: clergy, politicians, dramatists/authors, philosophers, scientists

Genre balance was not a primary criterion. However, the corpus contains considerable amounts of text from the predominant written genres of the 17th century. The table below gives the word counts for the genres that are represented by at least 50,000 words in each of the first four generations in EMMA phase I.

Genre          Generation 1  Generation 2  Generation 3  Generation 4
biography            261633        181488        555232        142332
dialogue             445454         71760        350166        513010
drama                502288        719778        865389        700749
fiction              125605        484891        308529        400477
letters             1106451        475776       1297755       1034385
poetry               585063        286585        184920        341774
prose              23425330       4545774      13417412       2588806
science             1359196       1518364       2645128        212332
sermons             1596864       1579549       3023848       1348144
other genres        4811100       1217321       1152164       2844147
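
The 50,000-word threshold and the relative weight of each genre can be checked directly from the figures in the table. The following is a minimal Python sketch, not part of the EMMA tooling: the names `GENRE_COUNTS`, `meets_threshold`, `generation_totals` and `genre_shares` are hypothetical, and the totals cover only the rows listed in the table (including "other genres").

```python
# Word counts per genre, copied from the table above.
# Each list holds the figures for Generation 1 to Generation 4.
GENRE_COUNTS = {
    "biography": [261633, 181488, 555232, 142332],
    "dialogue": [445454, 71760, 350166, 513010],
    "drama": [502288, 719778, 865389, 700749],
    "fiction": [125605, 484891, 308529, 400477],
    "letters": [1106451, 475776, 1297755, 1034385],
    "poetry": [585063, 286585, 184920, 341774],
    "prose": [23425330, 4545774, 13417412, 2588806],
    "science": [1359196, 1518364, 2645128, 212332],
    "sermons": [1596864, 1579549, 3023848, 1348144],
    "other genres": [4811100, 1217321, 1152164, 2844147],
}

THRESHOLD = 50_000  # minimum words per genre and generation, as stated in the text


def meets_threshold(counts, threshold=THRESHOLD):
    """Return True if every generation has at least `threshold` words."""
    return all(count >= threshold for count in counts)


def generation_totals(table):
    """Sum the listed rows column by column, giving one total per generation."""
    return [sum(column) for column in zip(*table.values())]


def genre_shares(table):
    """Return each genre's share of its generation's listed total (0.0 to 1.0)."""
    totals = generation_totals(table)
    return {
        genre: [count / total for count, total in zip(counts, totals)]
        for genre, counts in table.items()
    }


if __name__ == "__main__":
    for genre, counts in GENRE_COUNTS.items():
        status = "ok" if meets_threshold(counts) else "below threshold"
        print(f"{genre:<13} {status}")
    for genre, shares in genre_shares(GENRE_COUNTS).items():
        formatted = "  ".join(f"{share:6.1%}" for share in shares)
        print(f"{genre:<13} {formatted}")
```

Expressing the counts as shares makes the generations comparable despite their very different sizes.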

A more detailed description of the corpus is being prepared for submission to the ICAME Journal.