Early-Modern Multiloquent Authors (EMMA)

EMMA is a corpus of writings by 50 of the most prolific English authors born in the 17th century, most of them drawn from the London-based elite. The sheer size of EMMA makes it possible to trace grammatical developments in detail both across single lifetimes and across five generations.

Project leader: Peter Petré (University of Antwerp)
Time of compilation: 2015–
Size: Phase I: 80 million words; Phase II: 13 million words (estimate)
Language: English
Number of texts/samples: Phase I: 11,230; Phase II: 1,800 (estimate)
Period: 1623–1757
Released: in preparation
Funding: H2020 – European Research Council (ERC) (Project ID 639008)
Project home page: https://www.uantwerpen.be/mind-bending-grammars/

Manual

No manual is currently available. One will be provided with the first official release in the second half of 2017.

Compilers

Principal Investigator: Peter Petré
Senior Research Fellow: Oscar Strik
Researchers: Lynn Anthonissen, Sara Budts, Enrique Manjavacas, William Standing
Digitization staff: Emma-Louise Silva
Volunteers: Maria De Graef, Lutgarde De Haeck, Diane Koek, BA & MA students from the University of Antwerp

Availability

The part of EMMA that is in the public domain will be made available for download in the second half of 2017. The remainder of EMMA will be made available once the source texts have entered the public domain (around 2021).

Technical information

The corpus is tokenized and encoded in Unicode (UTF-8). It is available in XML and plain-text (TXT) formats. The open-source software CosyCat (Collaborative Synchronized Corpus Analysis Toolkit), currently in alpha, has been developed for querying and annotating the corpus. CosyCat queries a version of EMMA that is indexed by BlackLab.

Rich metadata for the corpus is stored partly as XML headers (metadata specific to the texts) and partly as a JSON database (metadata specific to the authors).
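
Because text-level and author-level metadata are stored separately, lifespan studies will usually need to join the two. The Python sketch below illustrates one way this could be done. It is not based on the actual EMMA schema, which has not yet been documented: the element names (header, author_id, title, year), the JSON keys (name, birth_year) and the sample record are all hypothetical placeholders.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical text file: element names are illustrative, not the EMMA schema.
SAMPLE_XML = """<text>
  <header>
    <author_id>author_001</author_id>
    <title>An Example Treatise</title>
    <year>1688</year>
  </header>
  <body>Tokenized text would go here .</body>
</text>"""

# Hypothetical author database, keyed by author identifier.
SAMPLE_AUTHORS_JSON = """{
  "author_001": {"name": "Example Author", "birth_year": 1640}
}"""


def read_text_metadata(xml_string):
    """Return the header fields of one text as a dict."""
    root = ET.fromstring(xml_string)
    header = root.find("header")
    return {child.tag: child.text for child in header}


def merge_metadata(xml_string, authors_json):
    """Combine text-level and author-level metadata into one record."""
    text_meta = read_text_metadata(xml_string)
    authors = json.loads(authors_json)
    author_meta = authors[text_meta["author_id"]]
    record = {**text_meta, **author_meta}
    # Author's age at publication, the kind of value a lifespan study needs.
    record["age_at_publication"] = int(text_meta["year"]) - author_meta["birth_year"]
    return record


record = merge_metadata(SAMPLE_XML, SAMPLE_AUTHORS_JSON)
print(record["name"], record["age_at_publication"])
```

Keying the author database by an identifier that also appears in each text header keeps author facts in one place, so a correction to a birth year, for instance, would not have to be repeated in every text by that author.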