The Málaga Corpus of Early Modern English Scientific Prose (MCEModESP)

MCEModESP is the second component of The Málaga Corpus of Early English Scientific Prose. The corpus covers the period 1500–1700 and, like its predecessor, The Málaga Corpus of Late Middle English Scientific Prose (MCLMESP), contains hitherto unedited medical material from the three main branches of medical writing: theoretical treatises, surgical treatises and recipe collections. Used together with MCLMESP, it is an excellent source for the diachronic study of language change and variation.

Project leader: Javier Calle-Martín
Time of compilation: 2016–2020
Size: c. 1.5 million words
Language: Early Modern English
Period: 1500–1700
Project home page: https://modernmss.uma.es

Compilers

Javier Calle-Martín, University of Málaga
David Moreno-Olalla, University of Málaga
Laura Esteban-Segura, University of Málaga
Miriam Criado-Peña, University of Málaga
Juan Camilo Conde-Silvestre, University of Murcia
Teresa Marqués-Aguado, University of Murcia
Santiago González Fernández-Corugedo, University of Oviedo
Graham D. Caie, University of Glasgow
Jacob Thaisen, University of Oslo
Hanna Rutkowska, Adam Mickiewicz University
Jesús Romero-Barranco, University of Granada

Technical information

The corpus is provided in three versions: plain text, normalised and POS-tagged. It can also be searched online through CQPweb.

The corpus has been compiled in stages. Each stage has produced a different version of the corpus, and each version serves a different research purpose:

  1. Plain text corpus (.txt): these files contain the semi-diplomatic transcriptions of the treatises included in the corpus, in which the original spelling and word division have been preserved. The transcriptions are also available on the project’s website, together with the digitised images of the original manuscripts.

    Take thyme seedes rosemary seedes parsley seede the middle rinde of the walnut tree gromell seede saxifrage seede the kernells of the eglantine berryes

  2. Normalised corpus (.norm): these files contain the normalised transcriptions of the treatises included in the corpus. Normalisation has been carried out with VARD, which standardises variant spellings to their Present-Day English forms and inserts an XML tag so that the original word can still be consulted.

    Take thyme seeds rosemary seeds parsley seed the middle rind of the walnut tree gromwell seed saxifrage seed the kernels of the eglantine berries

  3. POS-tagged corpus (.pos): these files contain the POS-tagged version of the corpus. Tagging has been carried out with CLAWS, which assigns a morpho-syntactic tag from the C7 tagset to each word in the corpus, punctuation marks included.

    Take_VV0 thyme_NN1 seeds_NN2 rosemary_NN1 seeds_NN2 parsley_NN1 seed_NN1 the_AT middle_JJ rind_NN1 of_IO the_AT walnut_NN1 tree_NN1 gromwell_NN1 seed_NN1 saxifrage_NN1 seed_NN1 the_AT kernels_NN2 of_IO the_AT eglantine_JJ berries_NN2

    Please note that the .norm and .pos files should not be used to quote original examples from the corpus, as they contain normalised versions of the texts. Quotations should be taken from the .txt files.

  4. CQPweb format: the corpus can also be searched online through CQPweb at https://modernmss.uma.es/cqpweb
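
As an illustration of how the POS-tagged version might be processed, the following minimal Python sketch splits a line of word_TAG tokens into (word, tag) pairs and counts the tags. It assumes that tokens are separated by whitespace and joined to their tag by an underscore, as in the sample line above; the function name parse_tagged is hypothetical, and the layout of the actual .pos files may differ.

```python
from collections import Counter


def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs.

    Assumes whitespace-separated tokens of the form word_TAG, as in the
    sample line quoted in this description.
    """
    pairs = []
    for token in line.split():
        # Split on the LAST underscore, so that a word containing an
        # underscore of its own is kept intact.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip any token that carries no tag
        pairs.append((word, tag))
    return pairs


# Sample line taken from the POS-tagged example above.
sample = (
    "Take_VV0 thyme_NN1 seeds_NN2 rosemary_NN1 seeds_NN2 parsley_NN1 "
    "seed_NN1 the_AT middle_JJ rind_NN1 of_IO the_AT walnut_NN1 tree_NN1 "
    "gromwell_NN1 seed_NN1 saxifrage_NN1 seed_NN1 the_AT kernels_NN2 "
    "of_IO the_AT eglantine_JJ berries_NN2"
)

pairs = parse_tagged(sample)

# Frequency of each C7 tag in the sample.
tag_counts = Counter(tag for _, tag in pairs)

# All tokens tagged as plural common nouns (NN2).
plural_nouns = [word for word, tag in pairs if tag == "NN2"]

print(tag_counts.most_common())
print(plural_nouns)
```

Because the words in the .pos files are normalised, any forms retrieved in this way should be checked against the .txt files before they are quoted.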