Corpus |
Reference line |
ALEC |
Advanced Learner English Corpus |
Advanced Learner English Corpus (ALEC). Compiled by Tove Larsson. Uppsala University. |
APU |
APU Writing and Reading Corpus 1979-1988 |
APU Writing and Reading Corpus 1979-1988 (APU). Compiled by Nuria Yáñez-Bouza (University of Vigo, Spain) and Victorina González-Díaz (University of Liverpool, UK). |
ARCHER |
A Representative Corpus of Historical English Registers |
Publications making use of ARCHER shall include a reference to the name of the corpus, the years of compilation, and the compiler team. A suitable bibliographic listing is as follows (with ‘X’ replaced as appropriate):
ARCHER-X = A Representative Corpus of Historical English Registers version X. 1990–1993/2002/2007/2010/2013/2016. Originally compiled under the supervision of Douglas Biber and Edward Finegan at Northern Arizona University and University of Southern California; modified and expanded by subsequent members of a consortium of universities. Current member universities are Bamberg, Freiburg, Heidelberg, Helsinki, Lancaster, Leicester, Manchester, Michigan, Northern Arizona, Santiago de Compostela, Southern California, Trier, Uppsala, Zurich. Examples of usage taken from ARCHER were obtained under the terms of the ARCHER User Agreement.
We recommend that individual citations from ARCHER should include the text identifier (filename), e.g. “1722grah_s3b”. The ARCHER version used should be acknowledged with the citation or globally in the bibliography. For examples retrieved at the consortium departments this will be ARCHER 3.2 / 3.1 / 2 / 1, as appropriate; for examples retrieved from the online versions it will be ARCHER 3.2 (Lancaster) or ARCHER 3.2 (Zurich), as appropriate. |
BASE |
British Academic Spoken English Corpus |
British Academic Spoken English Corpus (BASE). The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson.
When referring to the BASE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Thompson, P. and Nesi, H. (2001) The British Academic Spoken English (BASE) Corpus Project. Language Teaching Research 5 (3) 263-264 |
BAWE |
British Academic Written English Corpus |
British Academic Written English Corpus (BAWE).
Use of the corpus is acknowledged using the following form of words: The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).
When referring to the BAWE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Gardner, S. & Nesi, H. (2013) A classification of genre families in university student writing. Applied Linguistics 34 (1) 1-29, or Nesi, H. & Gardner, S. (2012) Genres across the Disciplines: Student writing in higher education. Cambridge University Press. |
B-BROWN |
The B-Brown-1931 Corpus |
The B-Brown-1931 Corpus (B-BROWN). Project leader Marianne Hundt. |
BE06 |
British English 2006 |
The British English 2006 corpus (BE06). Compiled by Paul Baker. |
BLOB-1931 |
The BLOB-1931 Corpus |
The BLOB-1931 Corpus (BLOB-1931). Project leaders: 1: Geoffrey Leech (University of Lancaster), 2: Paul Rayson (Lancaster University). |
BNC |
British National Corpus |
British National Corpus (BNC). |
BROWN |
The Brown corpus |
A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island. |
Buckeye |
Buckeye Corpus |
Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational Speech (2nd release) [www.buckeyecorpus.osu.edu] Columbus, OH: Department of Psychology, Ohio State University (Distributor). |
CASE |
Corpus of Academic Spoken English |
Corpus of Academic Spoken English (CASE).
Long citation:
CASE. Forthcoming. Corpus of Academic Spoken English. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Saarbrücken: Saarland University (coordination) / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain. [http://www.uni-saarland.de/index.php?id=48492] (date of last access).
Short citation:
CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. [http://www.uni-saarland.de/index.php?id=48492] (date of last access).
Single transcript citation:
05HE18FL52. CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. (please also cite CASE) |
CC |
Coruña Corpus of English Scientific Writing |
Coruña Corpus of English Scientific Writing (CC). Compiled by MUSTE Research Group. Project leader Isabel Moskowich.
Parapar López, Javier & Moskowich, Isabel. 2007. The Coruña Corpus Tool. Revista del Procesamiento de Lenguaje Natural, 39: 289–290. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/2690
Moskowich, Isabel & Crespo García, Begoña. 2007. Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing. In Pérez Guerra, Javier et al. (eds.) ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English. Bern: Peter Lang (341–357).
Moskowich-Spiegel Fandiño, Isabel & Parapar López, Javier. 2008. Writing Science, Compiling Science. The Coruña Corpus of English Scientific Writing. In Lorenzo Modia, María Jesús (ed.) Proceedings from the 31st AEDEAN Conference. A Coruña: Universidade da Coruña (531–544).
Crespo García, Begoña & Isabel Moskowich. 2010. CETA in the Context of the Coruña Corpus. Literary and Linguistic Computing, 25(2): 153–164. doi:10.1093/llc/fqp038 |
CED |
A Corpus of English Dialogues 1560-1760 |
A Corpus of English Dialogues 1560-1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University). |
CEEC |
Corpus of Early English Correspondence |
Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin at the Department of Modern Languages, University of Helsinki. |
CEECE |
Corpus of Early English Correspondence Extension |
Corpus of Early English Correspondence Extension. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio at the Department of Modern Languages, University of Helsinki. |
CEECES 1 |
Corpus of Early English Correspondence Extension Sampler, part 1 |
Corpus of Early English Correspondence Extension Sampler, part 1. 2021. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.4644243 |
CEECES 2 |
Corpus of Early English Correspondence Extension Sampler, part 2 |
Corpus of Early English Correspondence Extension Sampler, part 2. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.5887100 |
CEECS |
Corpus of Early English Correspondence Sampler |
Corpus of Early English Correspondence Sampler. 1998. Compiled by Jukka Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin and Helena Raumolin-Brunberg at the Department of English, University of Helsinki. |
CEECSU |
Corpus of Early English Correspondence Supplement |
Corpus of Early English Correspondence Supplement. Compiled by Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin, Helena Raumolin-Brunberg and Anni Sairio at the Department of English, University of Helsinki. |
CEEM |
Corpus of Early English Medical Writing |
See individual subcorpora |
CELiST |
A Corpus of English Life Sciences Texts |
A Corpus of English Life Sciences Texts (CELiST) is a sub-corpus of the Coruña Corpus of English Scientific Writing (CC). Forthcoming. Compiled by the MUSTE Research Group. |
CEPhiT |
A Corpus of English Philosophy Texts |
Moskowich, Isabel. 2016. Philosophers and Scientists from the Modern Age: Compiling the Corpus of English Philosophy Texts (CEPhiT). In Moskowich, Isabel et al. (eds.) ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins (1–23).
Moskowich, Isabel; Camiña, Gonzalo; Lareo, Inés; Crespo, Begoña (eds.) 2016. ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins. https://benjamins.com/#catalog/books/z.198/main. |
CETA |
A Corpus of English Texts on Astronomy |
Moskowich, Isabel. 2012. CETA as a Tool for the Study of Modern Astronomy in English. In Moskowich, Isabel & Crespo García, Begoña (eds.) Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins (35–56).
Moskowich, Isabel; Crespo, Begoña (eds.) 2012. Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins. http://benjamins.com/#catalog/books/z.173/main. |
CHELAR |
Corpus of Historical English Law Reports 1535-1999 |
Corpus of Historical English Law Reports 1535-1999 (CHELAR). Compiled by Paula Rodríguez-Puente, Teresa Fanego (Project Director), María José López-Couso, Belén Méndez-Naya, Paloma Núñez-Pertejo. University of Santiago de Compostela: Research Unit for Variation, Linguistic Change and Grammaticalization, Department of English and German. |
CHET |
A Corpus of History English Texts |
Crespo, Begoña and Moskowich, Isabel. 2015. A Corpus of History Texts (CHET) as part of the Coruña Corpus Project. In Proceedings of the international scientific conference Corpus linguistics – 2015. St Petersburg: St Petersburg State University. 14–23.
Moskowich, Isabel; Puente-Castelo, Luis; Crespo, Begoña and Monaco, Leida Maria. Forthcoming. “From his own diary we learn”: Investigating the Corpus of History English Texts. Amsterdam/Philadelphia: John Benjamins. |
CIE |
Corpus of Irish English 14th-20th c. |
Corpus of Irish English 14th-20th c. (CIE). Compiled by Raymond Hickey and contained in: Hickey, Raymond 2003. Corpus Presenter. Software for language analysis. Amsterdam: John Benjamins, 292 pages with CD-ROM. |
CLEP |
Corpus of Late 18th c. Prose |
Corpus of Late 18th c. Prose (CLEP). The Corpus of late 18c Prose is available without fee for educational and research purposes, but it is not in the public domain. Copyright to the text is retained by the John Rylands University Library of Manchester; copyright to the annotated files is retained by David Denison and Linda van Bergen (© 2002). |
CLMETEV |
Corpus of Late Modern English Texts |
The Corpus of Late Modern English Texts (Extended Version). 2006. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven. |
CLOB |
A Brown family corpus of written British English. |
Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183. http://icame.uib.no/ij37/Pages_175-184.pdf |
CMEPV |
Corpus of Middle English Prose and Verse |
Corpus of Middle English Prose and Verse (CMEPV). Copyright institution: The Humanities Text Initiative, University of Michigan. |
CMSW |
Corpus of Modern Scottish Writing |
Corpus of Modern Scottish Writing (CMSW). Project leaders John Corbett and Jeremy Smith. © Corpus of Modern Scottish Writing, Glasgow University. |
CNNE |
Corpus of Nineteenth-century Newspaper English |
Corpus of Nineteenth-century Newspaper English (CNNE). Compiled by Erik Smitterberg (Uppsala University). |
COCA |
Corpus of Contemporary American English |
Davies, Mark. (2008–) The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Available online at http://corpus.byu.edu/coca/ |
CoCELD |
Corpus of Contemporary English Legal Decisions, 1950–2021 |
Rodríguez-Puente, Paula and David Hernández-Coalla. 2022. Corpus of Contemporary English Legal Decisions, 1950–2021 (CoCELD). Oviedo: University of Oviedo. |
CoER |
Corpus of Early English Recipes |
Corpus of Early English Recipes (CoER). Forthcoming. Compilers Francisco Alonso-Almeida, Ivalla Ortega-Barrera, Elena Quintana-Toledo. |
COERP |
Corpus of English Religious Prose |
Corpus of English Religious Prose (COERP). Compilers Thomas Kohnen, Tanja Rütten, Ingvilt Marcoe, Kirsten Gather, Dorothee Groeger, Anne Döring, Stefanie Leu. |
COHA |
Corpus of Historical American English |
Davies, Mark. (2010-) The Corpus of Historical American English: 400 million words, 1810-2009. |
COLMOBAENG |
Corpus of Late Modern British and American English Prose |
Corpus of Late Modern British and American English Prose (COLMOBAENG). Compiler Teresa Fanego. |
CoNE |
Corpus of Narrative Etymologies |
An appropriate citation is:
A Corpus of Narrative Etymologies compiled by Roger Lass, Margaret Laing, Rhona Alcorn and Keith Williamson [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: Version 1.1, 2013-, ©The University of Edinburgh.
or:
Lass, Roger, Margaret Laing, Rhona Alcorn, Keith Williamson. 2013- A Corpus of Narrative Etymologies, Version 1.1 [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: © The University of Edinburgh. |
CONTE-pC |
Corpus of Early Ontario English, pre-Confederation Section |
Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Dollinger, Stefan (ed.) 2006. The Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Version 0.9. University of Vienna. |
CONTRAST-IT |
CONTRAST-IT Corpus |
Anna-Maria De Cesare (2011–2018). CONTRAST-IT. University of Basel, https://contrast-it.philhist.unibas.ch/en/home/ |
COOEE |
Corpus of Oz Early English |
Corpus of Oz Early English (COOEE). The corpus has been compiled by the author as part of a doctoral thesis on the origins of Australian English:
Fritz, Clemens. 2007. From Early English in Australia to Australian English 1788-1900. Frankfurt: Peter Lang. |
CoSiB |
Corpus of Singaporean Blogs |
Corpus of Singaporean Blogs (CoSiB) 2006-2010. Copyright by Universität Trier, English Department, Andrea Sand. |
CROWN |
CROWN Corpus |
CROWN Corpus. A Brown family corpus of written American English.
Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183. http://icame.uib.no/ij37/Pages_175-184.pdf |
CSC |
Corpus of Scottish Correspondence |
Meurman-Solin, Anneli. 2007. CSC The Corpus of Scottish Correspondence, 1500-1715. |
DCPSE |
Diachronic Corpus of Present-Day Spoken English |
Diachronic Corpus of Present-Day Spoken English (DCPSE). Compiled by Professor Bas Aarts (Principal Investigator), Sean Wallis (Senior Research Fellow), Dr Dirk Bury, Lesley Kirk, Yordanka Kostadinova-Kavalova, Dr Ann Law and Gabriel Ozón. |
DECTE |
Diachronic Electronic Corpus of Tyneside English |
Corrigan, Karen P., Buchstaller, I., Mearns, A.J. and Moisl, H.L. (2012) The Diachronic Electronic Corpus of Tyneside English. Newcastle University. http://research.ncl.ac.uk/decte/index.htm |
DOEC |
Dictionary of Old English Corpus |
Dictionary of Old English Corpus (DOEC); original release (1981) compiled by Angus Cameron, Ashley Crandell Amos, Sharon Butler, and Antonette diPaolo Healey (Toronto: DOE Project 1981); 2009 release compiled by Antonette diPaolo Healey, Joan Holland, Ian McDougall, and David McDougall, with TEI-P5 conformant-version by Xin Xiang (Toronto: DOE Project 2009). |
ECEP |
Eighteenth-Century English Phonology Database |
ECEP = Eighteenth-Century English Phonology database, 2015/2023. Compiled by Joan C. Beal, Nuria Yáñez-Bouza, Ranjan Sen, and Christine Wallis. The University of Sheffield and Universidade de Vigo. Published by: University of Sheffield. https://www.dhi.ac.uk/projects/ecep/ |
ELFA |
English as a Lingua Franca in Academic Settings |
ELFA 2008. The Corpus of English as a Lingua Franca in Academic Settings. Director: Anna Mauranen. http://www.helsinki.fi/elfa/elfacorpus |
EMEMT |
Early Modern English Medical Texts |
Early Modern English Medical Texts (EMEMT). Taavitsainen, Irma, Päivi Pahta, Martti Mäkinen, Turo Hiltunen, Ville Marttila, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.). CD-ROM. Amsterdam: John Benjamins. |
ENPC |
The English-Norwegian Parallel Corpus |
The English-Norwegian Parallel Corpus (1994-1997), Dept. of British and American Studies, University of Oslo. Compiled by Stig Johansson (project leader), Knut Hofland (project leader), Jarle Ebeling (research assistant), Signe Oksefjell (research assistant). http://www.hf.uio.no/ilos/english/services/omc/enpc/ |
FLOB |
The Freiburg-Lancaster-Oslo/Bergen Corpus |
The Freiburg-LOB Corpus (‘F-LOB’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg
The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster. |
FRED |
Freiburg Corpus of English Dialects |
for full version: Freiburg Corpus of English Dialects, English Dialects Research Group, Albert-Ludwigs-Universität Freiburg
for FRED-S: Freiburg Corpus of English Dialects (Sampler), English Dialects Research Group, Albert-Ludwigs-Universität Freiburg |
FROWN |
The Freiburg-Brown Corpus |
The Freiburg-Brown Corpus (‘Frown’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg
The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster. |
GoogleBooks |
Google Books Corpora |
Davies, Mark. (2011-) Google Books (American English) Corpus (155 billion words, 1810-2009). Available online at http://googlebooks.byu.edu/.
Citation for Google Books and Culturomics:
Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331 (2011) [Published online ahead of print 12/16/2010]. |
HARES-CAM |
Helsinki Archive of Regional English Speech - Cambridgeshire sampler |
HARES-CAM = Helsinki Archive of Regional English Speech - Cambridgeshire sampler. 2010. Compiled by Ahava, Simo, Joseph McVeigh and Anna-Liisa Vasko at the Department of Modern Languages, University of Helsinki.
To refer to the corpus data, indicate which interview you are citing in parentheses after the excerpt (interviewID-hares). For example:
(1) then used to take the horses home and <pause/> clean them and feed them (cam13-hares). |
HC |
Helsinki Corpus |
The Helsinki Corpus of English Texts (1991). Department of Modern Languages, University of Helsinki. Compiled by Matti Rissanen (Project leader), Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English).
TEI XML edition:
Helsinki Corpus TEI XML Edition. 2011. First edition. Designed by Alpo Honkapohja, Samuli Kaislaniemi, Henri Kauhanen, Matti Kilpiö, Ville Marttila, Terttu Nevalainen, Arja Nurmi, Matti Rissanen and Jukka Tyrkkö. Implemented by Henri Kauhanen and Ville Marttila. Based on The Helsinki Corpus of English Texts (1991). Helsinki: The Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki. |
HCOS |
Helsinki Corpus of Older Scots |
The Helsinki Corpus of Older Scots (1995). Department of English, University of Helsinki. Compiled by Anneli Meurman-Solin. |
HD |
Helsinki Corpus of British English Dialects |
The Helsinki Corpus of British English Dialects (2006). Department of Modern Languages, University of Helsinki. All the material consists of interviews made by the fieldworkers mentioned below who have full copyright for the material. For permission to use the files, contact Anna-Liisa Vasko (anna-liisa.vasko@helsinki.fi) or Kirsti Peitsara (kirsti.peitsara@helsinki.fi). |
ICE |
International Corpus of English |
International Corpus of English (ICE). The ICE project is internationally coordinated by Dr Gerald Nelson at the Chinese University of Hong Kong.
Reference lines and copyrights according to each corpus. Information found in manual. |
ICE-GB |
International Corpus of English - Great Britain |
International Corpus of English - the British Component (ICE-GB). Coordinated by the Survey of English Usage. |
ICE-GBR |
International Corpus of English - Gibraltar |
Seoane, Elena, Lucía Loureiro-Porto & Cristina Suárez-Gómez. (to appear) "The ICE project looks at Iberia: The International Corpus of Gibraltar English". |
ICE-NIG |
International Corpus of English - Nigeria |
Wunder, Eva-Maria, Holger Voormann, and Ulrike Gut. "The ICE Nigeria corpus project: Creating an open, rich and accurate corpus." ICAME Journal 34 (2010): 78-88. |
ICE-SCO |
International Corpus of English - Scotland |
Schützler, Ole, Ulrike Gut and Robert Fuchs (to appear). New perspectives on Scottish Standard English: Introducing the Scottish component of the International Corpus of English. In Beal, Joan and Sylvie Hancil (eds.). Northern British English (working title). Berlin: de Gruyter. |
ICoMEP |
Innsbruck Corpus of Middle English Prose |
Innsbruck Corpus of Middle English Prose (ICoMEP). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck. |
JSCC |
The John Swales Conference Corpus |
John Swales Conference Corpus (2009). Ann Arbor, MI: The Regents of the University of Michigan. |
LAEME |
A Linguistic Atlas of Early Middle English |
An appropriate citation is:
A Linguistic Atlas of Early Middle English, 1150-1325, compiled by Margaret Laing [http://www.lel.ed.ac.uk/ihd/laeme2/laeme2.html]. Edinburgh: Version 3.2, 2013, © The University of Edinburgh.
or:
Laing, M. 2013- A Linguistic Atlas of Early Middle English, 1150-1325, Version 3.2 [http://www.lel.ed.ac.uk/ihd/laeme2/laeme2.html]. Edinburgh: © The University of Edinburgh.
Citation for the 2008 edition - A Linguistic Atlas of Early Middle English, 1150-1325 [http://www.lel.ed.ac.uk/ihd/laeme1/laeme1.html] compiled by Margaret Laing and Roger Lass (Edinburgh: © 2007- The University of Edinburgh). |
eLALME |
A Linguistic Atlas of Late Mediaeval English |
M. Benskin, M. Laing, V. Karaiskos and K. Williamson. An Electronic Version of A Linguistic Atlas of Late Mediaeval English [http://www.lel.ed.ac.uk/ihd/elalme/elalme.html] (Edinburgh: © 2013- The Authors and The University of Edinburgh). |
LAMSAS |
A Linguistic Atlas of the Middle and South Atlantic States |
A Linguistic Atlas of the Middle and South Atlantic States. Project leaders originally Hans Kurath, later Raven I. McDavid Jr., William A. Kretzschmar Jr.
Users should indicate in any subsequent publication using LAP data/materials that they have obtained the data/materials from the LAP Web site, and indicate that reproduction, copying, distribution, display, etc., as the case may be, of any LAP data/materials is governed by the "cost of reproduction" condition. |
LC |
The Lampeter Corpus of Early Modern English Tracts |
The Lampeter Corpus of Early Modern English Tracts. 1999. Compiled by Josef Schmied, Claudia Claridge, and Rainer Siemund. (In: ICAME Collection of English Language Corpora (CD-ROM), Second Edition, eds. Knut Hofland, Anne Lindebjerg, Jørn Thunestvedt, The HIT Centre, University of Bergen, Norway.) |
LCoICAMET |
The Letter Corpus of ICAMET |
The Letter Corpus of ICAMET (LCoICAMET). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck. |
LEME |
Lexicons of Early Modern English |
Lexicons of Early Modern English. Ed. Ian Lancashire. Toronto, ON: University of Toronto Library and University of Toronto Press, 2014. Date consulted: [date month year]. URL: leme.library.utoronto.ca |
LLC |
London-Lund Corpus of Spoken English |
London-Lund Corpus of Spoken English. Project leader Jan Svartvik, Lund University. |
LMEMT |
Late Modern English Medical Texts |
Late Modern English Medical Texts (LMEMT). Taavitsainen, Irma, Päivi Pahta, Turo Hiltunen, Ville Marttila, Raisa Oinonen, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.). |
LModE |
Corpus of Late Modern English Prose |
Corpus of Late Modern English Prose. Compiled by David Denison (project leader), Linda van Bergen, Graeme Trousdale. |
LOB |
The Lancaster-Oslo/Bergen Corpus |
The LOB Corpus, original version (1970-1978), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), and Knut Hofland, University of Bergen (head of computing).
The LOB Corpus, POS-tagged version (1981-1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing). |
MCEModESP |
The Málaga Corpus of Early Modern English Scientific Prose |
Calle-Martín, Javier et al. 2016. The Málaga Corpus of Early Modern English Scientific Prose (MCEModESP). Málaga: University of Málaga. Available from: https://modernmss.uma.es |
MCLMESP |
The Málaga Corpus of Late Middle English Scientific Prose |
Miranda-García, Antonio et al. 2015. The Málaga Corpus of Late Middle English Scientific Prose (MCLMESP). Málaga: University of Málaga. Available from: https://hunter.uma.es |
MEG-C |
Middle English Grammar project |
"MEG-C Base, version 2009.1", The Middle English Grammar Corpus, Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith (compilers), December 2009, University of Stavanger, accessed [date], http://www.uis.no/research/culture/the_middle_english_grammar_project/meg-c_base/. |
MEMT |
Middle English Medical Texts |
2005. Middle English Medical Texts. Taavitsainen, Irma, Päivi Pahta and Martti Mäkinen (eds.). CD-ROM. Amsterdam: John Benjamins. |
MICASE |
Michigan Corpus of Academic Spoken English |
Simpson, R. C., S. L. Briggs, J. Ovens, and J. M. Swales. (2002) The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan. |
MICUSP |
Michigan Corpus of Upper-level Student Papers |
Michigan Corpus of Upper-level Student Papers. (2009). Ann Arbor, MI: The Regents of the University of Michigan. |
MOECS |
Corpus of Multilingual Opinion Essays by College Students |
Okugiri, M., Ijuin, I., Komori, K. 2015. The Corpus of Multilingual Opinion Essays by College Students. Retrieved from http://www.u-sacred-heart.ac.jp/okugiri/links/moecs/moecs.html |
NECTE |
Newcastle Electronic Corpus of Tyneside English |
The Newcastle Electronic Corpus of Tyneside English. |
OBC |
Old Bailey Corpus |
Huber, Magnus; Nissel, Magnus; Maiwald, Patrick; Widlitzki, Bianca. 2012. The Old Bailey Corpus. Spoken English in the 18th and 19th centuries. www1.uni-giessen.de/oldbaileycorpus, [date of access].
Users who wish to cite material from the Old Bailey Corpus Online Website in publications should provide the URL (www1.uni-giessen.de/oldbaileycorpus) and the date on which the website was consulted. To cite concordance material obtained by searching a corpus in the Old Bailey Corpus suite (online or offline), include the version of the corpus used as well as the trial ID provided in the online concordances or in the <speech> or <trial> tags. For more information, see the Citation guide |
PCEEC |
Parsed Corpus of Early English Correspondence |
Parsed Corpus of Early English Correspondence, parsed version. 2006. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.
Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.
Parsed Corpus of Early English Correspondence, text version. 2006. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin, with additional annotation by Ann Taylor. Helsinki: University of Helsinki and York: University of York. Distributed through the Oxford Text Archive. |
PCEEC2 |
Parsed Corpus of Early English Correspondence, 2nd edition |
Parsed Corpus of Early English Correspondence 2, parsed version. 2022. Revised and corrected by Beatrice Santorini. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. https://github.com/beatrice57/pceec2 |
PPCEME |
The Penn-Helsinki Parsed Corpus of Early Modern English |
Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English. http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-3/index.html |
PPCMBE |
The Penn Parsed Corpus of Modern British English |
Kroch, Anthony, Beatrice Santorini and Ariel Diertani. 2010. Penn Parsed Corpus of Modern British English. http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html |
PPCME2 |
The Penn-Helsinki Parsed Corpus of Middle English, second edition |
Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-4/index.html |
PWEC |
Pakistan Written English Corpus |
Khan, U. (2023) Pakistan Written English Corpus. |
QHC |
Quaker Historical Corpus |
Quaker Historical Corpus (QHC). 2015. Compiled by Judith Roads at the University of Birmingham. http://www.woodbrooke.org.uk/pages/quaker-historical-corpus.html |
RCN1 |
Rostock Newspaper Corpus |
Compiled by Friedrich Ungerer, Kristina Schneider, Birte Bös at Rostock University. |
SC |
Salamanca Corpus. Digital Archive of English Dialect Texts |
Copyright © 2011-DING, The Salamanca Corpus, Universidad de Salamanca |
SCEPA |
Small Corpus of English Political Apologies |
Small Corpus of English Political Apologies (SCEPA). 2017. Compiled by Halyna Liubinska. Lviv Polytechnic National University. |
SCoCESLE |
Small Corpus of Colombian English as a Second Language Essays |
Velasco, E. (2023). Small Corpus of Colombian English as a Second Language Essays (SCoCESLE), Mendeley Data, V1, doi:10.17632/wfcbfy29wm.1 |
SCONE |
Seville Corpus of Northern English |
Seville Corpus of Northern English (SCONE). Project leader Dra. Julia Fernández Cuesta, Departamento de Lengua Inglesa, Universidad de Sevilla. |
SCOTS |
Scottish Corpus of Texts & Speech |
Scottish Corpus of Texts & Speech (2007). Department of English Language, University of Glasgow, Scotland, UK. |
SCPS |
Small Corpus of Political Speeches |
Small Corpus of Political Speeches (SCPS). Compiled under the supervision of Jukka Tyrkkö (University of Helsinki). |
TaCoCASE |
Transatlantic Component of the Corpus of Academic Spoken English |
TaCoCASE. 2023. Transatlantic Component of the CASE project. Birkenfeld: Trier University of Applied Sciences. Version 1.0. Collet, Caroline. [http://umwelt-campus.de/case/tacocase] |
TCEECE |
Tagged Corpus of Early English Correspondence Extension |
Tagged Corpus of Early English Correspondence Extension. 2020. Annotated by Lassi Saario & Tanja Säily. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. |
TCEECES |
Tagged Corpus of Early English Correspondence Extension Sampler |
Tagged Corpus of Early English Correspondence Extension Sampler. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Annotated by Lassi Saario & Tanja Säily. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.5887230 |
TIME |
Time Corpus |
Davies, Mark. (2007–) TIME Magazine Corpus (100 million words, 1920s-2000s). Available online at http://corpus.byu.edu/time. |
ViMELF |
Corpus of Video-Mediated English as a Lingua Franca Conversations |
ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. Version 1.0. The CASE project [umwelt-campus.de/case]. |
VOICE |
Vienna-Oxford International Corpus of English |
The recommended citation for VOICE 1.0 Online is:
VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). Director: Barbara Seidlhofer; Researchers: Angelika Breiteneder, Theresa Klimpfinger, Stefan Majewski, Marie-Luise Pitzl. http://voice.univie.ac.at (date of last access).
The short citation for VOICE 1.0 Online is:
VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). http://voice.univie.ac.at (date of last access). |
WestLabUSENET |
Reduced redundancy USENET corpus |
Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005-2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html) |
WomenScientists |
A Corpus of Women Scientists |
The Corpus of Women Scientists is a subcorpus of The Coruña Corpus of English Scientific Writing (CC). Forthcoming. |
YCCQA |
Yahoo-based Contrastive Corpus of Questions and Answers |
Contrastive Corpus of Questions and Answers. 2009. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven. |
YCOE |
York-Toronto-Helsinki Parsed Corpus of Old English Prose |
Taylor, A., A. Warner, S. Pintzuk, and F. Beths. (2003). The York-Toronto-Helsinki Parsed Corpus of Old English Prose. Electronic texts and manuals available from the Oxford Text Archive. |
ZEN |
Zurich English Newspaper corpus |
ZEN, Zurich English Newspaper Corpus Version 1.0. English Department of the University of Zurich. http://es-zen.unizh.ch. |