Reference lines

The lack of a universally established reference line is a problem that plagues most corpora. A corpus is (usually) not a publication, and it can be very difficult to know what the official name of the corpus is, whose names ought to be included, what year (if any) should be mentioned, and what organizations or institutions are or were involved in the project.

Through CoRD, we hope to alleviate the problem by giving corpus compilers a place in which to let colleagues know what they, the compilers, think the reference lines ought to be. The reference lines are to be found in the description of each individual corpus. For quick reference and comparison, we also collect them here. The colourful reality of reference lines is clearly apparent.

Corpus		Reference line
ALEC	Advanced Learner English Corpus	Advanced Learner English Corpus (ALEC). Compiled by Tove Larsson. Uppsala University.
APU	APU Writing and Reading Corpus 1979-1988	APU Writing and Reading Corpus 1979-1988 (APU). Compiled by Nuria Yáñez-Bouza (University of Vigo, Spain) and Victorina González-Díaz (University of Liverpool, UK).
ARCHER	A Representative Corpus of Historical English Registers	Publications making use of ARCHER shall include a reference to the name of the corpus, the years of compilation, and the compiler team. A suitable bibliographic listing is as follows (with ‘x’ replaced as appropriate): ARCHER-X = A Representative Corpus of Historical English Registers version X. 1990–1993/2002/2007/2010/2013/2016. Originally compiled under the supervision of Douglas Biber and Edward Finegan at Northern Arizona University and University of Southern California; modified and expanded by subsequent members of a consortium of universities. Current member universities are Bamberg, Freiburg, Heidelberg, Helsinki, Lancaster, Leicester, Manchester, Michigan, Northern Arizona, Santiago de Compostela, Southern California, Trier, Uppsala, Zurich. Examples of usage taken from ARCHER were obtained under the terms of the ARCHER User Agreement. We recommend that individual citations from ARCHER should include the text identifier (filename), e.g. “1722grah_s3b”. The ARCHER version used should be acknowledged with the citation or globally in the bibliography. For examples retrieved at the consortium departments this will be ARCHER 3.2 / 3.1 / 2 / 1, as appropriate; for examples retrieved from the online versions it will be ARCHER 3.2 (Lancaster) or ARCHER 3.2 (Zurich), as appropriate.
BASE	British Academic Spoken English Corpus	British Academic Spoken English Corpus (BASE). The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson. When referring to the BASE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Thompson, P. and Nesi, H. (2001) The British Academic Spoken English (BASE) Corpus Project. Language Teaching Research 5 (3) 263-264
BAWE	British Academic Written English Corpus	British Academic Written English Corpus (BAWE). Use of the corpus is acknowledged using the following form of words: The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800). When referring to the BAWE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Gardner, S. & Nesi, H. (2013) A classification of genre families in university student writing. Applied Linguistics 34 (1) 1-29 or Nesi, H. & Gardner, S. (2012) Genres across the Disciplines: Student writing in higher education. Cambridge University Press.
B-BROWN	The B-Brown-1931 Corpus	The B-Brown-1931 Corpus (B-BROWN). Project leader Marianne Hundt.
BE06	British English 2006	The British English 2006 corpus (BE06). Compiled by Paul Baker.
BLOB-1931	The BLOB-1931 Corpus	The BLOB-1931 Corpus (BLOB-1931). Project leaders: 1: Geoffrey Leech (University of Lancaster), 2: Paul Rayson (Lancaster University).
BNC	British National Corpus	British National Corpus (BNC).
BROWN	The Brown corpus	A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.
Buckeye	Buckeye Corpus	Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational Speech (2nd release) [www.buckeyecorpus.osu.edu] Columbus, OH: Department of Psychology, Ohio State University (Distributor).
CASE	Corpus of Academic Spoken English	Corpus of Academic Spoken English (CASE). Long citation: CASE. Forthcoming. Corpus of Academic Spoken English. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Saarbrücken: Saarland University (coordination) / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain. [http://www.uni-saarland.de/index.php?id=48492] (date of last access). Short citation: CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. [http://www.uni-saarland.de/index.php?id=48492] (date of last access). Single transcript citation: 05HE18FL52. CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. (please also cite CASE)
CC	Coruña Corpus of English Scientific Writing	Coruña Corpus of English Scientific Writing (CC). Compiled by MUSTE Research Group. Project leader Isabel Moskowich. Parapar López, Javier & Moskowich, Isabel. 2007. The Coruña Corpus Tool. Revista del Procesamiento de Lenguaje Natural, 39: 289–290. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/2690 Moskowich, Isabel & Crespo García, Begoña. 2007. Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing. In Pérez Guerra, Javier et al. (eds.) ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English. Bern: Peter Lang (341–357). Moskowich-Spiegel Fandiño, Isabel & Parapar López, Javier. 2008. Writing Science, Compiling Science. The Coruña Corpus of English Scientific Writing. In Lorenzo Modia, María Jesús (ed.) Proceedings from the 31st AEDEAN Conference. A Coruña: Universidade da Coruña (531–544). Crespo García, Begoña & Isabel Moskowich. 2010. CETA in the Context of the Coruña Corpus. Literary and Linguistic Computing, 25(2): 153–164. doi:10.1093/llc/fqp038
CED	A Corpus of English Dialogues 1560-1760	A Corpus of English Dialogues 1560-1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University).
CEEC	Corpus of Early English Correspondence	Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin at the Department of Modern Languages, University of Helsinki.
CEECE	Corpus of Early English Correspondence Extension	Corpus of Early English Correspondence Extension. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio at the Department of Modern Languages, University of Helsinki.
CEECES 1	Corpus of Early English Correspondence Extension Sampler, part 1	Corpus of Early English Correspondence Extension Sampler, part 1. 2021. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.4644243
CEECES 2	Corpus of Early English Correspondence Extension Sampler, part 2	Corpus of Early English Correspondence Extension Sampler, part 2. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.5887100
CEECS	Corpus of Early English Correspondence Sampler	Corpus of Early English Correspondence Sampler. 1998. Compiled by Jukka Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin and Helena Raumolin-Brunberg at the Department of English, University of Helsinki.
CEECSU	Corpus of Early English Correspondence Supplement	Corpus of Early English Correspondence Supplement. Compiled by Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin, Helena Raumolin-Brunberg and Anni Sairio at the Department of English, University of Helsinki.
CEEM	Corpus of Early English Medical Writing	See individual subcorpora
CELiST	A Corpus of English Life Sciences Texts	A Corpus of English Life Sciences Texts (CELiST) is a sub-corpus of the Coruña Corpus of English Scientific Writing (CC). Forthcoming. Compiled by the MUSTE Research Group.
CEPhiT	A Corpus of English Philosophy Texts	Moskowich, Isabel. 2016. Philosophers and Scientists from the Modern Age: Compiling the Corpus of English Philosophy Texts (CEPhiT). In Moskowich, Isabel et al. (eds.) ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins (1–23). Moskowich, Isabel; Camiña, Gonzalo; Lareo, Inés; Crespo, Begoña (eds.) 2016. ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins. https://benjamins.com/#catalog/books/z.198/main.
CETA	A Corpus of English Texts on Astronomy	Moskowich, Isabel. 2012. CETA as a Tool for the Study of Modern Astronomy in English. In Moskowich, Isabel & Crespo García, Begoña (eds.) Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins (35–56). Moskowich, Isabel; Crespo, Begoña (eds.) 2012. Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins. http://benjamins.com/#catalog/books/z.173/main.
CHELAR	Corpus of Historical English Law Reports 1535-1999	Corpus of Historical English Law Reports 1535-1999 (CHELAR). Compiled by Paula Rodríguez-Puente, Teresa Fanego (Project Director), María José López-Couso, Belén Méndez-Naya, Paloma Núñez-Pertejo. University of Santiago de Compostela: Research Unit for Variation, Linguistic Change and Grammaticalization, Department of English and German.
CHET	A Corpus of History English Texts	Crespo, Begoña and Moskowich, Isabel. 2015. A Corpus of History Texts (CHET) as part of the Coruña Corpus Project. In Proceedings of the international scientific conference Corpus linguistics – 2015. St Petersburg: St Petersburg State University. 14–23. Moskowich, Isabel; Puente-Castelo, Luis; Crespo, Begoña and Monaco, Leida Maria. Forthcoming. “From his own diary we learn”: Investigating the Corpus of History English Texts. Amsterdam/Philadelphia: John Benjamins.
CIE	Corpus of Irish English 14th-20th c.	Corpus of Irish English 14th-20th c. (CIE). Compiled by Raymond Hickey and contained in: Hickey, Raymond 2003. Corpus Presenter. Software for language analysis. Amsterdam: John Benjamins, 292 pages with CD-ROM.
CLEP	Corpus of Late 18th c. Prose	Corpus of Late 18th c. Prose (CLEP). The Corpus of late 18c Prose is available without fee for educational and research purposes, but it is not in the public domain. Copyright to the text is retained by the John Rylands University Library of Manchester; copyright to the annotated files is retained by David Denison and Linda van Bergen (© 2002).
CLMETEV	Corpus of Late Modern English Texts	The Corpus of Late Modern English Texts (Extended Version). 2006. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven.
CLOB	CLOB Corpus	CLOB Corpus. A Brown family corpus of written British English. Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183. http://icame.uib.no/ij37/Pages_175-184.pdf
CMEPV	Corpus of Middle English Prose and Verse	Corpus of Middle English Prose and Verse (CMEPV). Copyright institution: The Humanities Text Initiative, University of Michigan.
CMSW	Corpus of Modern Scottish Writing	Corpus of Modern Scottish Writing (CMSW). Project leaders John Corbett and Jeremy Smith. © Corpus of Modern Scottish Writing, Glasgow University.
CNNE	Corpus of Nineteenth-century Newspaper English	Corpus of Nineteenth-century Newspaper English (CNNE). Compiled by Erik Smitterberg (Uppsala University).
COCA	Corpus of Contemporary American English	Davies, Mark. (2008–) The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Available online at http://corpus.byu.edu/coca/
CoCELD	Corpus of Contemporary English Legal Decisions, 1950–2021	Rodríguez-Puente, Paula and David Hernández-Coalla. 2022. Corpus of Contemporary English Legal Decisions, 1950–2021 (CoCELD). Oviedo: University of Oviedo.
CoER	Corpus of Early English Recipes	Corpus of Early English Recipes (CoER). Forthcoming. Compilers Francisco Alonso-Almeida, Ivalla Ortega-Barrera, Elena Quintana-Toledo.
COERP	Corpus of English Religious Prose	Corpus of English Religious Prose (COERP). Compilers Thomas Kohnen, Tanja Rütten, Ingvilt Marcoe, Kirsten Gather, Dorothee Groeger, Anne Döring, Stefanie Leu.
COHA	Corpus of Historical American English	Davies, Mark. (2010-) The Corpus of Historical American English: 400 million words, 1810-2009.
COLMOBAENG	Corpus of Late Modern British and American English Prose	Corpus of Late Modern British and American English Prose (COLMOBAENG). Compiler Teresa Fanego.
CoNE	Corpus of Narrative Etymologies	An appropriate citation is: A Corpus of Narrative Etymologies compiled by Roger Lass, Margaret Laing, Rhona Alcorn and Keith Williamson [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: Version 1.1, 2013-, © The University of Edinburgh. or: Lass, Roger, Margaret Laing, Rhona Alcorn, Keith Williamson. 2013- A Corpus of Narrative Etymologies, Version 1.1 [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: © The University of Edinburgh.
CONTE-pC	Corpus of Early Ontario English, pre-Confederation Section	Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Dollinger, Stefan (ed.) 2006. The Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Version 0.9. University of Vienna.
CONTRAST-IT	CONTRAST-IT Corpus	Anna-Maria De Cesare (2011–2018). CONTRAST-IT. University of Basel, https://contrast-it.philhist.unibas.ch/en/home/
COOEE	Corpus of Oz Early English	Corpus of Oz Early English (COOEE). The corpus has been compiled by the author as part of a doctoral thesis on the origins of Australian English: Fritz, Clemens. 2007. From Early English in Australia to Australian English 1788-1900. Frankfurt: Peter Lang.
CoSiB	Corpus of Singaporean Blogs	Corpus of Singaporean Blogs (CoSiB) 2006-2010. Copyright by Universität Trier, English Department, Andrea Sand.
CoWITE	Corpus of Women’s Instructive Texts in English	CoWITE18 = Alonso Almeida, F., Álvarez-Gil, F. J., & Ortega-Barrera, I. (2025). Corpus of Women’s Instructive Texts in English (1700–1799) [Data set]. DiCoS-LA: https://dicos-la.com, Zenodo: https://doi.org/10.5281/zenodo.15151249 CoWITE19 = Alonso-Almeida, F., Álvarez-Gil, F. J., Ortega Barrera, I., Quintana-Toledo, E., De la Cruz-Cabanillas, I., Bator, M., Sánchez Cuervo, M. E., & Gómez-Calderón, M. J. (2025). Corpus of Women’s Instructive Texts in English (1800–1899) [Data set]. DiCoS-LA: https://dicos-la.com, Zenodo: https://doi.org/10.5281/zenodo.15097949
CROWN	CROWN Corpus	CROWN Corpus. A Brown family corpus of written American English. Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183. http://icame.uib.no/ij37/Pages_175-184.pdf
CSC	Corpus of Scottish Correspondence	Meurman-Solin, Anneli. 2007. CSC The Corpus of Scottish Correspondence, 1500-1715.
DCPSE	Diachronic Corpus of Present-Day Spoken English	Diachronic Corpus of Present-Day Spoken English (DCPSE). Compiled by Professor Bas Aarts (Principal Investigator), Sean Wallis (Senior Research Fellow), Dr Dirk Bury, Lesley Kirk, Yordanka Kostadinova-Kavalova, Dr Ann Law and Gabriel Ozón.
DECTE	Diachronic Electronic Corpus of Tyneside English	Corrigan, Karen P., Buchstaller, I., Mearns, A.J. and Moisl, H.L. (2012) The Diachronic Electronic Corpus of Tyneside English. Newcastle University. http://research.ncl.ac.uk/decte/index.htm
DOEC	Dictionary of Old English Corpus	Dictionary of Old English Corpus (DOEC); original release (1981) compiled by Angus Cameron, Ashley Crandell Amos, Sharon Butler, and Antonette diPaolo Healey (Toronto: DOE Project 1981); 2009 release compiled by Antonette diPaolo Healey, Joan Holland, Ian McDougall, and David McDougall, with TEI-P5 conformant-version by Xin Xiang (Toronto: DOE Project 2009).
ECEP	Eighteenth-Century English Phonology Database	ECEP = Eighteenth-Century English Phonology database, 2015/2023. Compiled by Joan C. Beal, Nuria Yáñez-Bouza, Ranjan Sen, and Christine Wallis. The University of Sheffield and Universidade de Vigo. Published by: University of Sheffield. https://www.dhi.ac.uk/projects/ecep/
ELFA	English as a Lingua Franca in Academic Settings	ELFA 2008. The Corpus of English as a Lingua Franca in Academic Settings. Director: Anna Mauranen. http://www.helsinki.fi/elfa/elfacorpus
EMEMT	Early Modern English Medical Texts	Early Modern English Medical Texts (EMEMT). Taavitsainen Irma, Päivi Pahta, Martti Mäkinen, Turo Hiltunen, Ville Marttila, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.). CD-ROM. Amsterdam: John Benjamins.
ENPC	The English-Norwegian Parallel Corpus	The English-Norwegian Parallel Corpus (1994-1997), Dept. of British and American Studies, University of Oslo. Compiled by Stig Johansson (project leader), Knut Hofland (project leader), Jarle Ebeling (research assistant), Signe Oksefjell (research assistant). http://www.hf.uio.no/ilos/english/services/omc/enpc/
FLOB	The Freiburg-Lancaster-Oslo/Bergen Corpus	The Freiburg-LOB Corpus (‘F-LOB’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg. The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.
FRED	Freiburg Corpus of English Dialects	For the full version: Freiburg Corpus of English Dialects, English Dialects Research Group, Albert-Ludwigs-Universität Freiburg. For FRED-S: Freiburg Corpus of English Dialects (Sampler), English Dialects Research Group, Albert-Ludwigs-Universität Freiburg.
FROWN	The Freiburg-Brown Corpus	The Freiburg-Brown Corpus (‘Frown’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg. The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.
GoogleBooks	Google Books Corpora	Davies, Mark. (2011-) Google Books (American English) Corpus (155 billion words, 1810-2009). Available online at http://googlebooks.byu.edu/. Citation for Google Books and Culturomics: Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331 (2011) [Published online ahead of print 12/16/2010].
HARES-CAM	Helsinki Archive of Regional English Speech - Cambridgeshire sampler	HARES-CAM = Helsinki Archive of Regional English Speech - Cambridgeshire sampler. 2010. Compiled by Ahava, Simo, Joseph McVeigh and Anna-Liisa Vasko at the Department of Modern Languages, University of Helsinki. To refer to the corpus data, indicate which interview you are citing in parentheses after the excerpt (interviewID-hares). For example: (1) then used to take the horses home and <pause/> clean them and feed them (cam13-hares).
HC	Helsinki Corpus	The Helsinki Corpus of English Texts (1991). Department of Modern Languages, University of Helsinki. Compiled by Matti Rissanen (Project leader), Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English). TEI XML edition: Helsinki Corpus TEI XML Edition. 2011. First edition. Designed by Alpo Honkapohja, Samuli Kaislaniemi, Henri Kauhanen, Matti Kilpiö, Ville Marttila, Terttu Nevalainen, Arja Nurmi, Matti Rissanen and Jukka Tyrkkö. Implemented by Henri Kauhanen and Ville Marttila. Based on The Helsinki Corpus of English Texts (1991). Helsinki: The Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki.
HCOS	Helsinki Corpus of Older Scots	The Helsinki Corpus of Older Scots (1995). Department of English, University of Helsinki. Compiled by Anneli Meurman-Solin.
HD	Helsinki Corpus of British English Dialects	The Helsinki Corpus of British English Dialects (2006). Department of Modern Languages, University of Helsinki. All the material consists of interviews made by the fieldworkers mentioned below who have full copyright for the material. For permission to use the files, contact Anna-Liisa Vasko (anna-liisa.vasko@helsinki.fi) or Kirsti Peitsara (kirsti.peitsara@helsinki.fi).
ICE	International Corpus of English	International Corpus of English (ICE). The ICE project is internationally coordinated by Dr Gerald Nelson at the Chinese University of Hong Kong. Reference lines and copyrights vary by component corpus; see the manual of each corpus for details.
ICE-GB	International Corpus of English - Great Britain	International Corpus of English - the British Component (ICE-GB). Coordinated by the Survey of English Usage.
ICE-GBR	International Corpus of English - Gibraltar	Seoane, Elena, Lucía Loureiro-Porto & Cristina Suárez-Gómez. (to appear) "The ICE project looks at Iberia: The International Corpus of Gibraltar English".
ICE-NIG	International Corpus of English - Nigeria	Wunder, Eva-Maria, Holger Voormann, and Ulrike Gut. "The ICE Nigeria corpus project: Creating an open, rich and accurate corpus." ICAME Journal 34 (2010): 78-88.
ICE-SCO	International Corpus of English - Scotland	Schützler, Ole, Ulrike Gut and Robert Fuchs (to appear). New perspectives on Scottish Standard English: Introducing the Scottish component of the International Corpus of English. In Beal, Joan and Sylvie Hancil (eds.). Northern British English (working title). Berlin: de Gruyter.
ICoMEP	Innsbruck Corpus of Middle English Prose	Innsbruck Corpus of Middle English Prose (ICoMEP). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck.
JSCC	The John Swales Conference Corpus	John Swales Conference Corpus (2009). Ann Arbor, MI: The Regents of the University of Michigan.
LAEME	A Linguistic Atlas of Early Middle English	An appropriate citation is: A Linguistic Atlas of Early Middle English, 1150-1325, compiled by Margaret Laing [http://www.lel.ed.ac.uk/ihd/laeme2/laeme2.html]. Edinburgh: Version 3.2, 2013, © The University of Edinburgh. or: Laing, M. 2013- A Linguistic Atlas of Early Middle English, 1150-1325, Version 3.2 [http://www.lel.ed.ac.uk/ihd/laeme2/laeme2.html]. Edinburgh: © The University of Edinburgh. Citation for the 2008 edition - A Linguistic Atlas of Early Middle English, 1150-1325 [http://www.lel.ed.ac.uk/ihd/laeme1/laeme1.html] compiled by Margaret Laing and Roger Lass (Edinburgh: © 2007- The University of Edinburgh).
eLALME	A Linguistic Atlas of Late Mediaeval English	M. Benskin, M. Laing, V. Karaiskos and K. Williamson. An Electronic Version of A Linguistic Atlas of Late Mediaeval English [http://www.lel.ed.ac.uk/ihd/elalme/elalme.html] (Edinburgh: © 2013- The Authors and The University of Edinburgh).
LAMSAS	A Linguistic Atlas of the Middle and South Atlantic States	A Linguistic Atlas of the Middle and South Atlantic States. Project leaders originally Hans Kurath, later Raven I. McDavid Jr., William A. Kretzschmar Jr. Users should indicate in any subsequent publication using LAP data/materials that they have obtained the data/materials from the LAP Web site, and indicate that reproduction, copying, distribution, display, etc., as the case may be, of any LAP data/materials is governed by the "cost of reproduction" condition.
LC	The Lampeter Corpus of Early Modern English Tracts	The Lampeter Corpus of Early Modern English Tracts. 1999. Compiled by Josef Schmied, Claudia Claridge, and Rainer Siemund. (In: ICAME Collection of English Language Corpora (CD-ROM), Second Edition, eds. Knut Hofland, Anne Lindebjerg, Jørn Thunestvedt, The HIT Centre, University of Bergen, Norway.)
LCoICAMET	The Letter Corpus of ICAMET	The Letter Corpus of ICAMET (LCoICAMET). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck.
LEME	Lexicons of Early Modern English	Lexicons of Early Modern English. Ed. Ian Lancashire. Toronto, ON: University of Toronto Library and University of Toronto Press, 2014. Date consulted: [date month year]. URL: leme.library.utoronto.ca
LLC	London-Lund Corpus of Spoken English	London-Lund Corpus of Spoken English. Project leader Jan Svartvik, Lund University.
LMEMT	Late Modern English Medical Texts	Late Modern English Medical Texts (LMEMT). Taavitsainen Irma, Päivi Pahta, Turo Hiltunen, Ville Marttila, Raisa Oinonen, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.).
LModE	Corpus of Late Modern English Prose	Corpus of Late Modern English Prose. Compiled by David Denison (project leader), Linda van Bergen, Graeme Trousdale.
LOB	The Lancaster-Oslo/Bergen Corpus	The LOB Corpus, original version (1970-1978), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), and Knut Hofland, University of Bergen (head of computing). The LOB Corpus, POS-tagged version (1981-1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).
MCEModESP	The Málaga Corpus of Early Modern English Scientific Prose	Calle-Martín, Javier et al. 2016. The Málaga Corpus of Early Modern English Scientific Prose (MCEModESP). Málaga: University of Málaga. Available from: https://modernmss.uma.es
MCLMESP	The Málaga Corpus of Late Middle English Scientific Prose	Miranda-García, Antonio et al. 2015. The Málaga Corpus of Late Middle English Scientific Prose (MCLMESP). Málaga: University of Málaga. Available from: https://hunter.uma.es
MEG-C	Middle English Grammar project	"MEG-C Base, version 2009.1", The Middle English Grammar Corpus, Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith (compilers), December 2009, University of Stavanger, accessed [date], http://www.uis.no/research/culture/the_middle_english_grammar_project/meg-c_base/.
MEMT	Middle English Medical Texts	2005. Middle English Medical Texts. Taavitsainen Irma, Päivi Pahta and Martti Mäkinen (eds.). CD-ROM. Amsterdam: John Benjamins.
MetaLing	MetaLing Corpus Project	The MetaLing Corpus (2025). Compiled at the Universities of Milan and Insubria by Angela Andreani and Daniel Russo.
MICASE	Michigan Corpus of Academic Spoken English	Simpson, R. C., S. L. Briggs, J. Ovens, and J. M. Swales. (2002) The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.
MICUSP	Michigan Corpus of Upper-level Student Papers	Michigan Corpus of Upper-level Student Papers. (2009). Ann Arbor, MI: The Regents of the University of Michigan.
MOECS	Corpus of Multilingual Opinion Essays by College Students	Okugiri, M., Ijuin, I., Komori, K. 2015. The Corpus of Multilingual Opinion Essays by College Students. Retrieved from http://www.u-sacred-heart.ac.jp/okugiri/links/moecs/moecs.html
NECTE	Newcastle Electronic Corpus of Tyneside English	The Newcastle Electronic Corpus of Tyneside English.
OBC	Old Bailey Corpus	Huber, Magnus; Nissel, Magnus; Maiwald, Patrick; Widlitzki, Bianca. 2012. The Old Bailey Corpus. Spoken English in the 18th and 19th centuries. www1.uni-giessen.de/oldbaileycorpus, [date of access]. Users who wish to cite material from the Old Bailey Corpus Online Website in publications should provide the URL (www1.uni-giessen.de/oldbaileycorpus) and the date on which the website was consulted. To cite concordance material obtained by searching a corpus in the Old Bailey Corpus suite (online or offline), include the version of the corpus used as well as the trial ID provided in the online concordances or in the <speech> or <trial> tags. For more information, see the Citation guide
PCEEC	Parsed Corpus of Early English Correspondence	Parsed Corpus of Early English Correspondence, parsed version. 2006. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive. Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive. Parsed Corpus of Early English Correspondence, text version. 2006. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin, with additional annotation by Ann Taylor. Helsinki: University of Helsinki and York: University of York. Distributed through the Oxford Text Archive.
PCEEC2	Parsed Corpus of Early English Correspondence, 2nd edition	Parsed Corpus of Early English Correspondence 2, parsed version. 2022. Revised and corrected by Beatrice Santorini. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. https://github.com/beatrice57/pceec2
PPCEME	The Penn-Helsinki Parsed Corpus of Early Modern English	Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English. http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-3/index.html
PPCMBE	The Penn Parsed Corpus of Modern British English	Kroch, Anthony, Beatrice Santorini and Ariel Diertani. 2010. Penn Parsed Corpus of Modern British English. http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html
PPCME2	The Penn-Helsinki Parsed Corpus of Middle English, second edition	Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition. http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-4/index.html
PWEC	Pakistan Written English Corpus	Khan, U. (2023) Pakistan Written English Corpus.
QHC	Quaker Historical Corpus	Quaker Historical Corpus (QHC). 2015. Compiled by Judith Roads at the University of Birmingham. http://www.woodbrooke.org.uk/pages/quaker-historical-corpus.html
RCN1	Rostock Newspaper Corpus	Compiled by Friedrich Ungerer, Kristina Schneider, Birte Bös at Rostock University.
SC	Salamanca Corpus. Digital Archive of English Dialect Texts	Copyright © 2011-DING, The Salamanca Corpus, Universidad de Salamanca
SCEPA	Small Corpus of English Political Apologies	Small Corpus of English Political Apologies (SCEPA). 2017. Compiled by Halyna Liubinska. Lviv Polytechnic National University.
SCoCESLE	Small Corpus of Colombian English as a Second Language Essays	Velasco, E. (2023). Small Corpus of Colombian English as a Second Language Essays (SCoCESLE), Mendeley Data, V1, doi:10.17632/wfcbfy29wm.1
SCONE	Seville Corpus of Northern English	Seville Corpus of Northern English (SCONE). Project leader Dra. Julia Fernández Cuesta, Departamento de Lengua Inglesa, Universidad de Sevilla.
SCOTS	Scottish Corpus of Texts & Speech	Scottish Corpus of Texts & Speech (2007). Department of English Language, University of Glasgow, Scotland, UK.
SCPS	Small Corpus of Political Speeches	Small Corpus of Political Speeches (SCPS). Compiled under the supervision of Jukka Tyrkkö (University of Helsinki).
TaCoCASE	Transatlantic Component of the Corpus of Academic Spoken English	TaCoCASE. 2023. Transatlantic Component of the CASE project. Birkenfeld: Trier University of Applied Sciences. Version 1.0. Collet, Caroline. [http://umwelt-campus.de/case/tacocase]
TCEECE	Tagged Corpus of Early English Correspondence Extension	Tagged Corpus of Early English Correspondence Extension. 2020. Annotated by Lassi Saario & Tanja Säily. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki.
TCEECES	Tagged Corpus of Early English Correspondence Extension Sampler	Tagged Corpus of Early English Correspondence Extension Sampler. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Annotated by Lassi Saario & Tanja Säily. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG. https://doi.org/10.5281/zenodo.5887230
TIME	Time Corpus	Davies, Mark. (2007–) TIME Magazine Corpus (100 million words, 1920s-2000s). Available online at http://corpus.byu.edu/time.
ViMELF	Corpus of Video-Mediated English as a Lingua Franca Conversations	ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. Version 1.0. The CASE project [umwelt-campus.de/case].
VOICE	Vienna-Oxford International Corpus of English	The recommended citation for VOICE 1.0 Online is: VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). Director: Barbara Seidlhofer; Researchers: Angelika Breiteneder, Theresa Klimpfinger, Stefan Majewski, Marie-Luise Pitzl. http://voice.univie.ac.at (date of last access). The short citation for VOICE 1.0 Online is: VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). http://voice.univie.ac.at (date of last access).
WestLabUSENET	Reduced redundancy USENET corpus	Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005-2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html)
WomenScientists	A Corpus of Women Scientists	The Corpus of Women Scientists is a subcorpus of The Coruña Corpus of English Scientific Writing (CC). Forthcoming.
YCCQA	Yahoo-based Contrastive Corpus of Questions and Answers	Contrastive Corpus of Questions and Answers. 2009. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven.
YCOE	York-Toronto-Helsinki Parsed Corpus of Old English Prose	Taylor, A., A. Warner, S. Pintzuk, and F. Beths. (2003). The York-Toronto-Helsinki Parsed Corpus of Old English Prose. Electronic texts and manuals available from the Oxford Text Archive.
ZEN	Zurich English Newspaper corpus	ZEN, Zurich English Newspaper Corpus Version 1.0. English Department of the University of Zurich. http://es-zen.unizh.ch.