Reference lines

The lack of a universally established reference line is a problem that plagues most corpora. A corpus is (usually) not a publication, and it can be very difficult to know what the official name of the corpus is, whose names ought to be included, what year (if any) should be mentioned, and what organizations or institutions are or were involved in the project.

Through CoRD, we hope to alleviate the problem by giving corpus compilers a place in which to let colleagues know what they, the compilers, think the reference lines ought to be. The reference lines are to be found in the description of each individual corpus. For quick reference and comparison, we also collect them here. The colourful reality of reference lines is clearly apparent.

Corpus Reference line


Advanced Learner English Corpus

Advanced Learner English Corpus (ALEC). Compiled by Tove Larsson. Uppsala University.


APU Writing and Reading Corpus 1979-1988

APU Writing and Reading Corpus 1979-1988 (APU). Compiled by Nuria Yáñez-Bouza (University of Vigo, Spain) and Victorina González-Díaz (University of Liverpool, UK).


A Representative Corpus of Historical English Registers

Publications making use of ARCHER shall include a reference to the name of the corpus, the years of compilation, and the compiler team. A suitable bibliographic listing is as follows (with ‘X’ replaced as appropriate):

ARCHER-X = A Representative Corpus of Historical English Registers version X. 1990–1993/2002/2007/2010/2013/2016. Originally compiled under the supervision of Douglas Biber and Edward Finegan at Northern Arizona University and University of Southern California; modified and expanded by subsequent members of a consortium of universities. Current member universities are Bamberg, Freiburg, Heidelberg, Helsinki, Lancaster, Leicester, Manchester, Michigan, Northern Arizona, Santiago de Compostela, Southern California, Trier, Uppsala, Zurich. Examples of usage taken from ARCHER were obtained under the terms of the ARCHER User Agreement.

We recommend that individual citations from ARCHER should include the text identifier (filename), e.g. “1722grah_s3b”. The ARCHER version used should be acknowledged with the citation or globally in the bibliography. For examples retrieved at the consortium departments this will be ARCHER 3.2 / 3.1 / 2 / 1, as appropriate; for examples retrieved from the online versions it will be ARCHER 3.2 (Lancaster) or ARCHER 3.2 (Zurich), as appropriate.


British Academic Spoken English Corpus

British Academic Spoken English Corpus (BASE). The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson.

When referring to the BASE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Thompson, P. and Nesi, H. (2001) The British Academic Spoken English (BASE) Corpus Project. Language Teaching Research 5 (3) 263-264.


British Academic Written English Corpus

British Academic Written English Corpus (BAWE).

Use of the corpus is acknowledged using the following form of words: The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).

When referring to the BAWE corpus in your presentations and publications it is easiest to cite an original publication which describes the project. We recommend: Gardner, S. & Nesi, H. (2013) A classification of genre families in university student writing. Applied Linguistics 34 (1) 1-29 or Nesi, H. & Gardner, S. (2012) Genres across the Disciplines: Student writing in higher education. Cambridge University Press.


The B-Brown-1931 Corpus

The B-Brown-1931 Corpus (B-BROWN). Project leader Marianne Hundt.


British English 2006

The British English 2006 corpus (BE06). Compiled by Paul Baker.


The BLOB-1931 Corpus

The BLOB-1931 Corpus (BLOB-1931). Project leaders: 1: Geoffrey Leech (University of Lancaster), 2: Paul Rayson (Lancaster University).


British National Corpus

British National Corpus (BNC).


The Brown corpus

A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.


Buckeye Corpus

Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational Speech (2nd release) [] Columbus, OH: Department of Psychology, Ohio State University (Distributor).


Corpus of Academic Spoken English

Corpus of Academic Spoken English (CASE).

Long citation:
CASE. Forthcoming. Corpus of Academic Spoken English. Stefan Diemer; Marie-Louise Brunner; Caroline Collet; and Selina Schmidt. Saarbrücken: Saarland University (coordination) / Sofia: St Kliment Ohridski University / Forlì: University of Bologna-Forlì / Santiago: University of Santiago de Compostela / Helsinki: Helsinki University & Hanken School of Economics / Birmingham: Birmingham City University / Växjö: Linnaeus University / Lyon: Université Lumière Lyon 2 / Louvain-la-Neuve: Université catholique de Louvain. [] (date of last access).

Short citation:
CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. [] (date of last access).

Single transcript citation:
05HE18FL52. CASE. Forthcoming. Corpus of Academic Spoken English. Saarbrücken: Saarland University. (please also cite CASE)


Coruña Corpus of English Scientific Writing

Coruña Corpus of English Scientific Writing (CC). Compiled by MUSTE Research Group. Project leader Isabel Moskowich.

Parapar López, Javier & Moskowich, Isabel. 2007. The Coruña Corpus Tool. Revista del Procesamiento de Lenguaje Natural, 39: 289–290.

Moskowich, Isabel & Crespo García, Begoña. 2007. Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing. In Pérez Guerra, Javier et al. (eds.) ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English. Bern: Peter Lang (341–357).

Moskowich-Spiegel Fandiño, Isabel & Parapar López, Javier. 2008. Writing Science, Compiling Science. The Coruña Corpus of English Scientific Writing. In Lorenzo Modia, María Jesús (ed.) Proceedings from the 31st AEDEAN Conference. A Coruña: Universidade da Coruña (531–544).

Crespo García, Begoña & Isabel Moskowich. 2010. CETA in the Context of the Coruña Corpus. Literary and Linguistic Computing, 25(2): 153–164. doi:10.1093/llc/fqp038


A Corpus of English Dialogues 1560-1760

A Corpus of English Dialogues 1560-1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University).


Corpus of Early English Correspondence

Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin at the Department of Modern Languages, University of Helsinki.


Corpus of Early English Correspondence Extension

Corpus of Early English Correspondence Extension. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio at the Department of Modern Languages, University of Helsinki.


Corpus of Early English Correspondence Extension Sampler, part 1

Corpus of Early English Correspondence Extension Sampler, part 1. 2021. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG.


Corpus of Early English Correspondence Extension Sampler, part 2

Corpus of Early English Correspondence Extension Sampler, part 2. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG.


Corpus of Early English Correspondence Sampler

Corpus of Early English Correspondence Sampler. 1998. Compiled by Jukka Keränen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin and Helena Raumolin-Brunberg at the Department of English, University of Helsinki.


Corpus of Early English Correspondence Supplement

Corpus of Early English Correspondence Supplement. Compiled by Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Terttu Nevalainen, Arja Nurmi, Minna Palander-Collin, Helena Raumolin-Brunberg and Anni Sairio at the Department of English, University of Helsinki.


Corpus of Early English Medical Writing

See individual subcorpora


A Corpus of English Life Sciences Texts

A Corpus of English Life Sciences Texts (CELiST) is a sub-corpus of the Coruña Corpus of English Scientific Writing (CC). Forthcoming. Compiled by the MUSTE Research Group.


A Corpus of English Philosophy Texts

Moskowich, Isabel. 2016. Philosophers and Scientists from the Modern Age: Compiling the Corpus of English Philosophy Texts (CEPhiT). In Moskowich, Isabel et al. (eds.) ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins (1–23).

Moskowich, Isabel; Camiña, Gonzalo; Lareo, Inés; Crespo, Begoña (eds.) 2016. ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy. Amsterdam: John Benjamins.


A Corpus of English Texts on Astronomy

Moskowich, Isabel. 2012. CETA as a Tool for the Study of Modern Astronomy in English. In Moskowich, Isabel & Crespo García, Begoña (eds.) Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins (35–56).

Moskowich, Isabel; Crespo, Begoña (eds.) 2012. Astronomy ‘playne and simple’: The Writing of Science between 1700 and 1900. Amsterdam: John Benjamins.


Corpus of Historical English Law Reports 1535-1999

Corpus of Historical English Law Reports 1535-1999 (CHELAR). Compiled by Paula Rodríguez-Puente, Teresa Fanego (Project Director), María José López-Couso, Belén Méndez-Naya, Paloma Núñez-Pertejo. University of Santiago de Compostela: Research Unit for Variation, Linguistic Change and Grammaticalization, Department of English and German.


A Corpus of History English Texts

Crespo, Begoña and Moskowich, Isabel. 2015. A Corpus of History Texts (CHET) as part of the Coruña Corpus Project. In Proceedings of the international scientific conference Corpus linguistics – 2015. St Petersburg: St Petersburg State University. 14–23.

Moskowich, Isabel; Puente-Castelo, Luis; Crespo, Begoña and Monaco, Leida Maria. Forthcoming. “From his own diary we learn”: Investigating the Corpus of History English Texts. Amsterdam/Philadelphia: John Benjamins.


Corpus of Irish English 14th-20th c.

Corpus of Irish English 14th-20th c. (CIE). Compiled by Raymond Hickey and contained in: Hickey, Raymond 2003. Corpus Presenter. Software for language analysis. Amsterdam: John Benjamins, 292 pages with CD-ROM.


Corpus of Late 18th c. Prose

Corpus of Late 18th c. Prose (CLEP). The Corpus of late 18c Prose is available without fee for educational and research purposes, but it is not in the public domain. Copyright to the text is retained by the John Rylands University Library of Manchester; copyright to the annotated files is retained by David Denison and Linda van Bergen (© 2002).


Corpus of Late Modern English Texts

The Corpus of Late Modern English Texts (Extended Version). 2006. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven.


CLOB Corpus

CLOB Corpus. A Brown family corpus of written British English.

Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183.


Corpus of Middle English Prose and Verse

Corpus of Middle English Prose and Verse (CMEPV). Copyright institution: The Humanities Text Initiative, University of Michigan.


Corpus of Modern Scottish Writing

Corpus of Modern Scottish Writing (CMSW). Project leaders John Corbett and Jeremy Smith. © Corpus of Modern Scottish Writing, Glasgow University.


Corpus of Nineteenth-century Newspaper English

Corpus of Nineteenth-century Newspaper English (CNNE). Compiled by Erik Smitterberg (Uppsala University).


Corpus of Contemporary American English

Davies, Mark. (2008–) The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Available online at


Corpus of Contemporary English Legal Decisions, 1950–2021

Rodríguez-Puente, Paula and David Hernández-Coalla. 2022. Corpus of Contemporary English Legal Decisions, 1950–2021 (CoCELD). Oviedo: University of Oviedo.


Corpus of Early English Recipes

Corpus of Early English Recipes (CoER). Forthcoming. Compilers Francisco Alonso-Almeida, Ivalla Ortega-Barrera, Elena Quintana-Toledo.


Corpus of English Religious Prose

Corpus of English Religious Prose (COERP). Compilers Thomas Kohnen, Tanja Rütten, Ingvilt Marcoe, Kirsten Gather, Dorothee Groeger, Anne Döring, Stefanie Leu.


Corpus of Historical American English

Davies, Mark. (2010-) The Corpus of Historical American English: 400 million words, 1810-2009.


Corpus of Late Modern British and American English Prose

Corpus of Late Modern British and American English Prose (COLMOBAENG). Compiler Teresa Fanego.


Corpus of Narrative Etymologies

An appropriate citation is:

A Corpus of Narrative Etymologies compiled by Roger Lass, Margaret Laing, Rhona Alcorn and Keith Williamson []. Edinburgh: Version 1.1, 2013-, ©The University of Edinburgh.


Lass, Roger, Margaret Laing, Rhona Alcorn, Keith Williamson. 2013- A Linguistic Atlas of Early Middle English, 1150-1325, Version 1.1 []. Edinburgh: © The University of Edinburgh.


Corpus of Early Ontario English, pre-Confederation Section

Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Dollinger, Stefan (ed.) 2006. The Corpus of Early Ontario English, pre-Confederation Section (CONTE-pC). Version 0.9. University of Vienna.



Anna-Maria De Cesare (2011–2018). CONTRAST-IT. University of Basel,


Corpus of Oz Early English

Corpus of Oz Early English (COEE). The corpus has been compiled by the author as part of a doctoral thesis on the origins of Australian English:

Fritz, Clemens. 2007. From Early English in Australia to Australian English 1788-1900. Frankfurt: Peter Lang.


Corpus of Singaporean Blogs

Corpus of Singaporean Blogs (CoSiB) 2006-2010. Copyright by Universität Trier, English Department, Andrea Sand.


CROWN Corpus

CROWN Corpus. A Brown family corpus of written American English.

Xu, Jiajin & Maocheng Liang. 2013. A tale of two C’s: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175–183.


Corpus of Scottish Correspondence

Meurman-Solin, Anneli. 2007. CSC The Corpus of Scottish Correspondence, 1500-1715.


Diachronic Corpus of Present-Day Spoken English

Diachronic Corpus of Present-Day Spoken English (DCPSE). Compiled by Professor Bas Aarts (Principal Investigator), Sean Wallis (Senior Research Fellow), Dr Dirk Bury, Lesley Kirk, Yordanka Kostadinova-Kavalova, Dr Ann Law and Gabriel Ozón.


Diachronic Electronic Corpus of Tyneside English

Corrigan, Karen P., Buchstaller, I., Mearns, A.J. and Moisl, H.L. (2012) The Diachronic Electronic Corpus of Tyneside English. Newcastle University.


Dictionary of Old English Corpus

Dictionary of Old English Corpus (DOEC); original release (1981) compiled by Angus Cameron, Ashley Crandell Amos, Sharon Butler, and Antonette diPaolo Healey (Toronto: DOE Project 1981); 2009 release compiled by Antonette diPaolo Healey, Joan Holland, Ian McDougall, and David McDougall, with TEI-P5 conformant-version by Xin Xiang (Toronto: DOE Project 2009).


Eighteenth-Century English Phonology Database

ECEP = Eighteenth-Century English Phonology database, 2015/2023. Compiled by Joan C. Beal, Nuria Yáñez-Bouza, Ranjan Sen, and Christine Wallis. The University of Sheffield and Universidade de Vigo. Published by: University of Sheffield.


English as a Lingua Franca in Academic Settings

ELFA 2008. The Corpus of English as a Lingua Franca in Academic Settings. Director: Anna Mauranen.


Early Modern English Medical Texts

Early Modern English Medical Texts (EMEMT). Taavitsainen Irma, Päivi Pahta, Martti Mäkinen, Turo Hiltunen, Ville Marttila, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.). CD-ROM. Amsterdam: John Benjamins.


The English-Norwegian Parallel Corpus

The English-Norwegian Parallel Corpus (1994-1997), Dept. of British and American Studies, University of Oslo. Compiled by Stig Johansson (project leader), Knut Hofland (project leader), Jarle Ebeling (research assistant), Signe Oksefjell (research assistant).


The Freiburg-Lancaster-Oslo/Bergen Corpus

The Freiburg-LOB Corpus (‘F-LOB’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg

The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.


Freiburg Corpus of English Dialects

for full version: Freiburg Corpus of English Dialects, English Dialects Research Group, Albert-Ludwigs-Universität Freiburg

for FRED-S: Freiburg Corpus of English Dialects (Sampler), English Dialects Research Group, Albert-Ludwigs-Universität Freiburg


The Freiburg-Brown Corpus

The Freiburg-Brown Corpus (‘Frown’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg

The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster


Google Books Corpora

Davies, Mark. (2011-) Google Books (American English) Corpus (155 billion words, 1810-2009). Available online at

Citation for Google Books and Culturomics:

Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331 (2011) [Published online ahead of print 12/16/2010].


Helsinki Archive of Regional English Speech - Cambridgeshire sampler

HARES-CAM = Helsinki Archive of Regional English Speech - Cambridgeshire sampler. 2010. Compiled by Ahava, Simo, Joseph McVeigh and Anna-Liisa Vasko at the Department of Modern Languages, University of Helsinki.

To refer to the corpus data, indicate which interview you are citing in parentheses after the excerpt (interviewID-hares). For example:

(1) then used to take the horses home and <pause/> clean them and feed them (cam13-hares).


Helsinki Corpus

The Helsinki Corpus of English Texts (1991). Department of Modern Languages, University of Helsinki. Compiled by Matti Rissanen (Project leader), Merja Kytö (Project secretary); Leena Kahlas-Tarkka, Matti Kilpiö (Old English); Saara Nevanlinna, Irma Taavitsainen (Middle English); Terttu Nevalainen, Helena Raumolin-Brunberg (Early Modern English).

TEI XML edition:

Helsinki Corpus TEI XML Edition. 2011. First edition. Designed by Alpo Honkapohja, Samuli Kaislaniemi, Henri Kauhanen, Matti Kilpiö, Ville Marttila, Terttu Nevalainen, Arja Nurmi, Matti Rissanen and Jukka Tyrkkö. Implemented by Henri Kauhanen and Ville Marttila. Based on The Helsinki Corpus of English Texts (1991). Helsinki: The Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki.


Helsinki Corpus of Older Scots

The Helsinki Corpus of Older Scots (1995). Department of English, University of Helsinki. Compiled by Anneli Meurman-Solin.


Helsinki Corpus of British English Dialects

The Helsinki Corpus of British English Dialects (2006). Department of Modern Languages, University of Helsinki. All the material consists of interviews made by the fieldworkers mentioned below who have full copyright for the material. For permission to use the files, contact Anna-Liisa Vasko or Kirsti Peitsara.


International Corpus of English

International Corpus of English (ICE). The ICE-project is internationally coordinated by Dr Gerald Nelson at the Chinese University of Hong Kong.

Reference lines and copyrights according to each corpus. Information found in manual.


International Corpus of English - Great Britain

International Corpus of English - the British Component (ICE-GB). Coordinated by the Survey of English Usage.


International Corpus of English - Gibraltar

Seoane, Elena, Lucía Loureiro-Porto & Cristina Suárez-Gómez. (to appear) "The ICE project looks at Iberia: The International Corpus of Gibraltar English".


International Corpus of English - Nigeria

Wunder, Eva-Maria, Holger Voormann, and Ulrike Gut. "The ICE Nigeria corpus project: Creating an open, rich and accurate corpus." ICAME Journal 34 (2010): 78-88.


International Corpus of English - Scotland

Schützler, Ole, Ulrike Gut and Robert Fuchs (to appear). New perspectives on Scottish Standard English: Introducing the Scottish component of the International Corpus of English. In Beal, Joan and Sylvie Hancil (eds.). Northern British English (working title). Berlin: de Gruyter.


Innsbruck Corpus of Middle English Prose

Innsbruck Corpus of Middle English Prose (ICoMEP). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck.


The John Swales Conference Corpus

John Swales Conference Corpus (2009). Ann Arbor, MI: The Regents of the University of Michigan.


A Linguistic Atlas of Early Middle English

An appropriate citation is:

A Linguistic Atlas of Early Middle English, 1150-1325, compiled by Margaret Laing []. Edinburgh: Version 3.2, 2013, © The University of Edinburgh.


Laing, M. 2013- A Linguistic Atlas of Early Middle English, 1150-1325, Version 3.2 []. Edinburgh: © The University of Edinburgh.

Citation for the 2008 edition - A Linguistic Atlas of Early Middle English, 1150-1325 [] compiled by Margaret Laing and Roger Lass (Edinburgh: © 2007- The University of Edinburgh).


A Linguistic Atlas of Late Mediaeval English

M. Benskin, M. Laing, V. Karaiskos and K. Williamson. An Electronic Version of A Linguistic Atlas of Late Mediaeval English [] (Edinburgh: © 2013- The Authors and The University of Edinburgh).


A Linguistic Atlas of the Middle and South Atlantic States

A Linguistic Atlas of the Middle and South Atlantic States. Project leaders originally Hans Kurath, later Raven I. McDavid Jr., William A. Kretzschmar Jr.

Users should indicate in any subsequent publication using LAP data/materials that they have obtained the data/materials from the LAP Web site, and indicate that reproduction, copying, distribution, display, etc., as the case may be, of any LAP data/materials is governed by the "cost of reproduction" condition.


The Lampeter Corpus of Early Modern English Tracts

The Lampeter Corpus of Early Modern English Tracts. 1999. Compiled by Josef Schmied, Claudia Claridge, and Rainer Siemund. (In: ICAME Collection of English Language Corpora (CD-ROM), Second Edition, eds. Knut Hofland, Anne Lindebjerg, Jørn Thunestvedt, The HIT Centre, University of Bergen, Norway.)


The Letter Corpus of ICAMET

The Letter Corpus of ICAMET (LCoICAMET). A part of the Innsbruck Computer Archive of Machine-Readable English Texts (ICAMET). Project leader Manfred Markus, Universität Innsbruck.


Lexicons of Early Modern English

Lexicons of Early Modern English. Ed. Ian Lancashire. Toronto, ON: University of Toronto Library and University of Toronto Press, 2014. Date consulted: [date month year]. URL:


London-Lund Corpus of Spoken English

London-Lund Corpus of Spoken English. Project leader Jan Svartvik, Lund University.


Late Modern English Medical Texts

Late Modern English Medical Texts (LMEMT). Taavitsainen Irma, Päivi Pahta, Turo Hiltunen, Ville Marttila, Raisa Oinonen, Maura Ratia, Carla Suhr and Jukka Tyrkkö (eds.).


Corpus of Late Modern English Prose

Corpus of Late Modern English Prose. Compiled by David Denison (project leader), Linda van Bergen, Graeme Trousdale.


The Lancaster-Oslo/Bergen Corpus

The LOB Corpus, original version (1970-1978), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), and Knut Hofland, University of Bergen (head of computing).

The LOB Corpus, POS-tagged version (1981-1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).


The Málaga Corpus of Early Modern English Scientific Prose

Calle-Martín, Javier et al. 2016. The Málaga Corpus of Early Modern English Scientific Prose (MCEModESP). Málaga: University of Málaga. Available from:


The Málaga Corpus of Late Middle English Scientific Prose

Miranda-García, Antonio et al. 2015. The Málaga Corpus of Late Middle English Scientific Prose (MCLMESP). Málaga: University of Málaga. Available from:


Middle English Grammar project

"MEG-C Base, version 2009.1", The Middle English Grammar Corpus, Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith (compilers), December 2009, University of Stavanger, accessed [date],


Middle English Medical Texts

2005. Middle English Medical Texts. Taavitsainen Irma, Päivi Pahta and Martti Mäkinen (eds.). CD-ROM. Amsterdam: John Benjamins.


Michigan Corpus of Academic Spoken English

Simpson, R. C., S. L. Briggs, J. Ovens, and J. M. Swales. (2002) The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.


Michigan Corpus of Upper-level Student Papers

Michigan Corpus of Upper-level Student Papers. (2009). Ann Arbor, MI: The Regents of the University of Michigan.


Corpus of Multilingual Opinion Essays by College Students

Okugiri, M., Ijuin, I., Komori, K. 2015. The Corpus of Multilingual Opinion Essays by College Students. Retrieved from


Newcastle Electronic Corpus of Tyneside English

The Newcastle Electronic Corpus of Tyneside English.


Old Bailey Corpus

Huber, Magnus; Nissel, Magnus; Maiwald, Patrick; Widlitzki, Bianca. 2012. The Old Bailey Corpus. Spoken English in the 18th and 19th centuries., [date of access].

Users who wish to cite material from the Old Bailey Corpus Online Website in publications should provide the URL and the date on which the website was consulted. To cite concordance material obtained by searching a corpus in the Old Bailey Corpus suite (online or offline), include the version of the corpus used as well as the trial ID provided in the online concordances or in the <speech> or <trial> tags. For more information, see the Citation guide.


Parsed Corpus of Early English Correspondence

Parsed Corpus of Early English Correspondence, parsed version. 2006. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

Parsed Corpus of Early English Correspondence, text version. 2006. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin, with additional annotation by Ann Taylor. Helsinki: University of Helsinki and York: University of York. Distributed through the Oxford Text Archive.


Parsed Corpus of Early English Correspondence, 2nd edition

Parsed Corpus of Early English Correspondence 2, parsed version. 2022. Revised and corrected by Beatrice Santorini. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki.


The Penn-Helsinki Parsed Corpus of Early Modern English

Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. Penn-Helsinki Parsed Corpus of Early Modern English.


The Penn Parsed Corpus of Modern British English

Kroch, Anthony, Beatrice Santorini and Ariel Diertani. 2010. Penn Parsed Corpus of Modern British English.


The Penn-Helsinki Parsed Corpus of Middle English, second edition

Kroch, Anthony, and Ann Taylor. 2000. Penn-Helsinki Parsed Corpus of Middle English, second edition.


Pakistan Written English Corpus

Khan, U. (2023) Pakistan Written English Corpus.


Quaker Historical Corpus

Quaker Historical Corpus (QHC). 2015. Compiled by Judith Roads at the University of Birmingham.


Rostock Newspaper Corpus

Compiled by Friedrich Ungerer, Kristina Schneider, Birte Bös at Rostock University.


Salamanca Corpus. Digital Archive of English Dialect Texts

Copyright © 2011- DING, The Salamanca Corpus, Universidad de Salamanca


Small Corpus of English Political Apologies

Small Corpus of English Political Apologies (SCEPA). 2017. Compiled by Halyna Liubinska. Lviv Polytechnic National University.


Small Corpus of Colombian English as a Second Language Essays

Velasco, E. (2023). Small Corpus of Colombian English as a Second Language Essays (SCoCESLE), Mendeley Data, V1, doi:10.17632/wfcbfy29wm.1


Seville Corpus of Northern English

Seville Corpus of Northern English (SCONE). Project leader Dra. Julia Fernández Cuesta, Departamento de Lengua Inglesa, Universidad de Sevilla.


Scottish Corpus of Texts & Speech

Scottish Corpus of Texts & Speech (2007). Department of English Language, University of Glasgow, Scotland, UK.


Small Corpus of Political Speeches

Small Corpus of Political Speeches (SCPS). Compiled under the supervision of Jukka Tyrkkö (University of Helsinki).


Transatlantic Component of the Corpus of Academic Spoken English

TaCoCASE. 2023. Transatlantic Component of the CASE project. Birkenfeld: Trier University of Applied Sciences. Version 1.0. Collet, Caroline. []


Tagged Corpus of Early English Correspondence Extension

Tagged Corpus of Early English Correspondence Extension. 2020. Annotated by Lassi Saario & Tanja Säily. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki.


Tagged Corpus of Early English Correspondence Extension Sampler

Tagged Corpus of Early English Correspondence Extension Sampler. 2022. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily & Anni Sairio at the Department of Languages, University of Helsinki. Spelling standardized by Mikko Hakala, Minna Palander-Collin, Minna Nevala, Emanuela Costea, Anne Kingma & Anna-Lina Wallraff. Annotated by Lassi Saario & Tanja Säily. XML conversion and encoding by Lassi Saario. Helsinki: VARIENG.


Time Corpus

Davies, Mark. (2007–) TIME Magazine Corpus (100 million words, 1920s-2000s). Available online at


Corpus of Video-Mediated English as a Lingua Franca Conversations

ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. Version 1.0. The CASE project [].


Vienna-Oxford International Corpus of English

The recommended citation for VOICE 1.0 Online is:

VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). Director: Barbara Seidlhofer; Researchers: Angelika Breiteneder, Theresa Klimpfinger, Stefan Majewski, Marie-Luise Pitzl. (date of last access).

The short citation for VOICE 1.0 Online is:

VOICE. 2009. The Vienna-Oxford International Corpus of English (version 1.0 online). (date of last access).


Reduced redundancy USENET corpus

Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET corpus (2005-2011) Edmonton, AB: University of Alberta (downloaded from


A Corpus of Women Scientists

The Corpus of Women Scientists is a subcorpus of The Coruña Corpus of English Scientific Writing (CC). Forthcoming.


Yahoo-based Contrastive Corpus of Questions and Answers

Contrastive Corpus of Questions and Answers. 2009. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven.


York-Toronto-Helsinki Parsed Corpus of Old English Prose

Taylor, A., A. Warner, S. Pintzuk, and F. Beths. (2003). The York-Toronto-Helsinki Parsed Corpus of Old English Prose. Electronic texts and manuals available from the Oxford Text Archive.


Zurich English Newspaper corpus

ZEN, Zurich English Newspaper Corpus Version 1.0. English Department of the University of Zurich.