Corpus Finder

  • To sort corpora according to any attribute, click on the appropriate column header.
  • Use the filters to view a specific selection of corpora.
  • For explanations of the table categories, see below.

 

Corpus Start End Periods Word Count Text Samples Spoken/Written Annotation Format Availability
ALEC - Advanced Learner English Corpus 2004 2013 PDE 1,300,000 146 Written None Not available
APU - APU Writing and Reading Corpus 1979-1988 1979 1988 PDE 172,000 543 Written Online Free subscription
ARCHER - A Representative Corpus of Historical English Registers 1600 1999 EModE/LModE/PDE On-site/Online
BASE - British Academic Spoken English Corpus 2000 2005 PDE Spoken Download Free subscription
BAWE - British Academic Written English Corpus 2000 2007 PDE 6,506,995 2761 Written Tagging/Other Download Free subscription
BE06 - The British English 2006 corpus 2003 2008 PDE 1,010,996 500 Written Tagging Online License required
BLOB-1931 - The BLOB-1931 Corpus 1928 1934 PDE 1,000,000 500 Written Tagging/None In preparation
BNC - British National Corpus PDE 100,000,000 Written & Spoken Tagging/Other Download Free subscription
BROWN - A Standard Corpus of Present-Day Edited American English 1961 1961 PDE 1,000,000 500 Written Tagging/Other/None CD License required
B-BROWN - The 1930s Brown Corpus 1928 1934 PDE 1,000,000 500 Written Tagging/Parsing/None/Other On-site In preparation
Buckeye Corpus 1998 2000 PDE 300,000 40 Spoken Other Download Free subscription
CASE - Corpus of Academic Spoken English 2012 PDE 300 Spoken Tagging/Other Online In preparation
CC - The Coruña Corpus of English Scientific Writing 1700 1900 LModE Written CD/Download Open access
CC: CETA - A Corpus of English Texts on Astronomy 1700 1900 LModE 409,909 42 Written Other CD/Download Open access
CC: CEPhiT - A Corpus of English Philosophy Texts 1700 1900 LModE 400,416 40 Written Other CD/Download Open access
CC: CHET - A Corpus of History English Texts 1700 1900 LModE 404,311 40 Written Other Download Open access
CC: CELiST - A Corpus of English Life Sciences Texts 1700 1900 LModE 400,305 40 Written Other In preparation
CC: Women Scientists - A Corpus of Women Scientists 1700 1930 LModE/PDE 40 Written In preparation
CED - A Corpus of English Dialogues 1560-1760 1560 1760 EModE 1,183,690 177 Written & Spoken Other CD/Download License required
CEEC - Corpus of Early English Correspondence 1402 1800 ME/EModE/LModE 5,100,000 12,000 Written Other On-site Free subscription
CEEC - Corpus of Early English Correspondence / 1998 version 1410 1681 ME/EModE 2,597,795 5,961 Written Other On-site Free subscription
CEECE - Corpus of Early English Correspondence Extension 1653 1800 EModE/LModE 2,219,422 4,923 Written Other On-site Free subscription
CEECE: TCEECE - Tagged Corpus of Early English Correspondence Extension 1653 1800 EModE/LModE 2,219,422 4,923 Written Tagging/Other On-site Free subscription
CEECES - Corpus of Early English Correspondence Extension Sampler 1653 1800 EModE/LModE 1,140,286 2,624 Written Tagging/Other Download Open access
CEECSU - Corpus of Early English Correspondence Supplement 1402 1663 ME/EModE 442,484 829 Written Other On-site Free subscription
CEECS - Corpus of Early English Correspondence Sampler 1418 1680 ME/EModE 450,085 1,123 Written Other CD/Download Free subscription
CEEC: PCEEC - Parsed Corpus of Early English Correspondence 1410 1681 ME/EModE 2,159,132 4,970 Written Tagging/Parsing/Other Download Free subscription
CEEC: PCEEC2 - Parsed Corpus of Early English Correspondence, 2nd edition 1410 1681 ME/EModE 2,159,132 4,970 Written Parsing/Other Download Open access
CEEM - Corpus of Early English Medical Writing 1375 1800 ME/EModE/LModE 4,500,000 1,164 Written Other CD Commercial
CEEM: MEMT - Middle English Medical Texts 1375 1500 ME 495,322 86 Written Other CD Commercial
CEEM: EMEMT - Early Modern English Medical Texts 1500 1700 EModE 2,000,000 450 Written Other CD Commercial
CEEM: LMEMT - Late Modern English Medical Texts 1700 1800 LModE 2,000,000 628 Written Other CD Commercial
CHELAR - Corpus of Historical English Law Reports 1535-1999 1535 1999 EModE/LModE/PDE 463,009 369 Written Tagging/None Download Open access
CIE - A Corpus of Irish English 14th–20th c. 14c present ME/EModE/LModE/PDE 70 Written Other CD From compiler
CLEP - Corpus of Late 18th c. Prose 1761 1790 LModE 300,000 1827 Written None Download Free subscription
CLOB - A Brown Family Corpus of Written British English 2008 2011 PDE 1,000,000 500 Written Download Open access
CLMETEV - The Corpus of Late Modern English Texts 1710 1920 LModE 15,000,000 176 Written None Download Free subscription
CMEPV - Corpus of Middle English Prose and Verse ME 62 Written Online Open access
CMSW - Corpus of Modern Scottish Writing 1700 1945 LModE 5,500,000 62 Written Online Open access
CNNE - Corpus of Nineteenth-century Newspaper English 1830 1895 LModE 320,000 200 Written None On-site Not available
COCA - Corpus of Contemporary American English 1990 2009 PDE 520,000,000 - Written & Spoken Tagging Online Free subscription
CoCELD - Corpus of Contemporary English Legal Decisions, 1950–2021 1950 2021 PDE 733,227 288 Written Tagging Download Free subscription
CoER - Corpus of Early English Recipes 1375 1900 ME/EModE/LModE 1,500,000 150 Written In preparation
COERP - Corpus of English Religious Prose 1150 1800 ME/EModE/LModE     Written Other - In preparation
COHA - Corpus of Historical American English 1810 2009 LModE/PDE 4,000,000 100,000 Written Tagging Online Open access
COLMOBAENG - Corpus of Late Modern British and American English Prose 2006 2007 PDE 1,170,000 173 Written None Download Open access
CoNE - Corpus of Narrative Etymologies 1150 1325 EME Written Tagging/Other Online Open access
CONTE-pC - Corpus of Early Ontario English, pre-Confederation Section 1776 1849 LModE 125,000 Written From compiler
CONTRAST-IT 2011 2015 PDE 300,000 Written Tagging Online Open access
COOEE - Corpus of Oz Early English 1788 1900 LModE 2,000,000 1353 Written Other - Free subscription
CoSiB - Corpus of Singaporean Blogs 2006 2010 PDE 200,000 100 Written From compiler
CROWN - Crown Corpus 2008 2011 PDE 1,026,226 500 Written Tagging/Parsing Online Open access
CSC - Corpus of Scottish Correspondence 1500 1715 EModE 256,300 719 Written Other Online In preparation
DCPSE - Diachronic Corpus of Present-Day Spoken English 1958 1992 PDE 800,000 280 Spoken Tagging/Parsing/Other CD Commercial
DECTE - Diachronic Electronic Corpus of Tyneside English 1960 2000 PDE 804,266 99 Spoken Download/DVD From compiler
DOEC - Dictionary of Old English Corpus 600 1150 OE 4,000,000 3060 Written Other CD License required
ELFA - English as a Lingua Franca in Academic Settings 2001 2008 PDE 1,010,834 165 Written & Spoken   CD License required
ENPC - The English-Norwegian Parallel Corpus 1975 1995 PDE 2,600,000 100 Written Tagging/Other On-site Free subscription
FLOB - The Freiburg-Lancaster-Oslo/Bergen Corpus 1992 1992 PDE 1,000,000 500 Written Tagging/None CD License required
FRED - Freiburg Corpus of English Dialects 1970 1999 PDE 1,011,396 121 Spoken Other On-site/CD Free subscription
FROWN - The Freiburg-Brown Corpus 1991 1991 PDE 1,000,000 500 Written Tagging/None CD License required
Google Books - Google Books Corpora 1500 2009 EModE/LModE/PDE 2000,000,000,000 Written Tagging/Other Online Open access
HARES - Helsinki Corpus of Regional English Speech 1970 1980 PDE     Spoken Other Download License required
HC - Helsinki Corpus 730 1710 OE/ME/EModE 1,572,800 450 Written Other CD/Download License required
HCOS - Helsinki Corpus of Older Scots 1450 1700 EModE 834,200 71 Written Other CD License required
HD - Helsinki Corpus of British English Dialects 1970 1985 PDE 1,008,641 187 Spoken Other On-site Free subscription
HUM19UK - HUM19UK Corpus 1800 1899 LModE 13,000,000 100 Written Other Download Open access
ICE - International Corpus of English Written & Spoken Tagging/Other Download/CD Free subscription
ICE-GB: International Corpus of English - The British component 1990 1993 PDE 1,061,264 500 Written & Spoken Tagging/Parsing CD License required
ICE-GBR: International Corpus of English - Gibraltar 2000 1993 PDE 1,000,000 Written & Spoken In preparation
ICE-NIG: International Corpus of English - Nigeria 2000 PDE 1,000,000 902 Written & Spoken Tagging Download Open access
ICE-SCO: International Corpus of English - Scotland 2013 2016 PDE 1,000,000 Written & Spoken Tagging Download In preparation
ICoMEP - Innsbruck Corpus of Middle English Prose ME 7,800,000 129 Written CD From compiler
JSCC - The John Swales Conference Corpus 2006 2006 PDE 100,000 23 Spoken None Download Open access
LAEME - A Linguistic Atlas of Early Middle English 1150 1325 ME 816,170 167 Written Other Online Open access
eLALME - A Linguistic Atlas of Late Mediaeval English 1150 1325 ME Written Other Online Open access
LAMSAS - A Linguistic Atlas of the Middle and South Atlantic States 1933 1974 PDE Spoken Other Online Open access
LC - The Lampeter Corpus of Early Modern English Tracts 1640 1740 EModE 1,193,385 120 Written Other CD/Download License required
LLC - The London-Lund Corpus of Spoken English 1953 1987 PDE 500,000 100 Spoken Other CD License required
LOB - The Lancaster-Oslo/Bergen Corpus 1961 1961 PDE 1,000,000 500 Written Tagging/None CD License required
MCEESP - The Málaga Corpus of Early English Scientific Prose 1350 1900 ME/EModE/LModE 6,000,000 Written Tagging Online/Download In preparation
MCLMESP - The Málaga Corpus of Late Middle English Scientific Prose 1350 1500 ME 1,500,000 Written Tagging/Other Download Open access
MCEModESP - The Málaga Corpus of Early Modern English Scientific Prose 1500 1700 EModE 1,500,000 Written Tagging Online/Download Free subscription
MCLModESP - The Málaga Corpus of Late Modern English Scientific Prose 1700 1900 LModE 3,000,000 Written Tagging Online/Download In preparation
MEG-C - The Middle English Grammar Corpus 1350 1500 ME 450,000 320 Written Other Download Open access
MICASE - Michigan Corpus of Academic Spoken English 1997 2001 PDE 1,800,000 152 Spoken Other CD/Download/Online Open access
MICUSP - Michigan Corpus of Upper-level Student Papers 2002 2009 PDE 2,600,000 829 Written Other Online Open access
MOECS - Corpus of Multilingual Opinion Essays by College Students 2007 2016 PDE 477 Written Download Free subscription
NECTE - Newcastle Electronic Corpus of Tyneside English 1969 1994 PDE   62 Spoken Other Download/DVD Free subscription
OBC - Old Bailey Corpus 1720 1913 EModE/LModE 14,000,000   Spoken Tagging Online Free subscription
PPCEME - The Penn-Helsinki Parsed Corpus of Early Modern English 1500 1710 EModE 1,794,010 229 Written Tagging/Parsing/None CD License required
PPCMBE - The Penn-Helsinki Parsed Corpus of Modern British English 1700 1914 LModE/PDE 948,895 101 Written Tagging/Parsing/None CD License required
PPCME2 - The Penn-Helsinki Parsed Corpus of Middle English, 2nd edition 1150 1500 ME 1,155,965 55 Written Tagging/Parsing/None CD License required
PWEC - Pakistan Written English Corpus 2020 2023 PDE 7,586,110 4,158 Written None From compiler
QHC - Quaker Historical Corpus 1650 1699 EModE 722,370 173 Written None Online Open access
RCN1 - Rostock Newspaper Corpus 1700 2000 LModE/PDE 600,000 Written On-site
SC - Salamanca Corpus. Digital Archive of English Dialect Texts 1500 1950 EModE/LModE 6,115,267   Written Tagging Online
SCEPA - Small Corpus of English Political Apologies 1950 2017 PDE 22,538 232 Written & Spoken Other Download Open access
SCoCESLE - Small Corpus of Colombian English as a Second Language Essays 2022 2023 PDE 81,994 272 Written None Download Open access
SCONE - Seville Corpus of Northern English 600 1590 OE/ME     Written Other Download Open access
SCOTS - Scottish Corpus of Texts & Speech 1945 2007 PDE 4,000,000 1177 Written & Spoken Other/None Online Open access
SCPS - Small Corpus of Political Speeches 1789 2010 PDE 655,479 239 Written & Spoken Tagging On-site License required
TaCoCASE - Transatlantic Component of the Corpus of Academic Spoken English 2016 2023 PDE 140,003 15 Spoken Other Online/Download Free subscription
TIME - TIME corpus 1923 2009 PDE 100,000,000 275,000 Written Tagging Online Free subscription
ViMELF - Corpus of Video-Mediated English as a Lingua Franca Conversations 2012 2015 PDE 152,472 20 Spoken Tagging/Other/None Download Free subscription
VOICE - Vienna-Oxford International Corpus of English 2000 2007 PDE 1,023,043 151 Spoken Other Online Free subscription
WestLabUSENET - Reduced redundancy USENET corpus 2005 2011 PDE 6,089,697,986 22,799,995 Written None Download Open access
YCCQA - Yahoo-based Contrastive Corpus of Questions and Answers 2006 2009 PDE 29,400,000 665,000 Written Other Download Free subscription
YCOE - The York-Toronto-Helsinki Parsed Corpus of Old English Prose - - OE 1,500,000 100 Written Tagging Download Free subscription
ZEN - Zurich English Newspaper corpus 1661 1791 EModE/LModE 1,600,000 349 Written Other Online/CD Free subscription

 

Corpus Finder categories

Corpus

Some corpora consist of subcorpora (CEEC, CEEM). In these cases both the entire corpus and the subcorpora have been listed; the subcorpora are indented.

Start, End, Periods

The period labelling follows roughly the categorisation below unless a particular period is specified in the name of the corpus.

OE

Old English c. -1300

ME

Middle English c. 1300-1500

EModE

Early Modern English c. 1500-1700

LModE

Late Modern English c. 1700-1900

PDE

Present Day English 1900-
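
As a rough illustration of how the Start and End years relate to the Periods column, the categorisation above can be sketched in Python. This is only a sketch: the boundaries in the legend are approximate ("c."), the function name periods_for is hypothetical, and treating each period as running from its first year up to, but not including, the first year of the next period is an assumption made here for the example. Individual table entries may deviate from it, since the labelling is only roughly followed.

```python
# Period boundaries copied from the legend above. They are approximate ("c."),
# and the half-open treatment [first_year, next_first_year) is an assumption.
PERIODS = [
    ("OE", None, 1300),
    ("ME", 1300, 1500),
    ("EModE", 1500, 1700),
    ("LModE", 1700, 1900),
    ("PDE", 1900, None),
]


def periods_for(start, end):
    """Return the labels of all periods overlapped by the years start..end (inclusive)."""
    labels = []
    for label, first_year, next_first_year in PERIODS:
        starts_before_period_ends = next_first_year is None or start < next_first_year
        ends_after_period_begins = first_year is None or end >= first_year
        if starts_before_period_ends and ends_after_period_begins:
            labels.append(label)
    return labels


# Start and End years taken from the ARCHER and COHA rows of the table.
print(periods_for(1600, 1999))
print(periods_for(1810, 2009))
```

For the ARCHER row (1600 to 1999) the sketch yields EModE, LModE and PDE, and for the COHA row (1810 to 2009) it yields LModE and PDE, which agrees with the Periods listed for those corpora.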

Word count, Text samples

Left empty when the word count or number of text samples is unknown.

Spoken/Written

Shows whether the corpus material is from written sources, recorded speech or both.

Annotation

Tagging

Part-of-speech annotation

Parsing

Syntactic annotation

Other

Annotation of other features, e.g. discursive features, text structure, phonetic features or orthography.

None

 

Format

CD/DVD

The corpus is distributed on a disc.

Download

The corpus can be downloaded from the internet.

Online

The corpus is accessible online without downloading.

On-site

The corpus can only be accessed locally.

Availability

Open access

The corpus can be freely used by anyone.

Free subscription

The corpus is free to use but requires a subscription.

License required

A paid subscription is required.

Commercial

 

In preparation

 

Not available

The corpus is not available to external users for copyright reasons.

 

JavaScript for the Corpus Finder table by Max Guglielmi (http://tablefilter.free.fr/).