Corpus Finder
- To sort corpora according to any attribute, click on the appropriate column header.
- Use the filters to view a specific selection of corpora.
- For explanations of the table categories, see below.
Corpus |
Start |
End |
Periods |
Word Count |
Text Samples |
Spoken/Written |
Annotation |
Format |
Availability |
ALEC - Advanced Learner English Corpus |
2004 |
2013 |
PDE |
1,300,000 |
146 |
Written |
None |
|
Not available |
APU - APU Writing and Reading Corpus 1979-1988 |
1979 |
1988 |
PDE |
172,000 |
543 |
Written |
|
Online |
Free subscription |
ARCHER - A Representative Corpus of Historical English Registers |
1600 |
1999 |
EModE LModE PDE |
|
|
|
|
On-site Online |
|
BASE - British Academic Spoken English Corpus |
2000 |
2005 |
PDE |
|
|
Spoken |
|
Download |
Free subscription |
BAWE - British Academic Written English Corpus |
2000 |
2007 |
PDE |
6,506,995 |
2,761 |
Written |
Tagging Other |
Download |
Free subscription |
BE06 - The British English 2006 corpus |
2003 |
2008 |
PDE |
1,010,996 |
500 |
Written |
Tagging |
Online |
License required |
BLOB-1931 - The BLOB-1931 Corpus |
1928 |
1934 |
PDE |
1,000,000 |
500 |
Written |
Tagging None |
|
In preparation |
BNC - British National Corpus |
|
|
PDE |
100,000,000 |
|
Written & Spoken |
Tagging Other |
Download |
Free subscription |
BROWN - A Standard Corpus of Present-Day Edited American English |
1961 |
1961 |
PDE |
1,000,000 |
500 |
Written |
Tagging Other None |
CD |
License required |
B-BROWN - The 1930s Brown Corpus |
1928 |
1934 |
PDE |
1,000,000 |
500 |
Written |
Tagging Parsing None Other |
On-site |
In preparation |
Buckeye Corpus |
1998 |
2000 |
PDE |
300,000 |
40 |
Spoken |
Other |
Download |
Free subscription |
CASE - Corpus of Academic Spoken English |
2012 |
|
PDE |
|
300 |
Spoken |
Tagging Other |
Online |
In preparation |
CC - The Coruña Corpus of English Scientific Writing |
1700 |
1900 |
LModE |
|
|
Written |
|
CD Download |
Open access |
CC: CETA - A Corpus of English Texts on Astronomy |
1700 |
1900 |
LModE |
409,909 |
42 |
Written |
Other |
CD Download |
Open access |
CC: CEPhiT - A Corpus of English Philosophy Texts |
1700 |
1900 |
LModE |
400,416 |
40 |
Written |
Other |
CD Download |
Open access |
CC: CHET - A Corpus of History English Texts |
1700 |
1900 |
LModE |
404,311 |
40 |
Written |
Other |
Download |
Open access |
CC: CELiST - A Corpus of English Life Sciences Texts |
1700 |
1900 |
LModE |
400,305 |
40 |
Written |
Other |
|
In preparation |
CC: Women Scientists - A Corpus of Women Scientists |
1700 |
1930 |
LModE PDE |
|
40 |
Written |
|
|
In preparation |
CED - A Corpus of English Dialogues 1560-1760 |
1560 |
1760 |
EModE |
1,183,690 |
177 |
Written & Spoken |
Other |
CD Download |
License required |
CEEC - Corpus of Early English Correspondence |
1402 |
1800 |
ME EModE LModE |
5,100,000 |
12,000 |
Written |
Other |
On-site |
Free subscription |
CEEC - Corpus of Early English Correspondence / 1998 version |
1410 |
1681 |
ME EModE |
2,597,795 |
5,961 |
Written |
Other |
On-site |
Free subscription |
CEECE - Corpus of Early English Correspondence Extension |
1653 |
1800 |
EModE LModE |
2,219,422 |
4,923 |
Written |
Other |
On-site |
Free subscription |
CEECE: TCEECE - Tagged Corpus of Early English Correspondence Extension |
1653 |
1800 |
EModE LModE |
2,219,422 |
4,923 |
Written |
Tagging Other |
On-site |
Free subscription |
CEECES - Corpus of Early English Correspondence Extension Sampler |
1653 |
1800 |
EModE LModE |
1,140,286 |
2,624 |
Written |
Tagging Other |
Download |
Open access |
CEECSU - Corpus of Early English Correspondence Supplement |
1402 |
1663 |
ME EModE |
442,484 |
829 |
Written |
Other |
On-site |
Free subscription |
CEECS - Corpus of Early English Correspondence Sampler |
1418 |
1680 |
ME EModE |
450,085 |
1,123 |
Written |
Other |
CD Download |
Free subscription |
CEEC: PCEEC - Parsed Corpus of Early English Correspondence |
1410 |
1681 |
ME EModE |
2,159,132 |
4,970 |
Written |
Tagging Parsing Other |
Download |
Free subscription |
CEEC: PCEEC2 - Parsed Corpus of Early English Correspondence, 2nd edition |
1410 |
1681 |
ME EModE |
2,159,132 |
4,970 |
Written |
Parsing Other |
Download |
Open access |
CEEM - Corpus of Early English Medical Writing |
1375 |
1800 |
ME EModE LModE |
4,500,000 |
1,164 |
Written |
Other |
CD |
Commercial |
CEEM: MEMT - Middle English Medical Texts |
1375 |
1500 |
ME |
495,322 |
86 |
Written |
Other |
CD |
Commercial |
CEEM: EMEMT - Early Modern English Medical Texts |
1500 |
1700 |
EModE |
2,000,000 |
450 |
Written |
Other |
CD |
Commercial |
CEEM: LMEMT - Late Modern English Medical Texts |
1700 |
1800 |
LModE |
2,000,000 |
628 |
Written |
Other |
CD |
Commercial |
CHELAR - Corpus of Historical English Law Reports 1535-1999 |
1535 |
1999 |
EModE LModE PDE |
463,009 |
369 |
Written |
Tagging None |
Download |
Open access |
CIE - A Corpus of Irish English 14th–20th c. |
14c |
present |
ME EModE LModE PDE |
|
70 |
Written |
Other |
CD |
From compiler |
CLEP - Corpus of Late 18th c. Prose |
1761 |
1790 |
LModE |
300,000 |
1,827 |
Written |
None |
Download |
Free subscription |
CLOB - A Brown Family Corpus of Written British English |
2008 |
2011 |
PDE |
1,000,000 |
500 |
Written |
|
Download |
Open access |
CLMETEV - The Corpus of Late Modern English Texts |
1710 |
1920 |
LModE |
15,000,000 |
176 |
Written |
None |
Download |
Free subscription |
CMEPV - Corpus of Middle English Prose and Verse |
|
|
ME |
|
62 |
Written |
|
Online |
Open access |
CMSW - Corpus of Modern Scottish Writing |
1700 |
1945 |
LModE |
5,500,000 |
62 |
Written |
|
Online |
Open access |
CNNE - Corpus of Nineteenth-century Newspaper English |
1830 |
1895 |
LModE |
320,000 |
200 |
Written |
None |
On-site |
Not available |
COCA - Corpus of Contemporary American English |
1990 |
2009 |
PDE |
520,000,000 |
- |
Written & Spoken |
Tagging |
Online |
Free subscription |
CoCELD - Corpus of Contemporary English Legal Decisions, 1950–2021 |
1950 |
2021 |
PDE |
733,227 |
288 |
Written |
Tagging |
Download |
Free subscription |
CoER - Corpus of Early English Recipes |
1375 |
1900 |
ME EModE LModE |
1,500,000 |
150 |
Written |
|
|
In preparation |
COERP - Corpus of English Religious Prose |
1150 |
1800 |
ME EModE LModE |
|
|
Written |
Other |
- |
In preparation |
COHA - Corpus of Historical American English |
1810 |
2009 |
LModE PDE |
4,000,000 |
100,000 |
Written |
Tagging |
Online |
Open access |
COLMOBAENG - Corpus of Late Modern British and American English Prose |
2006 |
2007 |
PDE |
1,170,000 |
173 |
Written |
None |
Download |
Open access |
CoNE - Corpus of Narrative Etymologies |
1150 |
1325 |
EME |
|
|
Written |
Tagging Other |
Online |
Open access |
CONTE-pC - Corpus of Early Ontario English, pre-Confederation Section |
1776 |
1849 |
LModE |
125,000 |
|
Written |
|
|
From compiler |
CONTRAST-IT |
2011 |
2015 |
PDE |
300,000 |
|
Written |
Tagging |
Online |
Open access |
COOEE - Corpus of Oz Early English |
1788 |
1900 |
LModE |
2,000,000 |
1,353 |
Written |
Other |
- |
Free subscription |
CoSiB - Corpus of Singaporean Blogs |
2006 |
2010 |
PDE |
200,000 |
100 |
Written |
|
|
From compiler |
CROWN - Crown Corpus |
2008 |
2011 |
PDE |
1,026,226 |
500 |
Written |
Tagging Parsing |
Online |
Open access |
CSC - Corpus of Scottish Correspondence |
1500 |
1715 |
EModE |
256,300 |
719 |
Written |
Other |
Online |
In preparation |
DCPSE - Diachronic Corpus of Present-Day Spoken English |
1958 |
1992 |
PDE |
800,000 |
280 |
Spoken |
Tagging Parsing Other |
CD |
Commercial |
DECTE - Diachronic Electronic Corpus of Tyneside English |
1960 |
2000 |
PDE |
804,266 |
99 |
Spoken |
|
Download DVD |
From compiler |
DOEC - Dictionary of Old English Corpus |
600 |
1150 |
OE |
4,000,000 |
3,060 |
Written |
Other |
CD |
License required |
ELFA - English as a Lingua Franca in Academic Settings |
2001 |
2008 |
PDE |
1,010,834 |
165 |
Written & Spoken |
|
CD |
License required |
ENPC - The English-Norwegian Parallel Corpus |
1975 |
1995 |
PDE |
2,600,000 |
100 |
Written |
Tagging Other |
On-site |
Free subscription |
FLOB - The Freiburg-Lancaster-Oslo/Bergen Corpus |
1992 |
1992 |
PDE |
1,000,000 |
500 |
Written |
Tagging None |
CD |
License required |
FRED - Freiburg Corpus of English Dialects |
1970 |
1999 |
PDE |
1,011,396 |
121 |
Spoken |
Other |
On-site CD |
Free subscription |
FROWN - The Freiburg-Brown Corpus |
1991 |
1991 |
PDE |
1,000,000 |
500 |
Written |
Tagging None |
CD |
License required |
Google Books - Google Books Corpora |
1500 |
2009 |
EModE LModE PDE |
2000,000,000,000 |
|
Written |
Tagging Other |
Online |
Open access |
HARES - Helsinki Corpus of Regional English Speech |
1970 |
1980 |
PDE |
|
|
Spoken |
Other |
Download |
License required |
HC - Helsinki Corpus |
730 |
1710 |
OE ME EModE |
1,572,800 |
450 |
Written |
Other |
CD Download |
License required |
HCOS - Helsinki Corpus of Older Scots |
1450 |
1700 |
EModE |
834,200 |
71 |
Written |
Other |
CD |
License required |
HD - Helsinki Corpus of British English Dialects |
1970 |
1985 |
PDE |
1,008,641 |
187 |
Spoken |
Other |
On-site |
Free subscription |
HUM19UK - HUM19UK Corpus |
1800 |
1899 |
LModE |
13,000,000 |
100 |
Written |
Other |
Download |
Open access |
ICE - International Corpus of English |
|
|
|
|
|
Written & Spoken |
Tagging Other |
Download CD |
Free subscription |
ICE-GB: International Corpus of English - The British component |
1990 |
1993 |
PDE |
1,061,264 |
500 |
Written & Spoken |
Tagging Parsing |
CD |
License required |
ICE-GBR: International Corpus of English - Gibraltar |
2000 |
1993 |
PDE |
1,000,000 |
|
Written & Spoken |
|
|
In preparation |
ICE-NIG: International Corpus of English - Nigeria |
2000 |
|
PDE |
1,000,000 |
902 |
Written & Spoken |
Tagging |
Download |
Open access |
ICE-SCO: International Corpus of English - Scotland |
2013 |
2016 |
PDE |
1,000,000 |
|
Written & Spoken |
Tagging |
Download |
In preparation |
ICoMEP - Innsbruck Corpus of Middle English Prose |
|
|
ME |
7,800,000 |
129 |
Written |
|
CD |
From compiler |
JSCC - The John Swales Conference Corpus |
2006 |
2006 |
PDE |
100,000 |
23 |
Spoken |
None |
Download |
Open access |
LAEME - A Linguistic Atlas of Early Middle English |
1150 |
1325 |
ME |
816,170 |
167 |
Written |
Other |
Online |
Open access |
eLALME - A Linguistic Atlas of Late Mediaeval English |
1150 |
1325 |
ME |
|
|
Written |
Other |
Online |
Open access |
LAMSAS - A Linguistic Atlas of the Middle and South Atlantic States |
1933 |
1974 |
PDE |
|
|
Spoken |
Other |
Online |
Open access |
LC - The Lampeter Corpus of Early Modern English Tracts |
1640 |
1740 |
EModE |
1,193,385 |
120 |
Written |
Other |
CD Download |
License required |
LLC - The London-Lund Corpus of Spoken English |
1953 |
1987 |
PDE |
500,000 |
100 |
Spoken |
Other |
CD |
License required |
LOB - The Lancaster-Oslo/Bergen Corpus |
1961 |
1961 |
PDE |
1,000,000 |
500 |
Written |
Tagging None |
CD |
License required |
MCEESP - The Málaga Corpus of Early English Scientific Prose |
1350 |
1900 |
ME EModE LModE |
6,000,000 |
|
Written |
Tagging |
Online Download |
In preparation |
MCLMESP - The Málaga Corpus of Late Middle English Scientific Prose |
1350 |
1500 |
ME |
1,500,000 |
|
Written |
Tagging Other |
Download |
Open access |
MCEModESP - The Málaga Corpus of Early Modern English Scientific Prose |
1500 |
1700 |
EModE |
1,500,000 |
|
Written |
Tagging |
Online Download |
Free subscription |
MCLModESP - The Málaga Corpus of Late Modern English Scientific Prose |
1700 |
1900 |
LModE |
3,000,000 |
|
Written |
Tagging |
Online Download |
In preparation |
MEG-C - The Middle English Grammar Corpus |
1350 |
1500 |
ME |
450,000 |
320 |
Written |
Other |
Download |
Open access |
MICASE - Michigan Corpus of Academic Spoken English |
1997 |
2001 |
PDE |
1,800,000 |
152 |
Spoken |
Other |
CD Download Online |
Open access |
MICUSP - Michigan Corpus of Upper-level Student Papers |
2002 |
2009 |
PDE |
2,600,000 |
829 |
Written |
Other |
Online |
Open access |
MOECS - Corpus of Multilingual Opinion Essays by College Students |
2007 |
2016 |
PDE |
|
477 |
Written |
|
Download |
Free subscription |
NECTE - Newcastle Electronic Corpus of Tyneside English |
1969 |
1994 |
PDE |
|
62 |
Spoken |
Other |
Download DVD |
Free subscription |
OBC - Old Bailey Corpus |
1720 |
1913 |
EModE LModE |
14,000,000 |
|
Spoken |
Tagging |
Online |
Free subscription |
PPCEME - The Penn-Helsinki Parsed Corpus of Early Modern English |
1500 |
1710 |
EModE |
1,794,010 |
229 |
Written |
Tagging Parsing None |
CD |
License required |
PPCMBE - The Penn-Helsinki Parsed Corpus of Modern British English |
1700 |
1914 |
LModE PDE |
948,895 |
101 |
Written |
Tagging Parsing None |
CD |
License required |
PPCME2 - The Penn-Helsinki Parsed Corpus of Middle English, 2nd edition |
1150 |
1500 |
ME |
1,155,965 |
55 |
Written |
Tagging Parsing None |
CD |
License required |
PWEC - Pakistan Written English Corpus |
2020 |
2023 |
PDE |
7,586,110 |
4,158 |
Written |
None |
|
From compiler |
QHC - Quaker Historical Corpus |
1650 |
1699 |
EModE |
722,370 |
173 |
Written |
None |
Online |
Open access |
RCN1 - Rostock Newspaper Corpus |
1700 |
2000 |
LModE PDE |
600,000 |
|
Written |
|
On-site |
|
SC - Salamanca Corpus. Digital Archive of English Dialect Texts |
1500 |
1950 |
EModE LModE |
6,115,267 |
|
Written |
Tagging |
Online |
|
SCEPA - Small Corpus of English Political Apologies |
1950 |
2017 |
PDE |
22,538 |
232 |
Written & Spoken |
Other |
Download |
Open access |
SCoCESLE - Small Corpus of Colombian English as a Second Language Essays |
2022 |
2023 |
PDE |
81,994 |
272 |
Written |
None |
Download |
Open access |
SCONE - Seville Corpus of Northern English |
600 |
1590 |
OE ME |
|
|
Written |
Other |
Download |
Open access |
SCOTS - Scottish Corpus of Texts & Speech |
1945 |
2007 |
PDE |
4,000,000 |
1,177 |
Written & Spoken |
Other None |
Online |
Open access |
SCPS - Small Corpus of Political Speeches |
1789 |
2010 |
PDE |
655,479 |
239 |
Written & Spoken |
Tagging |
On-site |
License required |
TaCoCASE - Transatlantic Component of the Corpus of Academic Spoken English |
2016 |
2023 |
PDE |
140,003 |
15 |
Spoken |
Other |
Online Download |
Free subscription |
TIME - TIME corpus |
1923 |
2009 |
PDE |
100,000,000 |
275,000 |
Written |
Tagging |
Online |
Free subscription |
ViMELF - Corpus of Video-Mediated English as a Lingua Franca Conversations |
2012 |
2015 |
PDE |
152,472 |
20 |
Spoken |
Tagging Other None |
Download |
Free subscription |
VOICE - Vienna-Oxford International Corpus of English |
2000 |
2007 |
PDE |
1,023,043 |
151 |
Spoken |
Other |
Online |
Free subscription |
WestLabUSENET - Reduced redundancy USENET corpus |
2005 |
2011 |
PDE |
6,089,697,986 |
22,799,995 |
Written |
None |
Download |
Open access |
YCCQA - Yahoo-based Contrastive Corpus of Questions and Answers |
2006 |
2009 |
PDE |
29,400,000 |
665,000 |
Written |
Other |
Download |
Free subscription |
YCOE - The York-Toronto-Helsinki Parsed Corpus of Old English Prose |
- |
- |
OE |
1,500,000 |
100 |
Written |
Tagging |
Download |
Free subscription |
ZEN - Zurich English Newspaper corpus |
1661 |
1791 |
EModE LModE |
1,600,000 |
349 |
Written |
Other |
Online CD |
Free subscription |
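The sorting and filtering offered by the interactive table can be approximated offline on a plain-text copy of the listing. The following is a minimal Python sketch, assuming that every table cell ends with a line-final `|`, that a cell may span several lines, and that each row has exactly ten cells in the order of the column headers. The key names in `COLUMNS` and the helpers `parse_cells`, `parse_rows` and `to_int` are illustrative, not part of the Corpus Finder itself. The sample holds three rows copied from the table, without the header row.

```python
from typing import Optional

# Illustrative key names for the ten columns, in header order (an assumption).
COLUMNS = ["corpus", "start", "end", "periods", "word_count",
           "text_samples", "mode", "annotation", "format", "availability"]

SAMPLE = """\
ALEC - Advanced Learner English Corpus |
2004 |
2013 |
PDE |
1,300,000 |
146 |
Written |
None |
|
Not available |
BASE - British Academic Spoken English Corpus |
2000 |
2005 |
PDE |
|
|
Spoken |
|
Download |
Free subscription |
BAWE - British Academic Written English Corpus |
2000 |
2007 |
PDE |
6,506,995 |
2761 |
Written |
Tagging
Other
|
Download |
Free subscription |
"""


def parse_cells(text: str) -> list[str]:
    """Split the listing into cells; a cell is closed by a line-final '|'."""
    cells, buffer = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("|"):
            buffer.append(stripped[:-1].strip())
            # Join the parts of a multi-line cell, skipping empty parts.
            cells.append(" ".join(part for part in buffer if part))
            buffer = []
        else:
            buffer.append(stripped)
    return cells


def parse_rows(text: str) -> list[dict[str, str]]:
    """Group the cells into rows of ten and label them with COLUMNS."""
    cells = parse_cells(text)
    width = len(COLUMNS)
    return [dict(zip(COLUMNS, cells[i:i + width]))
            for i in range(0, len(cells) - width + 1, width)]


def to_int(value: str) -> Optional[int]:
    """Read a count such as '1,300,000'; empty or non-numeric cells give None."""
    digits = value.replace(",", "")
    return int(digits) if digits.isdigit() else None


rows = parse_rows(SAMPLE)

# Filter: corpora that contain spoken material.
spoken = [row["corpus"] for row in rows if "Spoken" in row["mode"]]

# Sort: largest word count first; unknown counts are placed last.
by_size = sorted(rows, key=lambda row: to_int(row["word_count"]) or 0,
                 reverse=True)
```

When applied to the full listing, the first ten cells are the column headers and should be skipped before sorting or filtering.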
Corpus Finder categories
Corpus |
Some corpora consist of subcorpora (CEEC, CEEM). In these cases both the entire corpus and the subcorpora have been listed; the subcorpora are indented. |
Start, End, Periods |
The period labelling follows roughly the categorisation below unless a particular period is specified in the name of the corpus. |
OE |
Old English c. -1300 |
ME |
Middle English c. 1300-1500 |
EModE |
Early Modern English c. 1500-1700 |
LModE |
Late Modern English c. 1700-1900 |
PDE |
Present Day English 1900- |
Word count, Text samples |
Left empty when the word count or number of text samples is unknown. |
Spoken/Written |
Shows whether the corpus material is from written sources, recorded speech or both. |
Annotation |
Tagging |
Part-of-speech annotation |
Parsing |
Syntactic annotation |
Other |
Annotation of, e.g., discursive features, text structure, phonetic features, orthography, etc. |
None |
|
Format |
CD/DVD |
The corpus is distributed on a disc. |
Download |
The corpus can be downloaded from the internet. |
Online |
The corpus is accessible online without downloading. |
On-site |
The corpus can only be accessed locally. |
Availability |
Open access |
The corpus can be freely used by anyone. |
Free subscription |
The corpus is free to use but requires a subscription. |
License required |
A paid subscription is required. |
Commercial |
|
In preparation |
|
Not available |
The corpus is not available to external users for copyright reasons. |
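The period boundaries given under Start, End, Periods can be turned into a small lookup. This is a rough Python sketch only: the function name `periods_for` and the boundary handling are assumptions. A boundary year at the end of a span is counted with the earlier period, so that 1700-1900 yields LModE alone. Because the table applies the labels "roughly", some rows deviate from what the function returns. For example, HC (730-1710) is listed as OE, ME and EModE, whereas the function would also return LModE.

```python
from typing import Optional

# (label, first year, year in which the next period begins), from the legend.
PERIODS: list[tuple[str, Optional[int], Optional[int]]] = [
    ("OE", None, 1300),
    ("ME", 1300, 1500),
    ("EModE", 1500, 1700),
    ("LModE", 1700, 1900),
    ("PDE", 1900, None),
]


def periods_for(start: int, end: int) -> list[str]:
    """Return the period labels that a corpus spanning start..end overlaps."""
    # Treat the span as half-open, [start, stop); a single-year corpus
    # still covers its own year.
    stop = max(end, start + 1)
    labels = []
    for label, lower, upper in PERIODS:
        starts_before_period_ends = upper is None or start < upper
        stops_after_period_begins = lower is None or stop > lower
        if starts_before_period_ends and stops_after_period_begins:
            labels.append(label)
    return labels


archer = periods_for(1600, 1999)   # ARCHER
ceem = periods_for(1375, 1800)     # CEEM
brown = periods_for(1961, 1961)    # BROWN
```

For these three rows the result agrees with the Periods column of the table: EModE LModE PDE for ARCHER, ME EModE LModE for CEEM, and PDE for BROWN.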
JavaScript for the Corpus Finder table by Max Guglielmi (http://tablefilter.free.fr/).