Helsinki Corpus of British English Dialects

The Helsinki Corpus of British English Dialects (HD) is a collection of orthographically transcribed audio recordings of speech, mainly from East Anglia and the South-West, with a smaller collection from Lancashire. The recordings were made in the 1970s and 1980s by Finnish postgraduates.

The aim of the corpus is to provide material for linguistic research in the fields of dialectology, sociolinguistics, discourse analysis, morphology, syntax and phonology. The corpus also provides material for non-linguistic, multidisciplinary research, such as ethnography of communication, local habits and history.

Project leaders: Ossi Ihalainen 1984–1993, Kirsti Peitsara 1997–2006, Anna-Liisa Vasko 2007 onwards.
Fieldwork supervisors: In the 1970s, Tauno F. Mustanoja (Helsinki) in cooperation with Harold Orton (Leeds). Ossi Ihalainen supervised the fieldwork done in the late 1980s.
Size: 187 files, totalling 1,008,641 words.
Time periods: 1970s and 1980s (CAM, DEV, ELY, SOM, SUF), 1980s (ESS, LAN).
Status: The first stage was completed in 2006, with corpus material available for research. The second stage is ongoing, with plans to extend the corpus with previously unpublished data.
Corpus data: The primary data are audio recordings of dialect speech.
Funding: Finnish Cultural Foundation; Academy of Finland; University of Helsinki.
Language: English, rural (Cambridgeshire, Devon, Isle of Ely, Somerset, Suffolk) and urban (Essex, Lancashire).

Reference line and Copyright

The Helsinki Corpus of British English Dialects (2006). Department of Modern Languages, University of Helsinki. All the material consists of interviews conducted by the fieldworkers listed below, who hold full copyright in the material. For permission to use the files, contact Anna-Liisa Vasko or Kirsti Peitsara.


Kirsti Peitsara and Anna-Liisa Vasko.


Fieldworkers

Cambridgeshire proper (CAM) - Anna-Liisa Ojanen (Vasko)
Devon (DEV) - Ossi Stigell
Essex & Lancashire (ESS, LAN) - Riitta Kerman
Isle of Ely (ELY) - Irmeli Tammivaara-Balaam
Somerset (SOM) - Ossi Ihalainen
Suffolk (SUF) - Leena Pasanen

Graduate and postgraduate research assistants

Maarit Alanko, Tuula Chezek, Sanna Huttunen, Minna Korhonen, Jaana Suviniitty, Eero Timoskainen.


Peitsara, Kirsti. Manual to the Dialectal Part of the Helsinki Corpus of British English Dialects. Available for those working in VARIENG.

File format

The coding system is based on the ASCII character set (96 printable characters). The names of the 187 files follow MS-DOS conventions, which limit a file name to eight characters. Each file name begins with the letters DI (for dialect), followed by the first three letters of the county the samples are taken from and a running number (e.g. DIDEV01, DISOM28, DIELY50).
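The naming convention can be illustrated with a short Python sketch that splits a file name into its county code and running number. This is not part of the corpus tools: the function name parse_file_name, the regular expression and the limit of one to three digits are assumptions made for illustration, chosen so that a name stays within eight characters. The county codes are those listed in this entry.

```python
import re

# County codes as listed in this entry.
COUNTY_CODES = {
    "CAM": "Cambridgeshire proper",
    "DEV": "Devon",
    "ELY": "Isle of Ely",
    "ESS": "Essex",
    "LAN": "Lancashire",
    "SOM": "Somerset",
    "SUF": "Suffolk",
}

# Assumed pattern: "DI" + three-letter county code + 1-3 digits,
# which keeps the whole name within the eight-character limit.
FILE_NAME = re.compile(r"DI([A-Z]{3})(\d{1,3})")


def parse_file_name(name):
    """Return (county_code, county_name, number) for an HD file name.

    Hypothetical helper: raises ValueError if the name does not fit
    the assumed pattern or uses a county code not listed above.
    """
    match = FILE_NAME.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"not a recognised HD file name: {name!r}")
    code = match.group(1)
    if code not in COUNTY_CODES:
        raise ValueError(f"unknown county code {code!r} in {name!r}")
    return code, COUNTY_CODES[code], int(match.group(2))


if __name__ == "__main__":
    for example in ("DIDEV01", "DISOM28", "DIELY50"):
        print(example, parse_file_name(example))
```

Checking the county code against a fixed table, rather than accepting any three letters, makes a mistyped file name fail early instead of being silently assigned to a non-existent county.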


On-site access; available in WordCruncher for people working in VARIENG. For permission to use the files, contact Anna-Liisa Vasko or Kirsti Peitsara.

CoRD Entry submitted on March 14, 2008 by Dr. Anna-Liisa Vasko and Simo Ahava, Department of English, University of Helsinki.