Basic structure of the corpus

FRED-S spans a subset of FRED texts not subject to copyright restrictions. More specifically, FRED-S covers:

1,011,396 running words
c. 123 hours of recorded speech
121 interviews
144 dialect speakers
57 different locations, in 18 different counties, in 5 major dialect areas:

the Southwest and Southeast of England, the Midlands, the North of England, and the Scottish Lowlands

(the full version covers 9 major dialect areas, including also the Scottish Highlands, the Hebrides, Wales and the Isle of Man; 43 different counties; c. 300 hours of recorded speech; 372 interviews with 431 speakers)

The recordings

All conversations in FRED-S were recorded between 1970 and 2000, the majority in the 1970s and 1980s (see table below). The recording date and/or recording decade for each text are specified in the text header as <RECDAT year> and <RECDEC 3 digits decade reference>. Except for a few texts where the tape was interrupted or the recording stops before the actual end of the conversation, most recordings are full-length interviews which last between 30 and 90 minutes. The longest interview is KEN_003 with almost 4.5 hours and roughly 47,000 words.

Table 1. Text distribution by recording date, FRED-S.

recording date	number of texts	% of textual material
1970 - 1979	47	42.2%
1980 - 1989	56	43.9%
1990 - 1999	15	10.3%
unknown	3	3.6%
Total	121	100.0%

Geographic coverage - dialect areas, counties and locations

FRED can be used to investigate single regional varieties as well as for the cross-regional comparison of linguistic features. On the uppermost geographic level, the texts are divided into 5 major dialect areas; for a more fine-grained analysis, there is also information on the county and location where each interview was recorded. The dialect area, county and location for each text are specified in the text header as <area value>, <county value> and <location value>. The county of a text can also be inferred from the file name, which starts with a three-letter 'Chapman county code' followed by underscore and a running number (DEV_001, DEV_002, DEV_003, etc. would all be texts from Devon; see www.genuki.org.uk for the Chapman county codes).

Tables 2 and 3 show a breakdown by dialect area and county of the amount of textual material in FRED-S (excluding interviewer utterances). The dialect area, county and location of each text are also specified in both the FRED-S text list (table 4) and the FRED-S speaker list (table 5) below.

Table 2. Text distribution by dialect area, FRED-S.

dialect area	number of texts	running words	% of textual material
Southwest (SW)	38	264,863	26.2%
Southeast (SE)	17	260,643	25.8%
Midlands (Mid)	16	152,535	15.1%
North (N)	30	266,955	26.4%
Scottish Lowlands (ScL)	20	66,400	6.6%
Total	121	1,011,396	100.0%

Table 3. Text distribution by county, FRED-S.

county	Chapman county code	dialect area	running words	% of textual material
Cornwall	CON	SW	26,535	2.6%
Devon	DEV	SW	79,870	7.9%
Oxfordshire	OXF	SW	13,801	1.4%
Somerset	SOM	SW	69,321	6.9%
Wiltshire	WIL	SW	75,336	7.4%
Kent	KEN	SE	155,192	15.3%
London	LND	SE	74,856	7.4%
Middlesex	MDX	SE	30,595	3.0%
Leicestershire	LEI	Mid	2,341	0.2%
Nottinghamshire	NTT	Mid	150,194	14.9%
Durham	DUR	N	26,507	2.6%
Lancashire	LAN	N	139,845	13.8%
Northumberland	NBL	N	27,777	2.7%
Westmorland	WES	N	21,304	2.1%
Yorkshire	YKS	N	51,522	5.1%
East Lothian	ELN	SCL	28,985	2.9%
Midlothian	MLN	SCL	21,068	2.1%
West Lothian	WLN	SCL	16,347	1.6%

FRED-S areal coverage: locations of samples in the corpus

Figure 1. FRED-S areal coverage: locations of samples in the corpus.

Genre(s) covered by the corpus

FRED is a spoken-language corpus (audio recordings with orthographic transcripts). It consists of oral history interviews with native speakers of BrE; the conversations are spontaneous and informal.

Freiburg Corpus of English Dialects

Basic structure of the corpus

The recordings

Geographic coverage - dialect areas, counties and locations

Genre(s) covered by the corpus