Background

(Source: the ICAME London-Lund Corpus manual page.)

As the name implies, the London-Lund Corpus of Spoken English (LLC) derives from two projects. The first is the Survey of English Usage (SEU) at University College London, launched in 1959 by Randolph Quirk, who was succeeded as Director in 1983 by Sidney Greenbaum. The second project is the Survey of Spoken English (SSE), which was started by Jan Svartvik at Lund University in 1975 as a sister project of the London Survey.

The goal of the Survey of English Usage is to provide the resources for accurate descriptions of the grammar of adult educated speakers of English. For that purpose the major activity of the Survey has been the assembly and analysis of a corpus comprising samples of different types of spoken and written British English. The original target of one million words has now been reached, and the corpus is therefore complete.

The SEU corpus contains 200 samples or 'texts', each consisting of 5000 words, for a total of one million words. The texts were collected over the last 30 years, half taken from spoken English and half from written English. The spoken English texts comprise both dialogue and monologue. The written English texts include not only printed and manuscript material but also examples of English read aloud, as in broadcast news and scripted speeches.

In 1975 the Survey of Spoken English was established at Lund. Its initial aim was to make available, in machine-readable form, the spoken material which by then had been collected and transcribed in London: 87 texts totalling some 435 000 words (see Svartvik et al. 1982 for an account of the input procedures). The material was inserted in a reduced transcription and without grammatical analysis. Early in 1980 the first copies of the computerized London-Lund Corpus of Spoken English were distributed to interested scholars all over the world.

This original London-Lund Corpus of 87 texts (often referred to as LLC) has since been augmented by the remaining 13 spoken texts of the SEU corpus, which were processed at the Survey of English Usage in conformity with the system used in the original London-Lund Corpus. These 13 texts constitute a supplement (LLC:s) to the original computerized version. The complete London-Lund Corpus (LLC:c) therefore consists of 100 spoken texts. In addition, all the written texts of the SEU corpus are now computerized, but these do not form part of the London-Lund Corpus and will not be distributed, though they can be consulted at the Survey of English Usage at University College London. Since LLC has been widely used in scholarly publications for the last decade, it is important that future publications distinguish the original version from the supplement and from the complete version that incorporates the supplement. To avoid misunderstanding, we recommend using suffixes for all three, thus:

LLC:o the original corpus (87 texts)
LLC:s the supplement (13 texts) to the original corpus
LLC:c the complete corpus (100 texts)
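
The text counts above can be checked against the word totals quoted earlier. The following is a minimal sketch in Python, not part of the corpus distribution. It assumes that every text holds the nominal 5000 words stated for the SEU corpus; the names WORDS_PER_TEXT, LLC_TEXTS and nominal_words are illustrative only.

```python
# Assumption: each text holds the nominal 5000 words given for SEU samples.
WORDS_PER_TEXT = 5000

# Number of texts per version, taken from the suffix list above.
LLC_TEXTS = {
    "LLC:o": 87,  # the original corpus
    "LLC:s": 13,  # the supplement
}
# The complete corpus is the original plus the supplement.
LLC_TEXTS["LLC:c"] = LLC_TEXTS["LLC:o"] + LLC_TEXTS["LLC:s"]


def nominal_words(version: str) -> int:
    """Return the nominal word count for a version, under the 5000-word assumption."""
    return LLC_TEXTS[version] * WORDS_PER_TEXT


for version, texts in LLC_TEXTS.items():
    print(f"{version}: {texts} texts, about {nominal_words(version):,} words")
```

Under this assumption LLC:o comes to 435,000 words, in line with the "some 435 000 words" quoted for the 87 original texts, and LLC:c comes to 500,000 words, the spoken half of the one-million-word SEU corpus.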