The Lancaster-Oslo/Bergen Corpus (LOB Corpus)

The Lancaster-Oslo/Bergen Corpus (LOB Corpus) is a British English counterpart of the Brown Corpus. Like its American counterpart, it contains 500 texts of about 2,000 words each, distributed across 15 text categories (9 informative and 6 imaginative). The LOB Corpus exists in two main versions: the original version and a POS-tagged version. The texts were selected by stratified random sampling, as described in the manual for the original version of the corpus. All the texts are written texts originally published in 1961.

Note this comment in the manual for the original version of the corpus:

The true “representativeness” of the present corpus arises from the deliberate attempt to include relevant categories and subcategories of texts rather than from blind statistical choice. Random sampling simply ensures that, within the stated guidelines, the selection of individual texts is free of the conscious or unconscious influence of personal taste or preference.
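
As a rough illustration of the sampling principle described in this comment, the following minimal Python sketch fixes a quota for each text category in advance and then draws the individual texts at random within each category. The function name, category names, candidate titles, and quotas are all hypothetical and chosen only for illustration; they are not the LOB categories or counts, and this is not the compilers' actual procedure, which is documented in the manual.

```python
import random


def stratified_sample(population, quotas, seed=0):
    """Draw quotas[c] items at random from each category c.

    population: dict mapping category name -> list of candidate texts
    quotas:     dict mapping category name -> number of texts to draw
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for category, quota in quotas.items():
        candidates = population[category]
        if quota > len(candidates):
            raise ValueError(f"not enough candidates in {category!r}")
        # Sampling without replacement: no text is chosen twice.
        sample[category] = rng.sample(candidates, quota)
    return sample


# Hypothetical toy data, not taken from the corpus.
population = {
    "press": [f"press-{i}" for i in range(20)],
    "fiction": [f"fiction-{i}" for i in range(30)],
}
# The quotas are the deliberate part of the design;
# the choice of texts within each category is left to chance.
quotas = {"press": 3, "fiction": 5}

sample = stratified_sample(population, quotas)
```

The quotas correspond to the deliberate decision about which categories to include and in what proportion, while the random draw keeps the choice of individual texts free of personal preference.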

Project team: Geoffrey Leech (project leader), Stig Johansson (project leader), Knut Hofland (head of computing), Roger Garside (head of computing, POS-tagged version). For information on other people taking part in the project, see the manuals.
Time of compilation: original version 1970–1978, POS-tagged version 1981–1986.
Size: approx. 1 million words
Language: English
Number of texts/samples: 500
Period: 1961
Released: 1976 (original version), 1986 (POS-tagged version)
Funding (original version): Longman Group Limited, the British Academy, Department of British and American Studies, University of Oslo, Norwegian Research Council for Science and the Humanities, Norwegian Computing Centre for the Humanities
Funding (POS-tagged version): Social Science Research Council, Norwegian Research Council for Science and the Humanities, Norwegian Computing Centre for the Humanities

Reference line and copyright

The LOB Corpus, original version (1970–1978), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), and Knut Hofland, University of Bergen (head of computing).

The LOB Corpus, POS-tagged version (1981–1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).

Manual

Original version: Johansson, Stig, Geoffrey Leech, and Helen Goodluck (1978), Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Oslo: Department of English, University of Oslo.

http://khnt.hit.uib.no/icame/manuals/lob/INDEX.HTM

POS-tagged version: Johansson, Stig, Eric Atwell, Roger Garside, and Geoffrey Leech (1986), The Tagged LOB Corpus. Users' Manual. Bergen: Norwegian Computing Centre for the Humanities.

http://khnt.hit.uib.no/icame/manuals/lobman/INDEX.HTM

Errata

The few errata in the original published texts are reproduced without comment in the LOB Corpus. These errata, however, are listed in the Manual of Information under the individual entries of text samples.

Availability

Available for research; distributed and licensed through ICAME and the Oxford Text Archive.

Associated projects

The Brown Corpus
The Kolhapur Corpus of Indian English
The Australian Corpus of English (ACE)
The Wellington Corpus of Written New Zealand English
The Freiburg-LOB Corpus of British English (F-LOB)
The Freiburg-Brown Corpus of American English (FROWN)
The International Corpus of English (ICE)

CoRD Entry submitted on October 22, 2008 by Prof. Stig Johansson, Department of English Language, University of Oslo.
Information for the entry was edited by Prof. Stig Johansson and Prof. Geoffrey Leech.