The Freiburg-Brown corpus of American English (Frown)

The Freiburg update of the Brown corpus (Frown) is part of the ‘Brown family’ of corpora. Work on the compilation of Frown and its counterpart, the Freiburg-LOB corpus of British English (F-LOB), began in 1991. Both corpora were intended to match the Brown and LOB corpora as closely as possible in size and composition, the only difference being that they represent the language of the early 1990s.

Like the original Brown and LOB corpora, Frown contains 500 texts of around 2000 words each, distributed across 15 text categories, 9 informative and 6 imaginative.
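The composition described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the corpus distribution: the per-category text counts are those of the standard Brown-family layout and are assumed here to carry over unchanged to Frown, and the names `INFORMATIVE`, `IMAGINATIVE` and `summarise` are purely illustrative.

```python
# Texts per category in the standard Brown-family layout.
# Assumption: Frown follows the same distribution as Brown.
INFORMATIVE = {"A": 44, "B": 27, "C": 17, "D": 17, "E": 36,
               "F": 48, "G": 75, "H": 30, "J": 80}
IMAGINATIVE = {"K": 29, "L": 24, "M": 6, "N": 29, "P": 29, "R": 9}

WORDS_PER_TEXT = 2000  # nominal sample length; real samples vary slightly


def summarise():
    """Return the headline figures implied by the category layout."""
    texts = sum(INFORMATIVE.values()) + sum(IMAGINATIVE.values())
    return {
        "categories": len(INFORMATIVE) + len(IMAGINATIVE),
        "informative_texts": sum(INFORMATIVE.values()),
        "imaginative_texts": sum(IMAGINATIVE.values()),
        "texts": texts,
        "nominal_words": texts * WORDS_PER_TEXT,
    }


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key}: {value}")
```

Under these assumptions the 15 categories add up to 500 texts and a nominal size of one million words, which agrees with the figures given for the corpus.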

The texts were not obtained by random sampling but were carefully selected to match the Brown corpus as closely as possible. The main aim was close comparability with Brown, rather than general statistical representativeness of printed output in the United States, so as to give linguists an empirical basis for studying language change in progress.

There are two versions of the Frown corpus: the original version and a POS-tagged version, the latter produced jointly with Geoffrey Leech (Lancaster) and Nick Smith (then Lancaster, now Leicester).

Project leader: Christian Mair, Albert-Ludwigs-Universität Freiburg
Time of compilation: 1992–1996
Size: approx. 1 million words
Language: English
Number of texts/samples: 500
Period: 1992
Released: 1999 (original version), 2007 (POS-tagged version)
Funding:
Original (plain-text) version: DFG (German Research Foundation), Sonderforschungsbereich (special research group) 321 ‘Orality and Literacy’, from 1994 to 1996.
POS-tagged version: Research at Lancaster by Geoffrey Leech and Nicholas Smith was supported by grants from the Arts and Humanities Research Board, the British Academy, and the Leverhulme Trust. Freiburg’s work on manual post-editing was assisted by a grant from the DFG.

Reference lines and copyright

The Freiburg-Brown Corpus (‘Frown’) (original version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg

The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster

Manual

Original (plain-text) version: Hundt, Marianne, Andrea Sand, and Paul Skandera. 1999. Manual of Information to accompany The Freiburg – Brown Corpus of American English (‘Frown’). Freiburg: Department of English, Albert-Ludwigs-Universität Freiburg. http://khnt.aksis.uib.no/icame/manuals/frown/INDEX.HTM.

POS-tagged version: Hinrichs, Lars, Nicholas Smith, and Birgit Waibel. 2007. The part-of-speech-tagged ‘Brown’ corpora: A manual of information, including pointers for successful use. Freiburg: Department of English, Albert-Ludwigs-Universität Freiburg.

Compilers

Christian Mair (project leader)

The following people were involved in selecting and typing the text extracts and/or proofreading: Birgit Felleisen, Heike Fiedler, Elke Frings, Elke Gebhard, Dorothee Graf, Ulrike Günther, Marianne Hundt, Matthias Kaufmann, Manfred Krug, Christoph Lindner, Isolde Mattmüller-Ofori, Nadja Nesselhauf, Christine Oesterlee, Andrea Sand, Paul Skandera, and Heike Schnitzler. Heike Fiedler helped with the proofreading and editing of the manual.

Christoph Lindner wrote the programme used to assign the category references and line numbers to the ASCII texts, and Heike Peper-Ludwig helped in computer-related emergencies.

The following people worked at different stages on the compilation and post-editing of POS-tagged F-LOB and Frown: Franziska Becker, Lucas Champollion, Septimius Fericean, Heike Fiedler, Ulf Gerdelmann, Lars Hinrichs, Marianne Hundt, Matthias Kaufmann, Tobias Maier, Christian Mair, Michael Percillier, Stefanie Rapp, Andrea Sand, Silke Scheible, Birgit Waibel, Antonia Walker, and Lisa-Maria Wild.

Geoffrey Leech and Nicholas Smith at Lancaster University worked on the automatic tagging of the Brown corpora, and Mike Pacey contributed greatly to developing the Template Tagger software.

Availability

Available for research; distribution and licence through ICAME.

Associated projects

The Brown Corpus of Present-Day Edited American English
The Lancaster-Oslo-Bergen Corpus of British English (LOB)
The Freiburg update of the LOB corpus (F-LOB)
The Kolhapur Corpus of Indian English (Shastri 1988)
The Australian Corpus of English (Collins & Peters 1988)
The Wellington Corpus of New Zealand English (Bauer 1993)