The Standard Corpus of Present-Day Edited American English (the Brown Corpus)

The Brown Corpus was the first computer-readable general corpus of texts prepared for linguistic research on modern English. It was compiled by W. Nelson Francis and Henry Kučera at Brown University in the 1960s and contains over 1 million words (500 samples of 2000+ words each) of running text of edited English prose printed in the United States during the calendar year 1961. Six versions of the corpus are available: the original Form A; Form B, from which punctuation codes have been omitted; the tagged Form C; Bergen Forms I and II; and the Brown MARC Form.

The Brown Corpus has inspired a whole family of corpora, including the Lancaster-Oslo/Bergen Corpus (LOB), Brown's British English counterpart, as well as Frown and FLOB, the 1990s equivalents of Brown and LOB respectively.

Project leaders: W. Nelson Francis and Henry Kučera
Time of compilation: 1963–64 (original version)
Size: approx. 1 million words
Language: American English
Number of texts/samples: 500 samples of 2000+ words each
Period: 1961
Released: 1964 (original version)
Funding (original version): the Cooperative Research Program of the U.S. Office of Education and Brown University

Reference line and Copyright

A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.

Manual

Francis, W. N. and H. Kučera. 1964. Manual of Information to accompany A Standard Corpus of Present-Day Edited American English, for use with Digital Computers. Providence, Rhode Island: Department of Linguistics, Brown University. Revised 1971. Revised and amplified 1979.

Available online: http://icame.uib.no/brown/bcm.html

Compilers

Project leaders: W. Nelson Francis & Henry Kučera

Workers on the original corpus: Anne Robb Taylor, Loretta Felice, Mary Lois Marckworth, Henry Hall Peyton, Jr., Robert Staudte, Jr.

Originators of the tagging system: Gerald M. Rubin and Barbara Greene Levine

Subsequent workers on the tagging: Patricia Strauss and Sandra Pierce Brenckle

Programming: Andrew Mackie

Availability

Available for research; distribution and licence through ICAME.

Associated projects

The Lancaster-Oslo/Bergen Corpus (LOB Corpus)
The Kolhapur Corpus of Indian English
The Australian Corpus of English (ACE)
The Wellington Corpus of Written New Zealand English
The Freiburg-LOB Corpus of British English (FLOB)
The Freiburg-Brown Corpus of American English (FROWN)
The International Corpus of English (ICE)