Basic structure of the corpus

The table below shows the composition of the LOB Corpus and its American counterpart, the Brown Corpus. The two corpora are matched at the level of the general text categories only: there is no one-to-one correspondence between individual samples, although the general arrangement of subcategories has been followed wherever possible. For more detail, see the manual for the original version of the LOB Corpus.

Text categories		Brown Corpus (number of samples)	LOB Corpus (number of samples)
A	Press: reportage	44	44
B	Press: editorial	27	27
C	Press: reviews	17	17
D	Religion	17	17
E	Skills, trades and hobbies	36	38
F	Popular lore	48	44
G	Belles lettres, biography, essays	75	77
H	Miscellaneous (government documents, foundation reports, industry reports, college catalogue, industry house organ)	30	30
J	Learned and scientific writings	80	80
K	General fiction	29	29
L	Mystery and detective fiction	24	24
M	Science fiction	6	6
N	Adventure and western fiction	29	29
P	Romance and love story	29	29
R	Humour	9	9
Total		500	500
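
The figures in the table can be checked mechanically. The following is a minimal Python sketch and is not part of the corpus distribution; the names `SAMPLES`, `totals` and `differing_categories` are illustrative only. It sums the sample counts for each corpus and lists the categories in which the two corpora differ.

```python
# Sample counts per text category, copied from the table above:
# category letter -> (Brown Corpus, LOB Corpus)
SAMPLES = {
    "A": (44, 44),
    "B": (27, 27),
    "C": (17, 17),
    "D": (17, 17),
    "E": (36, 38),
    "F": (48, 44),
    "G": (75, 77),
    "H": (30, 30),
    "J": (80, 80),
    "K": (29, 29),
    "L": (24, 24),
    "M": (6, 6),
    "N": (29, 29),
    "P": (29, 29),
    "R": (9, 9),
}


def totals(samples):
    """Return the pair (Brown total, LOB total)."""
    brown = sum(b for b, _ in samples.values())
    lob = sum(l for _, l in samples.values())
    return brown, lob


def differing_categories(samples):
    """Return the category letters whose counts differ between the corpora."""
    return [cat for cat, (b, l) in samples.items() if b != l]


print(totals(SAMPLES))                # (500, 500)
print(differing_categories(SAMPLES))  # ['E', 'F', 'G']
```

Only categories E, F and G differ, and the differences cancel out, so both corpora contain 500 samples.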

Parameters and other coding

A detailed coding scheme was developed for the original version of the LOB Corpus; see the manual.

The tagged LOB Corpus is described in detail in the manual. Each word is accompanied by a word-class tag, assigned through a combination of automatic tagging programs and manual pre- and post-editing. There is no syntactic bracketing. The tagging is broadly comparable with that developed for the Brown Corpus, but more distinctions are made. Differences between the two tag sets are discussed in the manual for the tagged LOB Corpus.
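
As a rough illustration of what word-level tagging without syntactic bracketing means for processing, a tagged text can be treated as a flat sequence of (word, tag) pairs. The sketch below assumes a hypothetical `word_TAG` line format and uses placeholder tags; neither is taken from the corpus files. The actual file format and tag set are defined in the manual for the tagged LOB Corpus.

```python
from typing import List, NamedTuple


class TaggedWord(NamedTuple):
    word: str
    tag: str


def parse_tagged_line(line: str, sep: str = "_") -> List[TaggedWord]:
    """Split a whitespace-separated line of word<sep>TAG items into pairs.

    The item is split at the LAST separator, so a word that itself
    contains the separator is kept whole. The format is hypothetical.
    """
    pairs = []
    for item in line.split():
        word, found, tag = item.rpartition(sep)
        if not found:
            raise ValueError(f"item has no tag: {item!r}")
        pairs.append(TaggedWord(word, tag))
    return pairs


# Placeholder tags for illustration only, not the LOB tag set.
example = "the_DET corpus_NOUN is_VERB tagged_VERB"
for tagged in parse_tagged_line(example):
    print(tagged.word, tagged.tag)
```

Because there is no bracketing, the result is a simple list with no nested phrase structure.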

Genres

The corpus covers two broad genres: informative prose (non-fiction) and imaginative prose (fiction). See the information above on the basic structure of the corpus and the detailed account in the manual for the original version of the corpus.
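
The size of each genre can be worked out from the category counts. The sketch below assumes that categories A to J are informative prose and K to R imaginative prose. This split is not stated in this section and should be checked against the manual. The names `LOB_SAMPLES`, `INFORMATIVE` and `genre_totals` are illustrative.

```python
# LOB Corpus sample counts per category, from the table of text categories.
LOB_SAMPLES = {
    "A": 44, "B": 27, "C": 17, "D": 17, "E": 38,
    "F": 44, "G": 77, "H": 30, "J": 80,
    "K": 29, "L": 24, "M": 6, "N": 29, "P": 29, "R": 9,
}

# Assumption: categories A-J are informative prose; all others are imaginative.
INFORMATIVE = set("ABCDEFGHJ")


def genre_totals(samples, informative=INFORMATIVE):
    """Sum the sample counts separately for the two genres."""
    info = sum(n for cat, n in samples.items() if cat in informative)
    imag = sum(n for cat, n in samples.items() if cat not in informative)
    return {"informative": info, "imaginative": imag}


print(genre_totals(LOB_SAMPLES))
```

Under that assumption, the LOB Corpus contains 374 informative and 126 imaginative samples.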

The Lancaster-Oslo/Bergen Corpus
