Basic structure of the corpus

The table below shows the composition of the LOB Corpus and its American counterpart, the Brown Corpus. The two corpora are matched by general category only: there is no one-to-one correspondence between samples, although the general arrangement of subcategories has been followed wherever possible. For more detail, see the manual for the original version of the LOB Corpus.

Number of samples in each text category

Text category                               Brown Corpus   LOB Corpus
A  Press: reportage                                   44           44
B  Press: editorial                                   27           27
C  Press: reviews                                     17           17
D  Religion                                           17           17
E  Skills, trades and hobbies                         36           38
F  Popular lore                                       48           44
G  Belles lettres, biography, essays                  75           77
H  Miscellaneous (government documents,               30           30
   foundation reports, industry reports,
   college catalogue, industry house organ)
J  Learned and scientific writings                    80           80
K  General fiction                                    29           29
L  Mystery and detective fiction                      24           24
M  Science fiction                                     6            6
N  Adventure and western fiction                      29           29
P  Romance and love story                             29           29
R  Humour                                              9            9
   Total                                             500          500
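Since the two corpora are matched by category rather than by sample, it can be useful to check the category counts mechanically. The following is a minimal Python sketch, assuming the counts are transcribed by hand from the table above; the names CATEGORIES, totals and differences are illustrative only and belong to no existing corpus tool. It confirms that both corpora total 500 samples and lists the categories whose sizes differ.

```python
# Sample counts transcribed from the table above (hypothetical data layout):
# category code -> (label, Brown Corpus samples, LOB Corpus samples)
CATEGORIES = {
    "A": ("Press: reportage", 44, 44),
    "B": ("Press: editorial", 27, 27),
    "C": ("Press: reviews", 17, 17),
    "D": ("Religion", 17, 17),
    "E": ("Skills, trades and hobbies", 36, 38),
    "F": ("Popular lore", 48, 44),
    "G": ("Belles lettres, biography, essays", 75, 77),
    "H": ("Miscellaneous", 30, 30),
    "J": ("Learned and scientific writings", 80, 80),
    "K": ("General fiction", 29, 29),
    "L": ("Mystery and detective fiction", 24, 24),
    "M": ("Science fiction", 6, 6),
    "N": ("Adventure and western fiction", 29, 29),
    "P": ("Romance and love story", 29, 29),
    "R": ("Humour", 9, 9),
}


def totals(categories):
    """Return (Brown total, LOB total) summed over all categories."""
    brown = sum(b for _, b, _ in categories.values())
    lob = sum(l_count for _, _, l_count in categories.values())
    return brown, lob


def differences(categories):
    """Return {code: LOB count minus Brown count} for categories that differ."""
    return {
        code: lob - brown
        for code, (_, brown, lob) in categories.items()
        if lob != brown
    }


if __name__ == "__main__":
    print(totals(CATEGORIES))       # (500, 500)
    print(differences(CATEGORIES))  # {'E': 2, 'F': -4, 'G': 2}
```

Only categories E, F and G differ in size, and the differences cancel out, so the overall total is the same for both corpora.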

Parameters and other coding

A detailed coding scheme was developed for the original version of the LOB Corpus; see the manual for the original version.

The tagged LOB Corpus is described in detail in its own manual. Each word is accompanied by a word-class tag, assigned through a combination of automatic tagging programs and manual pre- and post-editing. There is no syntactic bracketing. The tag set is broadly comparable to the one developed for the Brown Corpus, but it makes more distinctions. Differences between the two tag sets are discussed in the manual for the tagged LOB Corpus.
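To illustrate what "each word is accompanied by a word-class tag" means for processing, here is a minimal Python sketch that splits word/tag tokens and counts tag frequencies. It assumes a hypothetical plain-text layout in which each token is written as word_TAG; this section does not specify the actual file format of the tagged LOB Corpus, so the separator and the sample tags are illustrative assumptions and should be checked against the manual for the tagged LOB Corpus.

```python
from collections import Counter


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a line of word_TAG tokens into (word, tag) pairs.

    ASSUMPTION: '_' separates word and tag. rpartition splits on the
    last underscore, so any underscore inside the word itself is kept.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:  # no separator: token is untagged, skip rather than guess
            continue
        pairs.append((word, tag))
    return pairs


def tag_frequencies(lines) -> Counter:
    """Count how often each tag occurs across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(tag for _, tag in parse_tagged(line))
    return counts


if __name__ == "__main__":
    # Illustrative tags only; not verified against the LOB tag set.
    sample = "the_ATI cat_NN sat_VBD on_IN the_ATI mat_NN ._."
    print(parse_tagged(sample)[:2])           # [('the', 'ATI'), ('cat', 'NN')]
    print(tag_frequencies([sample])["ATI"])   # 2
```

Because the tagging carries no syntactic bracketing, a flat list of (word, tag) pairs per line is sufficient here; no tree structure needs to be reconstructed.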

Genres

The corpus covers two broad genres: informative prose (non-fiction, categories A–J) and imaginative prose (fiction, categories K–R). See the information above on the basic structure of the corpus and the detailed account in the manual for the original version of the corpus.
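The split between the two genres can be derived directly from the category table. The Python sketch below assumes the counts are copied by hand from that table and that categories A–J count as informative prose and K–R as imaginative prose; the names COUNTS and genre_totals are hypothetical. It shows that both corpora have the same genre balance.

```python
# (Brown Corpus, LOB Corpus) samples per category, copied from the table
COUNTS = {
    "A": (44, 44), "B": (27, 27), "C": (17, 17), "D": (17, 17),
    "E": (36, 38), "F": (48, 44), "G": (75, 77), "H": (30, 30),
    "J": (80, 80),
    "K": (29, 29), "L": (24, 24), "M": (6, 6), "N": (29, 29),
    "P": (29, 29), "R": (9, 9),
}

INFORMATIVE = set("ABCDEFGHJ")  # non-fiction categories
IMAGINATIVE = set("KLMNPR")     # fiction categories

BROWN, LOB = 0, 1  # index into each (Brown, LOB) tuple


def genre_totals(counts, corpus):
    """Return (informative, imaginative) sample totals for one corpus."""
    informative = sum(v[corpus] for k, v in counts.items() if k in INFORMATIVE)
    imaginative = sum(v[corpus] for k, v in counts.items() if k in IMAGINATIVE)
    return informative, imaginative


if __name__ == "__main__":
    print("Brown:", genre_totals(COUNTS, BROWN))  # Brown: (374, 126)
    print("LOB:  ", genre_totals(COUNTS, LOB))    # LOB:   (374, 126)
```

Although categories E, F and G differ in size between the corpora, all three fall within informative prose, so the informative/imaginative totals are identical.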