Basic structure of the corpus
The table below shows the composition of the LOB Corpus and its American counterpart, the Brown Corpus. The two corpora are matched at the level of general categories only: there is no one-to-one correspondence between samples, although the general arrangement of subcategories has been followed wherever possible. For more detail, see the manual for the original version of the LOB Corpus.
Code | Text category | Samples (Brown Corpus) | Samples (LOB Corpus)
A | Press: reportage | 44 | 44
B | Press: editorial | 27 | 27
C | Press: reviews | 17 | 17
D | Religion | 17 | 17
E | Skills, trades and hobbies | 36 | 38
F | Popular lore | 48 | 44
G | Belles lettres, biography, essays | 75 | 77
H | Miscellaneous (government documents, foundation reports, industry reports, college catalogue, industry house organ) | 30 | 30
J | Learned and scientific writings | 80 | 80
K | General fiction | 29 | 29
L | Mystery and detective fiction | 24 | 24
M | Science fiction | 6 | 6
N | Adventure and western fiction | 29 | 29
P | Romance and love story | 29 | 29
R | Humour | 9 | 9
Total | | 500 | 500
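The figures in the table can be checked with a short script. The following is a minimal sketch in Python. The names `SAMPLES`, `totals` and `differing_categories` are illustrative only and are not part of any corpus distribution. It confirms that each corpus totals 500 samples and lists the categories in which the two corpora differ.

```python
# Samples per text category, copied from the table above: (Brown, LOB).
SAMPLES = {
    "A": (44, 44), "B": (27, 27), "C": (17, 17), "D": (17, 17),
    "E": (36, 38), "F": (48, 44), "G": (75, 77), "H": (30, 30),
    "J": (80, 80), "K": (29, 29), "L": (24, 24), "M": (6, 6),
    "N": (29, 29), "P": (29, 29), "R": (9, 9),
}


def totals(samples):
    """Return the total number of samples as (Brown, LOB)."""
    brown = sum(b for b, _ in samples.values())
    lob = sum(l for _, l in samples.values())
    return brown, lob


def differing_categories(samples):
    """Return the categories whose sample counts differ between the corpora."""
    return {cat: (b, l) for cat, (b, l) in samples.items() if b != l}


if __name__ == "__main__":
    print(totals(SAMPLES))
    print(differing_categories(SAMPLES))
```

Only categories E, F and G differ, and the differences (LOB minus Brown: +2, -4, +2) cancel out, so both corpora keep the same total.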
Parameters and other coding
A detailed coding scheme was developed for the original version of the LOB Corpus; see the manual.
The tagged LOB Corpus is described in detail in the manual. Each word is accompanied by a word-class tag, assigned through a combination of automatic tagging programs and manual pre- and post-editing. There is no syntactic bracketing. The tagging is broadly comparable with that developed for the Brown Corpus, but more distinctions are made. Differences between the two tag sets are discussed in the manual for the tagged LOB Corpus.
Genres
The corpus contains informative prose (non-fiction, categories A-J) and imaginative prose (fiction, categories K-R). See the information above on the basic structure of the corpus and the detailed account in the manual for the original version of the corpus.
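The division into the two genres can be expressed as a grouping over the category codes. The sketch below assumes the conventional split for these corpora, with categories A to J counted as informative prose and K to R as imaginative prose. The manual remains the authoritative source for that split, and the names `LOB_SAMPLES`, `INFORMATIVE` and `genre_totals` are hypothetical.

```python
# LOB Corpus samples per category, copied from the table above.
LOB_SAMPLES = {
    "A": 44, "B": 27, "C": 17, "D": 17, "E": 38, "F": 44, "G": 77,
    "H": 30, "J": 80, "K": 29, "L": 24, "M": 6, "N": 29, "P": 29, "R": 9,
}

# Assumption: categories A-J are informative prose; all others are imaginative.
INFORMATIVE = set("ABCDEFGHJ")


def genre_totals(samples, informative=INFORMATIVE):
    """Sum the sample counts for each of the two genres."""
    info = sum(n for cat, n in samples.items() if cat in informative)
    imag = sum(n for cat, n in samples.items() if cat not in informative)
    return {"informative": info, "imaginative": imag}


if __name__ == "__main__":
    print(genre_totals(LOB_SAMPLES))
```

Under that assumption, informative prose accounts for 374 of the 500 LOB samples and imaginative prose for the remaining 126.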