Basic structure of the corpus

The four corpora at the core of the ‘Brown family’, namely the 1960s Brown and LOB corpora and their 1990s updates Frown and F-LOB, share the same structure of text genres. Each corpus consists of 500 texts of about 2000 words each, giving a total of around one million words per corpus. Table 1 lists the genres included in the corpora.

Table 1. Text categories in the Brown family of matching 1-million-word corpora of written StE (Source: Manual of the POS-tagged ‘Brown’ corpora)

Genre group          Category  Content of category                  No. of texts
Press (88)           A         Reportage                            44
                     B         Editorial                            27
                     C         Review                               17
General Prose (206)  D         Religion                             17
                     E         Skills, trades and hobbies           36
                     F         Popular lore                         48
                     G         Belles lettres, biographies, essays  75
                     H         Miscellaneous                        30
Learned (80)         J         Science                              80
Fiction (126)        K         General fiction                      29
                     L         Mystery and detective fiction        24
                     M         Science fiction                      6
                     N         Adventure and Western                29
                     P         Romance and love story               29
                     R         Humor                                9
Total                                                               500

Figure 1. Text categories in the Brown family of corpora.
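
The group subtotals and the overall size given in Table 1 can be checked with a few lines of Python. This is only a minimal sketch: the names `CATEGORIES` and `group_totals` are illustrative and not part of any corpus tool, and the word count uses the nominal figure of about 2000 words per text, so it is an approximation rather than an exact corpus size.

```python
# Category letter -> (genre group, content of category, number of texts),
# copied from Table 1. The structure and names here are illustrative only.
CATEGORIES = {
    "A": ("Press", "Reportage", 44),
    "B": ("Press", "Editorial", 27),
    "C": ("Press", "Review", 17),
    "D": ("General Prose", "Religion", 17),
    "E": ("General Prose", "Skills, trades and hobbies", 36),
    "F": ("General Prose", "Popular lore", 48),
    "G": ("General Prose", "Belles lettres, biographies, essays", 75),
    "H": ("General Prose", "Miscellaneous", 30),
    "J": ("Learned", "Science", 80),
    "K": ("Fiction", "General fiction", 29),
    "L": ("Fiction", "Mystery and detective fiction", 24),
    "M": ("Fiction", "Science fiction", 6),
    "N": ("Fiction", "Adventure and Western", 29),
    "P": ("Fiction", "Romance and love story", 29),
    "R": ("Fiction", "Humor", 9),
}

# Nominal text length; the source says "about 2000 words each".
NOMINAL_WORDS_PER_TEXT = 2000


def group_totals(categories):
    """Sum the number of texts per genre group."""
    totals = {}
    for group, _content, n_texts in categories.values():
        totals[group] = totals.get(group, 0) + n_texts
    return totals


totals = group_totals(CATEGORIES)
total_texts = sum(totals.values())
approx_words = total_texts * NOMINAL_WORDS_PER_TEXT

for group, n_texts in totals.items():
    print(f"{group}: {n_texts}")
print(f"Total texts: {total_texts}")
print(f"Approximate words per corpus: {approx_words}")
```

The subtotals computed this way match the figures in brackets in the first column of Table 1, and the product of 500 texts and roughly 2000 words gives the one-million-word size quoted for each corpus.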