British English 2006 (BE06)

The BE06 Corpus is a one-million-word corpus of published general written British English. It uses the same sampling frame as the LOB and F-LOB corpora: 500 files, each a 2,000-word sample, drawn from 15 genres of writing.

Eighty-two per cent of the texts were published between 2005 and 2007, while the remainder were published in 2003–2004 and early 2008. The median sampling point is 2006, hence the title BE06 (British English 2006).
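
The figures above can be checked with a little arithmetic. The following is only an illustrative sketch: it uses the numbers quoted in this description (500 samples, 2,000 words each, 82 per cent, and the reported total of 1,010,996 words), and the variable names are invented for the example. Since the 82 per cent figure may itself be rounded, the derived count of texts is approximate.

```python
# Figures quoted in this description of BE06.
NUM_SAMPLES = 500             # number of text files in the corpus
NOMINAL_SAMPLE_WORDS = 2000   # target length of each sample
REPORTED_SIZE = 1_010_996     # total word count reported for the corpus
SHARE_2005_2007 = 0.82        # share of texts published 2005-2007 (possibly rounded)

# Size implied by the sampling frame alone.
nominal_size = NUM_SAMPLES * NOMINAL_SAMPLE_WORDS

# Average sample length implied by the reported total.
mean_sample_words = REPORTED_SIZE / NUM_SAMPLES

# Approximate number of texts published between 2005 and 2007.
texts_2005_2007 = round(SHARE_2005_2007 * NUM_SAMPLES)

print(f"Nominal size: {nominal_size:,} words")
print(f"Mean sample length: {mean_sample_words:.1f} words")
print(f"Texts from 2005-2007: about {texts_2005_2007}")
```

The reported total is slightly above the nominal one million words, so the samples average a little over 2,000 words each.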

All of the texts were taken from internet sources. To be included in the corpus, however, a text had to have appeared in paper form, either before or in addition to being placed on the internet.

Project leader: Paul Baker, Lancaster University
Time of compilation: 2007–2008
Size: 1,010,996 words
Language: English
Number of texts/samples: 500
Period: 2003–2008
Project home page:

Reference lines and copyright

Due to copyright issues, there are no plans to make the corpus files fully available. However, the corpus has been placed on the CQP (Corpus Query Processor) system at Lancaster University, where users can run concordances and obtain distribution and collocation information. Contact Andrew Hardie to obtain a username and password. Frequency lists can also be downloaded in a variety of formats.

Technical information

The version of the corpus on CQPweb has been grammatically annotated with the CLAWS C7 tagset.

Associated projects

