Background

The corpus was compiled to allow diachronic comparison with the earlier LOB (1961) and F-LOB (1991) corpora, and it emulates their structure. The BLOB-1931 corpus also follows these principles.

It was also a feasibility study, testing whether all of the texts could be collected from online sources, even though they had previously been published on paper. By its compiler's estimate, the corpus took about twelve working days to build.

"Text collection for the corpus was completed in May 2008. Exact records were not kept regarding the length of time it took to build the corpus, although it is estimated that it took approximately ten minutes to locate each text on the internet, copy it to a text file and make a log of its title, the author, date published, word count, website address and whether the sample was taken from the beginning, middle or end of the text. With 500 texts, this equals approximately 5,000 minutes, or 83 hours, or twelve working days. With future corpus building projects of this size, it would not be unfeasible to ask a class of, say, 60 students to locate, save and produce logs for eight or nine texts each. The resulting files could then be incorporated into a new million word reference corpus, as well as giving students the experience of the corpus building process. Not having worked on building a reference corpus before, I found it to be an invaluable experience, particularly in that it raised a number of questions regarding the way that corpus users extrapolate results from data. First-hand knowledge of the decisions that went into the creation of the BE06 was a good preparation against over-interpretation of results once its analysis began."
  Baker, P. (2009) 'The BE06 Corpus of British English and recent language change.' International Journal of Corpus Linguistics 14(3): 316.
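
The time estimate in the quotation can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the figures for minutes per text, number of texts and class size come from the quotation, while the seven-hour working day is an assumption (the source does not state it) chosen because it makes 83 hours come out at roughly twelve days. The function and variable names are hypothetical.

```python
MINUTES_PER_TEXT = 10       # estimate given in the quotation
NUM_TEXTS = 500             # number of texts given in the quotation
NUM_STUDENTS = 60           # class size suggested in the quotation
HOURS_PER_WORKING_DAY = 7   # ASSUMPTION: not stated in the source


def estimate_effort(num_texts, minutes_per_text, hours_per_day):
    """Return (total minutes, total hours, working days) for collecting the texts."""
    total_minutes = num_texts * minutes_per_text
    total_hours = total_minutes / 60
    working_days = total_hours / hours_per_day
    return total_minutes, total_hours, working_days


total_minutes, total_hours, working_days = estimate_effort(
    NUM_TEXTS, MINUTES_PER_TEXT, HOURS_PER_WORKING_DAY
)

# Share of the work if the texts were divided evenly among a class.
texts_per_student = NUM_TEXTS / NUM_STUDENTS

print(f"{total_minutes} minutes, about {total_hours:.0f} hours, "
      f"about {working_days:.0f} working days")
print(f"about {texts_per_student:.1f} texts per student")
```

Under these assumptions the totals agree with the quoted figures, and an even split across 60 students falls between the "eight or nine texts each" that the quotation suggests.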