Old Bailey Corpus (OBC)

(Entry based on project homepage.)

Project leaders: Magnus Huber
Size: 14 million spoken words
Language: EModE, LModE
Period: 1720–1913
Project home page: http://www1.uni-giessen.de/oldbaileycorpus/
Funding: the German Science Foundation (DFG, HU 884/6-1, HU 884/6-2)

The Old Bailey Corpus is based on the Proceedings of the Old Bailey and documents spoken English from 1720 to 1913.

Turning the digitized Proceedings into the linguistic Old Bailey Corpus involved three main steps:

  • identification and tagging of direct speech in the Proceedings with the help of a Perl script (this identified ca. 113 million words of spoken English),
  • part-of-speech tagging of the Proceedings using the CLAWS7 tagset, and
  • compiling the Old Bailey Corpus: a balanced subset of the Proceedings with detailed sociolinguistic annotation of every utterance, based on sociobiographical speaker data found in the context of the trials (407 Proceedings, ca. 318,000 speech events, ca. 14 million spoken words, ca. 750,000 spoken words/decade).

The Old Bailey Corpus has detailed sociobiographical, pragmatic and textual annotation at the utterance level:

  • sociobiographical speaker information: gender, age, occupation (coded according to the Historical International Standard Classification of Occupations, HISCO) and social class (coded according to HISCLASS, a social class scheme based on HISCO),
  • pragmatic information: the speaker's role in the courtroom (defendant, victim, lawyer, witness, etc.),
  • textual information: scribe, printer and publisher of the Proceedings.
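
The utterance-level annotation described above lends itself to simple tallies, for example spoken words per speaker role or per gender. The following is only a minimal sketch in Python. The <speech> element is mentioned in the citation notes of this entry, but the sample text and every attribute name and value used here (trial, sex, role, hisclass) are hypothetical placeholders, not the corpus's documented schema. Check the corpus documentation for the real attribute names before adapting it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data. Attribute names and values are assumptions made
# for illustration only; they are NOT taken from the OBC documentation.
SAMPLE = """
<trial id="t-example-1">
  <speech trial="t-example-1" sex="f" role="witness" hisclass="7">I saw the prisoner take the watch.</speech>
  <speech trial="t-example-1" sex="m" role="defendant" hisclass="11">I never was near the place.</speech>
  <speech trial="t-example-1" sex="m" role="lawyer" hisclass="2">Where were you standing?</speech>
</trial>
"""


def words_by(root, attribute):
    """Count whitespace-separated words per value of one speech attribute."""
    counts = Counter()
    for speech in root.iter("speech"):
        n_words = len((speech.text or "").split())
        counts[speech.get(attribute, "unknown")] += n_words
    return counts


root = ET.fromstring(SAMPLE)
by_role = words_by(root, "role")
by_sex = words_by(root, "sex")

for role, n in sorted(by_role.items()):
    print(f"{role}: {n} words")
```

Counting by whitespace is a rough approximation of a word count. A study that compares subgroups would normally use the corpus's own tokenization and normalize the counts by subcorpus size.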

Reference line and copyright

Huber, Magnus; Nissel, Magnus; Maiwald, Patrick; Widlitzki, Bianca. 2012. The Old Bailey Corpus. Spoken English in the 18th and 19th centuries. www1.uni-giessen.de/oldbaileycorpus, [date of access].

Users who wish to cite material from the Old Bailey Corpus Online Website in publications should provide the URL (www1.uni-giessen.de/oldbaileycorpus) and the date on which the website was consulted. To cite concordance material obtained by searching a corpus in the Old Bailey Corpus suite (online or offline), include the version of the corpus used as well as the trial ID provided in the online concordances or in the <speech> or <trial> tags. For more information, see the Citation guide.

All material is made available free of charge for individual, non-commercial use only. This applies to all material on the OBC website, the corpora available for download and material obtained by searching the corpora (online or offline). Neither this material nor the corpora may be distributed to third parties. Commercial exploitation of the speech tags and related attributes is prohibited without a licence from the Old Bailey Corpus Project. Commercial exploitation of the text and the other XML tags is prohibited without a licence from the Open University, University of Hertfordshire and University of Sheffield. Copyright in the design and content of the Old Bailey Corpus Online webpages is owned by the Old Bailey Corpus Project.

Compilers

Availability

Available online on the project homepage; free registration is required.