Basic structure of the corpus

SCOTS comprises written (80%) and spoken (20%) texts in a variety of genres, in Scottish English and Scots (with a small amount of Scottish Gaelic), dating from 1945 to the present. The corpus can be browsed and searched through integrated analysis software.

Parameters

Each document is accompanied by hundreds of fields of metadata, which can be viewed or used as the basis of a query through the integrated search facility. The complete corpus, or a selection of documents, can be downloaded as plain text or XML, together with the metadata.

Sociolinguistic coverage

SCOTS is an opportunistic rather than a strictly balanced corpus, primarily because of constraints on text availability and copyright permissions. It nevertheless achieves considerable sociolinguistic coverage along such parameters as speaker/writer age, birthplace, occupation, gender and educational level. Each text is accompanied by detailed metadata about the text and its author or participants, and any of these fields can form the basis of a search. Geographical data can be visualised through an integrated, interactive map function.
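
The idea of searching on metadata fields can be illustrated with a minimal sketch. It does not reproduce the SCOTS search facility or its download format: the DocumentRecord class, its field names, the select helper and the three sample records are all hypothetical placeholders invented for illustration, and a real workflow would first have to read the downloaded metadata into some comparable structure.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for one document's metadata (not the SCOTS schema)."""
    doc_id: str
    mode: str            # "written" or "spoken"
    birthplace: str      # author/participant birthplace
    gender: str
    decade_of_birth: int


def select(records, **criteria):
    """Return the records whose fields equal every supplied criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Invented placeholder records, not real corpus documents.
records = [
    DocumentRecord("doc-a", "spoken", "Glasgow", "female", 1950),
    DocumentRecord("doc-b", "written", "Aberdeen", "male", 1970),
    DocumentRecord("doc-c", "spoken", "Glasgow", "male", 1980),
]

# Combine two metadata fields into a single query.
glasgow_spoken = select(records, mode="spoken", birthplace="Glasgow")
print([r.doc_id for r in glasgow_spoken])  # ['doc-a', 'doc-c']
```

Passing criteria as keyword arguments means any combination of fields can be queried without a separate function for each parameter.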

Background and history

For information on the background to the Scottish Corpus of Texts & Speech, please see the project website, particularly http://www.scottishcorpus.ac.uk/about/background/.

Bibliography

A regularly updated list of publications and conference papers can be found at: http://www.scottishcorpus.ac.uk/about/publications/

Errata

The online corpus is corrected where necessary. We welcome notification of errors; a bug report form for this purpose can be found at http://www.scottishcorpus.ac.uk/contact/bugreport/

Editors of text

Wendy Anderson, John Corbett, Christian Kay, Dave Beavan.