HUM19UK Corpus
HUM19UK is the Huddersfield, Utrecht, Middelburg corpus of 19th-century British fiction. It was created between 2016 and 2019 as a collaborative project between the University of Huddersfield (UK), Utrecht University (the Netherlands) and University College Roosevelt in Middelburg (the Netherlands). The corpus contains 100 complete novels by 100 authors (50 male, 50 female) spanning 100 years (1800–1899), with roughly 10 novels per decade. It totals 13 million words.
Time of compilation: 2016–2019
Size: 13 million words
Language: English
Number of texts/samples: 100 complete novels
Period: 1800–1899
Released: 2019
Project home page: https://www.linguisticsathuddersfield.com/hum19uk-corpus
Reference line and copyright
The corpus may be used by anyone. All texts in the corpus are out of copyright and were extracted from the Project Gutenberg, Celebration of Women Writers, Victorian Women Writers Project, Chawton House and Public Library UK websites.
Manual
https://www.linguisticsathuddersfield.com/hum19uk-corpus
Compilers
Dr. Brian Walker, Fransina Stradling, Prof. Dan McIntyre, Elliott Land, Dr. Hazel Price (University of Huddersfield, UK)
Prof. Michael Burke (University College Roosevelt, the Netherlands, and Utrecht University, the Netherlands)
Availability
Open access. Freely available for download at
https://www.linguisticsathuddersfield.com/hum19uk-corpus
Technical information
The published version of the HUM19UK corpus contains machine-readable versions (.txt format) of novels that have been cleaned and annotated. The file name of each corpus text is its year of publication.
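Because each file is named after its year of publication, the texts can be sorted into decades from the file names alone. The following Python sketch illustrates the idea; it is not part of the corpus distribution. It assumes the file names look like "1847.txt" (the exact pattern should be checked against the downloaded files), and the function names and the "HUM19UK" directory name are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def year_from_filename(path):
    """Return the publication year encoded in a file name, or None.

    Assumption: the file stem contains a four-digit year, e.g. "1847.txt".
    Years outside the corpus period (1800-1899) are rejected.
    """
    match = re.search(r"\d{4}", Path(path).stem)
    if match is None:
        return None
    year = int(match.group())
    return year if 1800 <= year <= 1899 else None


def group_by_decade(paths):
    """Map each decade (1800, 1810, ...) to the file names that fall in it."""
    decades = defaultdict(list)
    for path in paths:
        year = year_from_filename(path)
        if year is not None:
            # Integer division drops the final digit: 1847 -> 1840.
            decades[year // 10 * 10].append(Path(path).name)
    return dict(decades)
```

For example, group_by_decade(Path("HUM19UK").glob("*.txt")) would return a dictionary keyed by decade, which can then be used to check the roughly 10-novels-per-decade distribution or to build decade-based subcorpora.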
CoRD Entry submitted on November 29, 2019 by Fransina Stradling.