Standardized versions of the Corpora of Early English Correspondence
Spelling variation in Early Modern English data poses considerable problems for the accuracy of corpus linguistic tools and methods developed primarily for Modern English. Personal letters in particular show extensive spelling variation, which reflects regional and social differences among other factors, and it is not uncommon for a single writer to use inconsistent and idiosyncratic spellings as well. To address these problems, we are standardizing the Corpora of Early English Correspondence (CEEC) using VARD 2 (Variant Detector), a program designed to deal with spelling variation in Early Modern English texts automatically (see http://www.comp.lancs.ac.uk/~barona/vard2/). The standardized version will complement the CEEC family of corpora with texts that are better suited to methods such as keyword and cluster analysis.
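To illustrate why spelling variation matters for frequency-based methods such as keyword analysis, the following is a minimal sketch in Python. It is not VARD 2 and does not reproduce its algorithm or dictionary: the variant list, the function names and the sample sentence are all invented for illustration. It only shows how variant spellings split the count of one word across several forms, and how mapping variants to a standard form merges those counts again.

```python
from collections import Counter

# Hypothetical variant-to-standard mapping, invented for this example.
# It is NOT taken from VARD 2 or from the CEEC corpora.
VARIANTS = {
    "haue": "have",
    "loue": "love",
    "yt": "it",
}


def normalize(tokens, variants=VARIANTS):
    """Lowercase each token and replace known variants with a standard form."""
    return [variants.get(token.lower(), token.lower()) for token in tokens]


def word_counts(tokens):
    """Return a frequency count of the given tokens."""
    return Counter(tokens)


# Invented sample sentence mixing variant and standard spellings.
sample = "I haue your letter and I have read yt and loue it"
tokens = sample.split()

raw = word_counts(token.lower() for token in tokens)
standardized = word_counts(normalize(tokens))

# Before standardization "have" and "haue" are counted separately;
# afterwards they are counted as the same word.
print(raw["have"], raw["haue"])
print(standardized["have"], standardized["haue"])
```

A simple lookup table like this cannot handle unseen variants or ambiguous forms, which is one reason a dedicated tool is used for the actual standardization.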