Background and history
CEEC history and compilers
Just as there is no society without language, there is no language corpus without the society of those who have compiled it. For various reasons, historical corpora often have a long compilation history. The beginnings of the Corpus of Early English Correspondence go back to the early 1990s. That is when, encouraged by their work on the Early Modern English section of the Helsinki Corpus, Terttu Nevalainen and Helena Raumolin-Brunberg had the idea of launching a joint project on English historical sociolinguistics.
Corpus queries
An endless number of practical questions were discussed at the frequent meetings of the project team. From early on, we realized how important it was to use not only present-day sociolinguistic studies but also research on social history. Terttu and Helena were particularly grateful to Dr. Keith Wrightson, then at Jesus College, Cambridge, for his encouraging feedback on the exploratory paper they published on the sociolinguistic background of the Early Modern English section of the Helsinki Corpus in Neuphilologische Mitteilungen in 1989. With this encouragement in mind, the historical sociolinguistics project appeared far less daunting.
We were excited about the prospect of studying how linguistic changes spread across the various sections of the language community in real time. Having only general histories of English and some work on genre variation to go by, we were also slightly anxious. A basic question was how long a change in progress would take to be completed, if it was completed at all. Would our data provide us with enough material for tracing these processes? And would it indeed be possible to use personal letters to trace the ways in which different social groups participated in these processes and see how the incoming forms diffused throughout the country?
Our work began with a grant from the Academy of Finland back in 1993. It was awarded to us for the study of the social mechanisms of language change - as we then called them - in Renaissance English using an electronic corpus of early English correspondence as the source for our data. The corpus thus had to be compiled before any empirical research could be tackled, and this compilation naturally became our main concern at the beginning of the project. The first person we employed as a research assistant was Arja Nurmi. Minna Palander-Collin joined the team next, but soon left us for a few years of PhD work in Oxford. Kirsi Heikkonen stepped in and gave us a helping hand during the initial stages of the project.
The funding granted to the project by the University of Helsinki in 1996 allowed us to hire Minna Aunio (now Nevala) to join the corpus team, and, together with Jukka Keränen, she helped us to process a large part of what we call the 1998 version of the Corpus of Early English Correspondence, or CEEC (pronounced 'seek') for short. It covers the period from about 1410 to 1680 and consists of 2.7 million words. A half-million-word sampler version of the corpus (CEECS) came out in 1999, and a tagged and parsed version (PCEEC) in 2006.
The tagging and parsing of the corpus was a collaboration between Helsinki and a team of scholars at the University of York, Dr Ann Taylor, Professor Susan Pintzuk and Professor Anthony Warner. Part-of-speech tagging was added to the corpus by Arja Nurmi in Helsinki, and it was parsed by Ann Taylor in York.
When the VARIENG Research Unit received National Centre of Excellence status in 2000, work on the CEEC was resumed. The team gained two new members, Mikko Laitinen and Anni Vuorinen (now Sairio), who began to work on the new subproject of extending the CEEC into the 18th century. Samuli Kaislaniemi joined the CEEC team as a research assistant in 2003, working on the CEEC Extension and Supplement and its letter and sender databases. Tuuli Tahko, Eero Timoskainen and Teo Juvonen also served on the project for shorter periods of time. From 2006 to 2008, Tanja Säily assisted the CEEC project, continuing work on the Extension and the sender and letter databases of the corpus; she also saw through the CEECer software project, begun in 2006. Mikko Hakala was hired as a research assistant in 2009.
The aim of the CEEC team is to release the entire corpus, "CEEC-400", by the end of the VARIENG funding period in 2011. New collaborative projects to supplement the corpus, such as a 19th-century extension of the CEEC being compiled by Professor Lieselotte Anderwald (University of Kiel), are also currently under way. The work of corpus compilers is never done!
Photo gallery
|
The CEEC team in 1997. From left to right: Anne Virolainen, Minna Nevala, Arja Nurmi, Helena Raumolin-Brunberg, Minna Palander-Collin and Terttu Nevalainen, with Jukka Keränen sitting at the computer. |
|
Another picture of the CEEC team: Jukka Keränen, Anne Virolainen, Minna Nevala, Terttu Nevalainen, Helena Raumolin-Brunberg, Minna Palander-Collin and Arja Nurmi. |
|
Team leader Terttu in front of a historical map of England. |
|
A close-up shot of the map. Each numbered pin represents a letter collection in the CEEC. The pins were hand painted by Jukka Keränen. |
|
The sheep was the symbol of the English wool merchants, such as our informant Otwell Johnson (c.1518-1551). |
|
The sheep also became the unofficial symbol of the CEEC team. |
The black and white pictures were taken by Ari Aalto in the CEEC project room at Fabianinkatu on 27 November 1997, while the colour photos were taken by Tanja Säily in the CEEC project room at Vironkatu on 11 November 2008.
Corpus of Early English Correspondence (CEEC) on varhaisista englanninkielisistä kirjeistä koostuva korpus eli elektroninen tekstikokoelma, joka on suunniteltu historiallisen sosiolingvistiikan tarpeisiin. Kirjeaineistosta kerrotaan enemmän seuraavissa yleistajuisissa lehtiartikkeleissa. Tarkemmat tiedot korpuksesta löytyvät verkkosivuston englanninkielisestä osasta.
|