Basic structure
The author compiled, designed and edited COOEE between 1995 and 2002. It grew out of a much smaller body of mainly Irish-Australian letters, which formed the basis of the master's thesis Early Australian Letters (Fritz 1996).
Today, the total number of words collected exceeds ten million. Owing to the design restrictions reported above, COOEE draws on only two million of these ten million words. This leaves about eight million words as possible reference material, which is examined in more detail later in this chapter.
The sources in COOEE are of very uneven length, ranging from diary excerpts to book chapters. The number of words in a category therefore gives a much clearer account of the available material than the number of sources does. For this reason, word counts rather than source counts are mostly used in the description of COOEE.
The corpus is divided into four time periods (1788–1825, 1826–1850, 1851–1875 and 1876–1900), each holding about 500,000 words.
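The period boundaries can be expressed as a simple lookup. The following is a minimal sketch, not part of COOEE itself: the function name is hypothetical, and the numbering of the periods from 1 to 4 in chronological order is an assumption based on the SIN convention described under "Parameters & coding".

```python
# Period boundaries as stated in the text. Numbering the periods 1-4 in
# chronological order is an assumption; the names below are hypothetical.
PERIODS = {
    1: (1788, 1825),
    2: (1826, 1850),
    3: (1851, 1875),
    4: (1876, 1900),
}


def period_for_year(year: int) -> int:
    """Return the period number (1-4) whose range contains the given year."""
    for number, (start, end) in PERIODS.items():
        if start <= year <= end:
            return number
    raise ValueError(f"{year} lies outside the years covered (1788-1900)")
```

A text written in 1826, for instance, falls into the second period, while a year after 1900 raises an error.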
Parameters & coding
Each text received a heading which states its Source Identification Number (SIN) and provides data about the author and the source. If a text ran over more than one page, the page break was indicated by the new page number in square brackets. SINs were assigned chronologically. Every SIN starts with a number between 1 and 4 (for the period in which the document was written) and then, after a hyphen, has a three-digit number for further identification. In some cases texts had to be removed and/or replaced after the SINs had been assigned. For practical reasons these SINs were not re-allocated, so there are some gaps in the serial SINs. Whenever a quote from COOEE appears in the text, the SIN is given in angle brackets, e.g. <1-093>. All the SINs and the data about every author and text are given in the appendix.
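The SIN format described above (a period digit from 1 to 4, a hyphen, and a three-digit serial number) lends itself to a simple validity check. The sketch below is illustrative only: the function name and the decision to accept SINs with or without angle brackets are assumptions, not part of the corpus documentation.

```python
import re
from typing import Tuple

# One period digit (1-4), a hyphen, then exactly three digits.
SIN_RE = re.compile(r"([1-4])-([0-9]{3})")


def parse_sin(sin: str) -> Tuple[int, int]:
    """Split a SIN such as '<1-093>' or '1-093' into (period, serial number)."""
    text = sin.strip()
    # Quotes in running text give the SIN in angle brackets; drop them if present.
    if text.startswith("<") and text.endswith(">"):
        text = text[1:-1]
    match = SIN_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid SIN: {sin!r}")
    return int(match.group(1)), int(match.group(2))
```

Because some SINs were never re-allocated, a well-formed SIN is not guaranteed to exist in the corpus; the check covers the format only.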
Registers
Four registers were defined for COOEE: the Speech-based Register (SB), the Private Written Register (PrW), the Public Written Register (PcW) and the register of Government English (GE). The distribution of words across the registers is similar in each of Periods 1-4.
The PcW Register dominates the corpus (803,115 words), and rightly so, since these writings were most widely distributed and certainly made up the lion's share of Australia's linguistic scene as manifested in written/printed texts. The text types in this register are Memoirs (MM), Newspapers & Broadsides (NB), Narratives (NV), Official Correspondence (OC), Reports (RP) and Verse (VE).
Next comes the PrW Register (700,891 words). It represents the thousands of letters and diaries in which almost everybody confided their private joys and sorrows. The text types are Diaries (DI) and Personal Correspondence (PC).
The SB Register is comparatively small (303,850 words). This is certainly not representative of the total 'production' of English in 19th-century Australia, but is dictated by a natural lack of sources. This register covers the text types Minutes (MI), Plays (PL) and Speeches (SP).
By far the smallest register is GE (200,201 words), which was used only by a very restricted number of people in clearly defined situations. It contains the text types Imperial Correspondence (IC), Legal English (LG) and Petitions & Proclamations (PP).
Figure 1. Word counts by register in COOEE.
In every period there is a similar number of words in the different registers.
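The register word counts and text-type codes given above can be collected in two small tables, from which each register's share of the corpus follows directly. This is a sketch for illustration; the variable and function names are hypothetical, while the figures and codes are those stated in the text.

```python
# Word counts per register, as given in the text.
REGISTER_WORDS = {
    "PcW": 803_115,  # Public Written
    "PrW": 700_891,  # Private Written
    "SB": 303_850,   # Speech-based
    "GE": 200_201,   # Government English
}

# Text-type codes belonging to each register, as given in the text.
TEXT_TYPES = {
    "PcW": ["MM", "NB", "NV", "OC", "RP", "VE"],
    "PrW": ["DI", "PC"],
    "SB": ["MI", "PL", "SP"],
    "GE": ["IC", "LG", "PP"],
}


def register_shares(counts):
    """Return each register's fraction of the total word count."""
    total = sum(counts.values())
    return {name: words / total for name, words in counts.items()}


def register_of(text_type):
    """Look up the register to which a text-type code belongs."""
    for register, codes in TEXT_TYPES.items():
        if text_type in codes:
            return register
    raise KeyError(text_type)


for name, share in register_shares(REGISTER_WORDS).items():
    print(f"{name}: {share:.1%}")
```

The four counts sum to 2,008,057 words, which agrees with the figure of about two million words given for the corpus as a whole.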
Sociolinguistic coverage
The following data about the authors (if known) were collected:
- name
- year of birth
- gender
- country/region of origin
- social status
- year of arrival in Australia
- gender, status and abode of the addressee (if the source was a letter)
Figure 2. Proportions of text by male and female writers in COOEE.
COOEE includes 164,000 words by Irish immigrants, i.e. 8.2% of the total 2,000,000 words.
The following textual properties were ascertained (if possible), coded and investigated:
- year of writing (or of publication)
- place of writing
- register of the text
- text type
- the number of words
- the name of the source
- the pages in the original text (if applicable).
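The author data and textual properties listed above can be pictured as one record per source. The sketch below shows one possible layout and is not the coding scheme actually used in COOEE: all class and field names are hypothetical, and optional fields reflect the fact that the data were collected only if known or applicable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Author:
    # Every field is optional because author data were recorded only if known.
    name: Optional[str] = None
    year_of_birth: Optional[int] = None
    gender: Optional[str] = None
    origin: Optional[str] = None          # country/region of origin
    social_status: Optional[str] = None
    year_of_arrival: Optional[int] = None


@dataclass
class Addressee:
    # Recorded only if the source was a letter.
    gender: Optional[str] = None
    status: Optional[str] = None
    abode: Optional[str] = None


@dataclass
class SourceText:
    sin: str                              # e.g. "1-093"
    year: int                             # year of writing or of publication
    register: str                         # SB, PrW, PcW or GE
    text_type: str                        # e.g. PC for Personal Correspondence
    word_count: int
    source_name: str
    place: Optional[str] = None           # place of writing
    pages: Optional[str] = None           # pages in the original, if applicable
    author: Author = field(default_factory=Author)
    addressee: Optional[Addressee] = None


# Placeholder values for illustration only; not taken from the corpus.
example = SourceText(
    sin="2-001", year=1830, register="PrW", text_type="PC",
    word_count=500, source_name="(placeholder source)",
    addressee=Addressee(gender="female"),
)
```

Keeping the addressee as a separate optional record mirrors the restriction that this information applies to letters only.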