Basic structure
There are 177 text files in the CED, yielding a total of 1,183,690 words. The CED contains texts representative of five text types (plus a mixed bag of dialogues labelled 'Miscellaneous'), which divide into two categories: these are 'authentic dialogue', which is written records of real speech events (Trial Proceedings and Witness Depositions), and 'constructed dialogue', in which the dialogue is constructed by an author (Drama Comedy, Didactic Works, and Prose Fiction). Furthermore, the text types may consist of dialogue in which the intervention of the narrator is minimal, limited to identifying the speaker, or marking scene changes and the like, as in Drama Comedy, Didactic Works, and Trial Proceedings, whereas other text types contain considerable intervention by the scribe or narrator, with dialogue embedded in a third person narrative, as in Witness Depositions and Prose Fiction.
The text types can be briefly described as follows. Trial Proceedings are reports of the proceedings in court, typically recorded by an official scribe (records written by those involved in the trial, such as the defendant, were excluded from the CED). The dialogue is recorded as direct speech, generally in the form of questions and answers. Witness Depositions are written records of the oral testimony of witnesses, usually given before the trial proceedings themselves, which are rendered by a scribe as a third person narrative, with legal formulae inserted. Occasionally, dialogue from earlier speech events cited by a witness is rendered as direct speech by the scribe. Drama Comedy contains dialogue in the form of direct speech, invented by an author. The text type Didactic Works also consists of texts constructed by an author with the dialogue presented as direct speech. These are handbooks and instructional treatises, typically containing dialogues between 'instructor' and 'instructee'. Language teaching handbooks are a small group of texts set apart from the other Didactic Works (which are hence labelled 'Other' i.e. 'other than language teaching'): the language of the dialogue can be contrived for didactic purposes, and is likely to be influenced by both the target language and the author's native language. Prose Fiction texts include fictional, constructed dialogue, but unlike Drama Comedy the dialogue can be presented in the form of direct or indirect speech, and is surrounded by narration by the 'storyteller'. 'Miscellaneous' is not a text type as such, but a collection of dialogues presented as direct speech which could not be classified as belonging to any of the above text types.
Table 1 illustrates the overall structure of the CED outlined above, and also gives the overall word counts for each text type and the Miscellaneous texts. (These counts were obtained using the Hcount computer program, excluding foreign language, editorial comments, and text added by the corpus compilers.)
Table 1. Overall structure of the CED and word counts for each text type (and Miscellaneous texts).
|
Authentic dialogue |
Word count |
Constructed dialogue |
Word count |
Minimum narratorial
intervention |
Trial Proceedings |
285,660 |
Drama Comedy |
238,590 |
Didactic Works
A. Other |
162,250 |
B. Language Teaching |
74,390 |
Miscellaneous |
25,970 |
Considerable narratorial
intervention |
Witness Depositions |
172,940 |
Prose Fiction |
223,890 |
Total word count |
|
458,600 |
|
725,090 |
Figure 1. Authentic and constructed dialogue in CED.
Figure 2. Authentic dialogue in CED.
Figure 3. Constructed dialogue in CED.
In the CED, the 200-year period 1560–1760 is divided into five 40-year periods, as shown in Table 2. This table also gives the word counts for each period for all text types (and Miscellaneous texts) taken together.
Table 2. The periodization of the CED and the period word counts (for all five text types plus Miscellaneous texts).
Period |
Period totals |
1. 1560-1599 |
200,150 |
2. 1600-1639 |
204,470 |
3. 1640-1679 |
259,240 |
4. 1680-1719 |
297,090 |
5. 1720-1760 |
222,740 |
Total |
1,183,690 |
Figure 4. The periodization of the CED and the period word counts (for all five text types plus Miscellaneous texts).
Where possible, we sampled one or more extracts amounting to around 10,000 words from each text. However, shorter texts that otherwise fulfilled our selection criteria were also used. The main criteria for selection were that the texts should:
- belong to one of the text types described above
- include speech presentation, preferably in the form of direct speech
- preferably include speakers of both sexes
- preferably include speakers representative of a range of social ranks
- represent the language of the period 1560-1760
- preferably be the earliest extant printed version.
Coding in the CED has been added with a view to offering computerized texts which remain as close as possible to the original versions, while facilitating computer searches. Font changes, foreign language, headings, editorial comments, and text added by the compilers (comments on source text peculiarities etc.) have been marked off from the running text by coding. As the CED is primarily designed for the study of dialogue, we have taken two additional measures: long narrative passages have been omitted and replaced by a short summary (coded as compilers' comments), and direct speech has been distinguished from the rest of the running text. The latter was done by marking off text other than direct speech, although it was not always unproblematic to mark e.g. where indirect speech ended and direct speech began. Based on our interpretation, in Table 3 we give the overall word counts for direct speech in the five 40-year periods.
Table 3. Period word counts for direct speech (for all five text types plus Miscellaneous texts).
Period |
Period totals |
1. 1560-1599 |
140410 |
2. 1600-1639 |
145880 |
3. 1640-1679 |
192150 |
4. 1680-1719 |
237030 |
5. 1720-1760 |
178630 |
Total |
894100 |
Figure 5. Period word counts for direct speech (for all five text types plus Miscellaneous texts).
Figure 6. Direct speech and 'other' text in CED. |