Balanced corpora and quotation databases: Taking shortcuts or expanding methodological scope?

Laurel J. Brinton, Stefan Dollinger, and Margery Fee
University of British Columbia at Vancouver

Abstract

This paper focuses on the utility of a dictionary-based database for the purposes of linguistic research. The database is the product of digitizing a historical dictionary, A Dictionary of Canadianisms on Historical Principles (Avis et al. 1967), as part of an updating process. The citations of headwords in context (some 30,000) from the first edition of the dictionary have been combined with newly collected historical and present-day quotations (some 36,000) to form the Bank of Canadian English (BCE). Despite its medium size of 2.3 million words, it is the largest structured historical database of Canadian English.

After describing aspects of the process of collecting data for the BCE that distinguish it from other quotation databases, we present two test cases. First, with respect to deontic modality, the BCE shows a real-time rise in have to and a decline in, although not obsolescence of, must, which is consistent with apparent-time studies of contemporary Canadian English. Second, we examine the expansion of the progressive passive in Canadian English. The BCE provides quite early examples of the progressive passive and produces frequency patterns quite similar to those shown by COHA for US English. The early appearance of the progressive passive in Canadian English casts doubt on one influential view concerning its origin. Based on these two case studies we argue that, beyond mere comparability, (semi‑)structured databases such as the BCE perform as well as historical balanced corpora in the area of historical regional dialectology.

1. Introduction

Balanced corpora, such as the Helsinki Corpus, the Brown and Lancaster-Oslo-Bergen (LOB) family of corpora, the ICE (International Corpus of English) corpora, or ARCHER, are known to offer fairly reliable windows into linguistic practices. Because of the limited size of such “small-and-tidy” corpora, however, Christian Mair (2006a) suggested that researchers might also need to consider “big-and-messy” corpora such as the web, the world’s biggest and messiest corpus. While today one can find corpus linguists, even historical corpus linguists, who use the web as their only database, other, less messy resources are also at hand. First, scanning techniques, as used in the production of the Corpus of Historical American English (COHA), have produced a national diachronic corpus of more than 400 million words, a size hitherto confined to large-scale synchronic projects. Second, as has already been shown (Jucker 1994; Hoffmann 2004), the structured collection of quotations in historical dictionaries, such as the Oxford English Dictionary (OED), can reliably be used for historical corpus studies. In fact, in the same paper, Mair calls the OED-based approach “promising” (2006a: 366).

This paper focuses on the utility of a dictionary-based database for the purposes of linguistic research into the history of a national variety of English, Canadian English. The database is the product of digitizing a historical dictionary, A Dictionary of Canadianisms on Historical Principles, and combining it with newly collected historical and present-day quotations of headwords in context. The resulting database is the Bank of Canadian English (BCE); despite its medium size of 2.3 million words, it is the largest structured historical database of Canadian English. [1] Admittedly, quotation databases are not structured in the same way as corpora; the structure of the BCE results from its collection plan, which is discussed below.

After reviewing the genesis and production of this quotation database, we will present two test cases—changes in deontic modality and the progressive passive—to investigate whether, beyond mere comparability, structured databases such as the BCE may indeed outperform historical balanced corpora for the diachronic study of national varieties.

2. A Dictionary of Canadianisms on Historical Principles

2.1 DCHP-1 Print edition (1967)

The BCE is a by-product of our plan to revise the Dictionary of Canadianisms on Historical Principles (Avis et al. 1967) (DCHP-1) (see Dollinger 2006 for an early work plan). Charles Lovell began work on the dictionary in 1954; it was seen to completion under Walter Avis in 1967 (Canada’s centennial year). After publication, Avis and later Matthew Scargill continued collecting words at the Lexicographical Centre of the University of Victoria in British Columbia until about 1989.

The dictionary was a major national success, which even competing dictionary publishers appreciated. Sidney Landau, then editor-in-chief of Funk & Wagnalls dictionaries, wrote, “Now that we have it [the DCHP] I cannot imagine how we managed to get along so long without it” (personal letter to Avis dated 29 Nov. 1967).

In 2006, Dollinger picked up the baton of editor-in-chief and since the summer of that year the project has been located at the University of British Columbia’s Department of English. It has had two main goals:

    (1) digitizing the first edition to be put online; and

    (2) producing a revision, A Dictionary of Canadianisms on Historical Principles, 2nd ed. (DCHP-2), primarily focused on post-1967 terms.

2.2 DCHP-1 Online (2011)

The print edition was scanned (2007) and then proofread and corrected between March 2010 and July 2011 by a dedicated team of students. July 2011 marks the milestone of the completion of DCHP-1 Online (Dollinger 2011). Figure 1 shows a typical online entry, with one of the original illustrations from the print edition.

Figure 1

Figure 1. DCHP-1 online: sample entry

2.3 DCHP-2

Data collection for the second edition of the dictionary was undertaken between 2006 and 2010, with new words and supporting quotations entered into the Bank of Canadian English. To identify new Canadianisms, we began by comparing synchronic dictionaries of Canadian English (such as the Canadian Oxford Dictionary [Barber 2004]). We supplemented this list with our own research and added new terms (e.g. grow-rip ‘theft of marijuana in a grow operation’) as they arose. A number of older terms missed by DCHP-1 have been included as well. Update citations were collected primarily from digital databases (Canadian Newsstand, The Globe and Mail, Early Canadiana Online, etc.). While c. 7,000 new terms were researched, only about 700 will be accepted as new entries in DCHP-2. Figure 2 shows one citation in the BCE for gem jar, a kind of home canning jar; this is an example of an older Canadianism not recorded in DCHP-1.

Figure 2

Figure 2. Bank of Canadian English: citation for gem jar

3. Bank of Canadian English

The BCE includes the citations from DCHP-1 (some 30,000 citations) and the update citations for new Canadianisms suitable for the new edition (some 36,000 citations). Only a selection of these 36,000 update citations will be used in the DCHP-2. (For further information on the BCE, see http://faculty.arts.ubc.ca/sdollinger/bce.htm.) So what is the function of all the remaining quotations?

In contrast to the OED quotations database, the citations database [2] for DCHP-2 was from the beginning intended to serve as a linguistic database of historical Canadian English. These were the aims as Dollinger saw them:

The DCHP-2 database, therefore, is conceived as a multi-purpose database and research tool for use in many areas of inquiry, most notably, historical linguistics and dialectology, but also historical sociolinguistics, spanning the Late Modern English (LModE) period and the twentieth century in toto from a Canadian perspective (Dollinger 2006: section 3.2, under the heading “Bank of Canadian English on Historical Principles”).

While we did not have the resources to implement the tagging required for sociolinguistic studies, we have been able to meet the other aims. With our linguistic applications in mind, we have collected more than we would need for a dictionary update. Let us focus on three aspects of our collecting process: the selection of longer citations than is normal for dictionary citations, wide regional and temporal coverage, and the inclusion of a large amount of regional data.

3.1 Length of citations

First, our plan to create a general database has led us to include almost three times as much context as the OED (see Figure 3), although the OED has also expanded its context in recent years. In the BCE, the average citation length is 33.9 words or 197 characters. With this short-text clipping method we have reached a size similar to that of the Helsinki Corpus: the current size of the BCE is 2.26 million words. [3]

Figure 3

Figure 3. Average citation lengths (OED figures based on Sheidlower 2011)

3.2 Regional and diachronic scope

Second, for the collection of data, we applied a “data extraction scheme” focusing on the Canadian provinces and territories on the one hand and on diachronic coverage on the other. If a headword’s first citation is later than World War II, we try to find supporting quotations for every decade from as many of the ten provinces and three territories as possible; for older words, we use 25-year intervals. A typical entry such as reading week (earliest citation located: 1969) yields the earliest examples from Ontario, Manitoba, and Alberta, and citations from the other regions after 1990, with some gaps in the Maritimes (Nova Scotia) and the territories (Northwest Territories and Nunavut). As far as our sources allowed, we have therefore gathered a structured data set, but the end result is a hybrid between a structured corpus and an unstructured quotations database such as the OED. The BCE’s strength is that it probably represents the variety better than other corpora do, because it consists of a large number of rather short samples from different texts by different authors, without the distortion that large text samples might produce. These text clippings yield a range of authors that even much bigger corpora, e.g. the 50-million-word Strathy Corpus of Canadian English (Strathy), do not match.
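The collection plan described above can be pictured as a grid of target cells, one per time interval and province or territory. The following Python sketch is a hypothetical illustration, not the software actually used for the BCE: the two-letter region codes, the reading of “later than World War II” as “after 1945”, the alignment of intervals to multiples of 10 or 25 years, and all function names are our assumptions.

```python
# Hypothetical sketch of the BCE collection grid; not the project's software.

# Assumed two-letter codes: ten provinces, then three territories.
REGIONS = ["BC", "AB", "SK", "MB", "ON", "QC", "NB", "NS", "PE", "NL",
           "YT", "NT", "NU"]

POSTWAR_CUTOFF = 1945  # assumption: "later than World War II" = after 1945


def interval_length(first_year: int) -> int:
    """Decades for post-war headwords, 25-year intervals for older ones."""
    return 10 if first_year > POSTWAR_CUTOFF else 25


def target_cells(first_year: int, last_year: int) -> list[tuple[int, str]]:
    """Every (interval start, region) cell that collectors would try to fill."""
    step = interval_length(first_year)
    start = first_year - first_year % step  # align to a multiple of the step
    return [(year, region)
            for year in range(start, last_year + 1, step)
            for region in REGIONS]


def coverage_gaps(first_year: int, last_year: int,
                  citations: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Cells with no citation yet; `citations` holds (year, region) pairs."""
    step = interval_length(first_year)
    filled = {(year - year % step, region) for year, region in citations}
    return [cell for cell in target_cells(first_year, last_year)
            if cell not in filled]
```

With invented citations for a post-war headword first attested in 1969, e.g. `coverage_gaps(1969, 2011, [(1969, "ON"), (1975, "MB"), (1982, "AB")])`, the function returns the 75 of 78 decade-by-region cells (six decades from 1960 to 2010, times 13 regions) that still lack a citation, which is the kind of gap list that would direct further searching.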

3.3 Regional data

Third, we have been able to include regional data even if these may not be included in the revised edition. The DCHP-2, as a national dictionary whose coverage should be balanced throughout the country, can add only a limited array of regionalisms without starting to skew its coverage. In contrast, the BCE is not bound by such questions of representativeness and can be used to supplement regional dictionaries, such as The Dictionary of Newfoundland English online (Story et al. 1999) or the Dictionary of Prince Edward Island English (Pratt 1988) by storing regionalisms that may never make it into a regional or a national dictionary. The citations illustrating these regionalisms, although designed to support word definitions, may also prove useful for linguistic studies of regional variation. We illustrate with the example of Quebec English.

Quebec English has been influenced by contact with Quebec French, particularly in the area of lexical borrowing (see Fee 2008; Grant 2010). Researchers have typically used examples from the Montreal Gazette (the province’s only English-language daily newspaper) to substantiate this claim. A comparison of searches in two regional newspapers, the Toronto Star and the Montreal Gazette, shows that some words, such as depanneur, are at least discussed and have some representation outside Quebec (and are thus candidates for inclusion in DCHP-2), while others, such as guichet and ketaine, might be moving in but have not yet become Canadian English words eligible for inclusion in DCHP-2 (see Table 1).

French loanword (meaning) | Montreal Gazette | Toronto Star
ketaine/quetaine ‘kitschy’ | 134 | 3
depanneur ‘corner store’ | 3,231 | 40
guichet ‘teller’s window’ | 20 | 1

Table 1. Comparison of Quebec English words in the Montreal Gazette and Toronto Star (number of hits, 1985–2010)

Sometimes what is borrowed is not a new word but a new sense: a cognate word borrowed from French long ago and already well established in English is used with a new sense from contemporary French. If we consider animator, dossier, and primordial, we recognize that all of these words were borrowed into English from French and have some senses that are completely integrated, as in cartoon animator, criminal dossier, and primordial ooze. However, the senses illustrated below, all derived from high-frequency French senses, are rarely if ever found in Canadian English outside Quebec, and are often overlooked by lexicographers and sociolinguistic researchers.

  • animator: ‘someone who leads a workshop or event’
    “Bread Basket Lac St. Louis needs a volunteer kitchen animator for a community kitchen once a month in Ste. Anne de Bellevue”. (Montreal Gazette, 7 Oct. 2010, D19)

  • dossier: ‘issue’
    “There is not a dossier we haven’t raised, even if we were not the official opposition”. (Montreal Gazette, 4 Jan. 2006, A1)

  • primordial: ‘vital’
    “The quality and freshness of the fish are primordial”. (Montreal Gazette, 26 Dec. 1990, C1)


3.4 BCE – conclusion

Thus, what we now have in the BCE is a historical database that offers the most extensive data on historical Canadian English to date. The BCE contains 66,658 citations (quotations) for 17,054 headwords (lemmas), for a total of 2.26 million running words. The database has a temporal range of 1505 to 2011. Figure 4 shows the distribution of citations per period. Note that we used 100-year intervals up to 1800 and 20-year intervals thereafter so that all the data could be shown in one graph. The relative dearth of examples prior to 1800 can be explained in part by the fact that there was “little English writing” before 1812 (Talman and Talman 1977: 97) in Ontario, historically the most important province for the spread of Canadian English across the northern half of the continent. [4]
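The mixed period scheme used for Figure 4 amounts to a simple binning rule. The minimal Python sketch below assumes that the 100-year bins are aligned to round centuries and that the 20-year bins are counted from 1800 (1800–1819, 1820–1839, and so on); these bin edges are our inference from the periods cited in this paper (e.g. 1860–1879), and the function names are hypothetical.

```python
from collections import Counter


def period_label(year: int) -> str:
    """Map a citation year to a Figure 4-style period label.

    Assumed bin edges: 100-year bins aligned to centuries before 1800,
    then 20-year bins counted from 1800 (1800-1819, 1820-1839, ...).
    """
    if year < 1800:
        start = year - year % 100
        return f"{start}-{start + 99}"
    start = 1800 + (year - 1800) // 20 * 20
    return f"{start}-{start + 19}"


def citations_per_period(years: list[int]) -> Counter:
    """Count citations per period, given the year of each citation."""
    return Counter(period_label(year) for year in years)
```

For example, the earliest BCE year, 1505, falls into the 1500–1599 bin, and the latest, 2011, into the open-ended 2000–2019 bin.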

Figure 4

Figure 4. Numbers of citations per 20 year period in the Bank of Canadian English (100 year periods, 1500–1800)

On average, there are 3.9 citations per word, but there is a great diversity, from suspected new Canadianisms (often with only a few citations) to parkade ‘parking garage’ (with 21 citations from 1957 to 2008) to Canuck ‘Canadian’ (with 173 citations from 1849 to 2011).
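The summary figures reported in this section are mutually consistent, which a few lines of Python can confirm. The sketch uses only the numbers given in the text; because 2.26 million is itself a rounded figure, the derived averages are approximate. It reproduces the 3.9 citations per headword stated above and the average citation length of 33.9 words reported in Section 3.1.

```python
# Summary figures for the BCE as reported in this section (September 2011).
CITATIONS = 66_658
HEADWORDS = 17_054
RUNNING_WORDS = 2_260_000  # the text gives this only as "2.26 million"

citations_per_headword = CITATIONS / HEADWORDS
words_per_citation = RUNNING_WORDS / CITATIONS

print(f"citations per headword: {citations_per_headword:.1f}")  # 3.9
print(f"words per citation:     {words_per_citation:.1f}")      # 33.9
```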

4. Test cases


In the remainder of the paper, we will illustrate how the BCE can be used as a real-time resource for the study of Canadian English, using two examples.

4.1 Deontic modality

One example of such a long-range study is shown in Figure 5. It offers the long-term perspective that we have become used to with the Helsinki Corpus, but that is still a rarity for many varieties, certainly for Canadian English; for colonial varieties, a time span of almost four centuries is uncommon. Here, four classes of deontic modal markers (be to, must, have to, [’ve] got to) are compared to one another from the 1620s to the present day. The percentage share of each class of forms is represented on the y-axis. The absolute token frequencies for be to are also given, allowing one to gauge the overall token counts, which range from 700 in the period 1950–2009 to a mere nine prior to 1776. From about 1800 onward, the historical data rest on a solid base. This coincides with the first massive immigration of English speakers into Canada and reflects the historical texts available. This study thus provides, for the first time, a complete lineage from “start to finish”; it complements other diachronic work (Dollinger 2008) and the increasing number of synchronic studies in the area (e.g. Tagliamonte and D’Arcy 2007), which have likewise shown a rise in have to and a decline in, although not obsolescence of, must. This study of modal auxiliaries in the BCE demonstrates the promise of the database for long-range, real-time study of Canadian English.
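Figure 5 plots each variant’s share of all strong deontic tokens in a period. The calculation can be sketched in Python as follows; the counts in the usage example are invented placeholders, not BCE figures, and the function name is hypothetical. Returning the period total alongside the shares mirrors the practice of reporting absolute token counts, so that sparsely attested periods (such as the nine tokens before 1776) can be read with appropriate caution.

```python
# Labels for the four classes of strong deontic markers compared in Figure 5.
VARIANTS = ("be to", "must", "have to", "'ve got to")


def percentage_shares(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the period's token total and each variant's percentage share."""
    total = sum(counts.get(v, 0) for v in VARIANTS)
    if total == 0:
        raise ValueError("no deontic tokens in this period")
    return total, {v: 100 * counts.get(v, 0) / total for v in VARIANTS}


# Invented counts for a single period, for illustration only (not BCE data).
total, shares = percentage_shares(
    {"be to": 10, "must": 50, "have to": 30, "'ve got to": 10}
)
print(total, shares)
```

A very small total signals that the percentages for that period rest on few tokens and should not be over-interpreted.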

Figure 5

Figure 5. Strong deontic modal auxiliary verbs in Canadian English 1664–2009 (in percent)

4.2 The progressive passive

The rise of the progressive passive is “the single most striking [syntactic] change of the last three centuries” (Pratt and Denison 2000: 411). There is agreement in handbooks and grammars that the construction arose in the late eighteenth century in private correspondence, as shown in (1) and (2):

      1) …that of the House of Commons was being debated when the post went out (1772 J. Harris Let. 8 Dec. in Earl of Malmesbury Series Lett. First Earl of Malmesbury (1870) I.264; OED)
      2) A fellow … whose grinder is being torn out by the roots (1795 Southey in C. Southey Life I.249; OED)

The modal and perfect forms (e.g. the issue may be being debated, the issue has been being debated) were fully integrated only in the twentieth century, and the constructions are still very rare in Present-day English (Mair 2006b: 90; Leech et al. 2009: 137). [5] For example, the 425-million word Corpus of Contemporary American English (COCA) yields only 5 examples of been being + past participle and 28 examples of be being + past participle.

Upon its first appearance, the progressive passive met with virulent opposition from prescriptivists (see Bailey 1996: 222; Visser 1973: 2427–2428), who called it, for example, “an outrage upon English idiom”, “a monstrosity”, and “a fatal absurdity”. [6] But its appearance can be understood as a system-internal change. Its rise coincided with a general increase in the use of the progressive (Arnaud 1998), especially with non-agentive, non-human subjects (Hundt 2004a), and with the (virtual) loss of the passival (e.g. the issues are debating). [7] There are a number of precursors to the progressive passive (Denison 1993: 431–433, 1998: 155–157, 2000: 131–132; Jespersen 1949: 210; see summary in Hundt 2004b: 93ff.), which underwent grammaticalization. [8] The development of the progressive passive has been described as the result of “systemic pressure” (Kranich 2010: 118–119, 242) leading to a “much more symmetrical” auxiliary system (Denison 1998: 151; cf. Visser 1973: 2426). Ultimately, the progressive passive was accepted in the later part of the nineteenth century; the turning point was the period 1850–70, when the progressive passive surpassed the passival in frequency (see Smitterberg 2005: 129).

Scholars have found that the progressive passive, though arising in informal texts, quite quickly spread to more formal texts. For example, Visser (1973: 2426–2427) records the construction in newspapers and journals shortly after its first appearance. In ARCHER, it is most common in journals and religious texts and later in newspapers (Hundt 2004b: 109); in the nineteenth-century CONCE, it is most common in science, debates, and history (Smitterberg 2005: 131). And Leech et al. (2009: 137, 142) find that in Present-day English it is most common in factually based, semi-formal genres such as newspapers.

4.2.1 Antedating the progressive passive

The same examples, e.g. (1) and (2), are cited throughout the literature on the progressive passive. However, electronic corpora now make it very easy to antedate all forms of this construction. Table 2 presents a number of antedatings. [9] In the case of the past plural were being, the OED example can be antedated by over 100 years.

Form | Earliest date in traditional sources | Antedating
am being | 1825 (OED) | And beg you to hasten a favorable reply since I am being put to expense here (1820 T. Jefferson, Letters to and from Jefferson; U of V)
is being | 1795 (OED) [10] | But as near it as a Man is being hang’d when the Sheriff cries (1717 Bullock, Woman is a riddle; ED)
are being | 1807 (Visser 1973: 2429) | the slothful manner in which the Boston fortifications are being built (1776 Washington, The writings of George Washington from the original manuscript sources, Vol. 26; U of V)
was being | 1772 (OED) | the top where the Mark was being broke off (1719 OBC)
were being | 1828 (OED) | the Dressing Box wherein they were being taken away (1714 OBC)
be being | 1915 (Visser 1973: 2447) | The rock-strata, miles thick, may be being flexed now under our feet (1909 COHA:MAG)
been being | 1841 (OED) | to America, where some new and curious experiments have been trying (or been being tried, as our author would phrase it,) (1837 COHA:MAG)

Table 2. Antedatings of the progressive passive
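The size of each antedating in Table 2 is simple arithmetic, but tabulating it makes the range visible: from four years for been being to 114 years for were being, the case singled out above. The following Python sketch uses only the dates from Table 2; the variable names are our own.

```python
# (form, earliest date in traditional sources, date of the antedating), from Table 2
TABLE_2 = [
    ("am being", 1825, 1820),
    ("is being", 1795, 1717),
    ("are being", 1807, 1776),
    ("was being", 1772, 1719),
    ("were being", 1828, 1714),
    ("be being", 1915, 1909),
    ("been being", 1841, 1837),
]

# Number of years by which each traditional first date is antedated.
gaps = {form: traditional - antedating for form, traditional, antedating in TABLE_2}
largest = max(gaps, key=gaps.get)

# List the forms from the largest to the smallest antedating.
for form, years in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{form:<11} antedated by {years:>3} years")
```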

The last example, in which the author expresses a preference for the passival been trying over the perfect progressive passive been being tried (which seems to be being attributed to American usage), suggests that the complex form was still not fully accepted in the mid-nineteenth century.

4.2.2 Progressive passive in the BCE

Turning to the Bank of Canadian English, we find quite old examples, given the rarity of this syntactic structure and the relatively small size of the database, though no direct antedatings (see Table 3). There are no examples of the complex forms be being and been being, nor is there a pre-twentieth-century example of the first-person present am being; this will be discussed below.

Form | Earliest known example | Earliest example in the BCE
am being | 1820 | Is there a way of my making certain I am being given credit (1990 Montreal Gazette)
is being | 1715 | so that I might be able to keep track of the progress that is being made by a portion of mankind (1852 Voice of the Fugitive I; DCHP-1)
are being | 1776 | The concession lines and side-roads are being cut out at the expense of the Home Government (1832 Canadian Freemen; DCHP-1)
was being | 1772 | Her powder was then being towed in a bateau while her magazine was being built (1814? Wood, Select British Documents of the Canadian War of 1812, Vol. 1)
were being | 1714 | At the meeting, inflammatory speeches were being made. (1849 Parliament of Great Britain, Canada Papers Relative to the Affairs of Canada)

Table 3. Earliest examples of the progressive passive in the BCE

Figure 6, representing the raw frequency, shows a significant increase in frequency of the form in the twentieth century, rising to nearly 100 examples in the last two decades of the twentieth century and first decade of the twenty-first century.

Figure 6

Figure 6. Raw frequency of progressive passives per 20 year period in the BCE

Comparing Figure 6 with Figure 4, however, we might be inclined to attribute the rise solely to the increasing number of citations in the BCE over the same period. But is this true? Normalizing the frequency of the progressive passive per million words, as displayed in Figure 7, shows that the rise in frequency is real.
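Normalization divides the raw count for a period by the number of running words in that period and scales the result to one million words, so that periods of very different size become comparable. The minimal Python sketch below uses invented figures (not BCE counts) and a hypothetical function name to show why this step matters: a raw count can rise while the normalized frequency falls.

```python
def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalize a raw token count to a rate per million running words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * 1_000_000 / corpus_words


# Invented figures for two periods (not BCE data): the raw count rises
# from 20 to 90, but the corpus grows faster, so the normalized rate falls.
earlier = per_million(20, 100_000)   # 200.0 per million words
later = per_million(90, 600_000)     # 150.0 per million words
```

Applied period by period with the word counts behind Figure 4, the same calculation distinguishes a genuine rise in the construction from a mere increase in the amount of text sampled.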

Figure 7

Figure 7. Normalized frequency of progressive passives (per million words) in the BCE

While the number of citations in the BCE increases dramatically in the 1940–1959 period (see Figure 4), Figure 7 shows a sizeable increase in the progressive passive as early as the mid-nineteenth century (1860–1879). This is followed by a slight decline (1880–1919) and a second rise in the 1920–1939 period, after which the frequency remains relatively stable at roughly 160–180 progressive passives per million words. The dip in the middle of the twentieth century cannot be fully explained, but may be associated with proscriptions against the passive (see below).

If we compare the BCE with a balanced (but unproofread) database, the Corpus of Historical American English (COHA) (see Figure 8), we find the same general pattern until the 1940s. However, US English shows a decline in the use of this construction from the 1950s onward, with a steep fall beginning in the 1990s (to 84 per million words in the most recent period). Hundt (2004b: 110) and Leech et al. (2009: 124, 136–137) have found a higher rate of usage of the progressive passive in British English than in US English. Leech et al. (2009: 136) suggest that proscriptions against the passive in general in American English may be responsible for this difference.

The figures for the progressive passive in Canadian English are surprisingly high, with 183 constructions per million words in the period 1980–1999 and 134 per million in the most recent period. [11] They are on a par with the figure of 175 per million words that Leech et al. report for the present progressive passive in the British English FLOB corpus (2009: 138, Figure 6.6). It could thus be speculated that Canadian English, perhaps because of differences in the educational system, was not affected by US proscriptions.

Figure 8

Figure 8. Normalized frequency of progressive passives (per million words) in the 400+ million-word Corpus of Historical American English. Search of am/is/are/was/were/be/been + being + past participle. [accessed 18 March 2012]

Finally, as shown in Figure 9, a comparison of progressive passive forms by person/tense (shown as a percentage of the total forms) in our historical database, the BCE, with two synchronic corpora of Canadian English, ICE-Canada and the Strathy corpus, shows a relatively constant pattern over time. As expected for a more colloquial form, the progressive passive is most common in the present tense (is being, are being). [12] The modal and perfect forms are consistently of very low frequency:

      3) minerals, that have been being created and changed for at least 3000 million years (Strathy; ACA_HUMAN.txt)
      4) I see this morning some specialized police services may be being called on (ICE-Canada; S1B-048.TXT)
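Counting progressive passives by person and tense, as in Figure 9, presupposes a search like the one described for COHA: a form of be, followed by being, followed by a past participle. The Python sketch below is a hypothetical illustration over part-of-speech-tagged text, not the query actually run on the BCE, ICE-Canada, or Strathy. It assumes CLAWS-style tags in which VVN marks the past participle of a lexical verb (the other tags in the example are irrelevant to the match), and its category labels are our own simplification of the person/tense classes.

```python
# Hypothetical category labels for the form of BE that precedes "being".
BE_FORMS = {
    "am": "present, 1st person singular",
    "is": "present, 3rd person singular",
    "are": "present, plural / 2nd person",
    "was": "past, singular",
    "were": "past, plural / 2nd person",
    "be": "modal or infinitive",
    "been": "perfect",
}

PAST_PARTICIPLE_TAG = "VVN"  # assumption: CLAWS-style tag for a lexical past participle


def find_progressive_passives(tagged):
    """Return (category, match) pairs for BE + being + past participle.

    `tagged` is a list of (word, tag) pairs. Only contiguous sequences are
    found, so forms interrupted by an adverb are missed.
    """
    hits = []
    for i in range(len(tagged) - 2):
        be_form = tagged[i][0].lower()
        second = tagged[i + 1][0].lower()
        participle, tag = tagged[i + 2]
        if be_form in BE_FORMS and second == "being" and tag == PAST_PARTICIPLE_TAG:
            hits.append((BE_FORMS[be_form], f"{be_form} being {participle}"))
    return hits


if __name__ == "__main__":
    sentence = [("I", "PPIS1"), ("am", "VBM"), ("being", "VBG"),
                ("given", "VVN"), ("credit", "NN1")]
    print(find_progressive_passives(sentence))
    # [('present, 1st person singular', 'am being given')]
```

Because it requires be, being, and the participle to be adjacent, the sketch would match examples (3) and (4) but miss interrupted forms such as was then being towed (Table 3); it also correctly ignores progressive be + Adj/N forms such as he is being a fool, whose third word is not a past participle.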

Notably, the progressive passive in the first-person present tense is extremely rare, both diachronically and synchronically, and at the current stage of our research we cannot explain this low frequency. It is so rare that it does not occur at all in the ICE-Canada corpus:

      5) Is there a way of my making certain I am being given credit for contributions made to the two plans? (BCE; 1990 Montreal Gazette 6 Oct., F11)

The rightmost columns in Figure 9 show the percentage of progressive be + Adj/N forms (e.g. he is being a fool, he is being insincere). While these constructions are sometimes seen as a source of the progressive passive, their overall rarity supports the view that they could not have influenced its rise; moreover, they have been shown to arise later than the progressive passive (Denison 1998: 155).

Figure 9

Figure 9. Progressive passives by person/tense in Canadian English (shown as a percentage of the total forms of progressive passives).
Strathy (ac/fi) = Figures for the academic and fiction subsections of the Strathy Corpus.

4.2.3 Summary

The BCE, despite a relatively low number of citations from the earlier periods, provides quite early examples of the progressive passive (often within 50 years of the oldest known examples). And although its citations are bound by the requirements of lexicographical work, the collection scheme of the BCE has produced a structured database that is compact in size yet expansive in diachronic coverage, and that yields frequency patterns quite similar to those shown by COHA for US English. Comparing the historical data of the BCE with the contemporary data of ICE-Canada and the Strathy corpus reveals a distribution by tense and person of the subject in Canadian English that has remained relatively constant over time. Throughout the period covered, the form has been more common in the present tense and very rare in the first person singular, a phenomenon that has not been noted before.

Pratt and Denison (2000; also Denison 1998: 153–155) argue that the progressive passive was spread by the “Lake Poets” literary group (c.1795–1830) – consisting, among others, of Robert Southey, Samuel Taylor Coleridge, Mary Shelley, Charles Lamb, and William Wordsworth – who knew each other, sometimes lived in close proximity, and corresponded copiously. This literary circle may have constituted a social network in Milroy’s sense (Pratt and Denison 2000: 416) and Pratt and Denison argue that the progressive passive, a generally known, if “unrespectable” form (2000: 416), was consciously used by this group of young iconoclasts and hence spread (cf. also Smitterberg 2005: 130, who finds some support for this theory).

Evidence from the BCE and other electronic corpora casts doubt on this theory and suggests that it may be an artifact of the data sampled. The BCE shows that the progressive passive was used, although sparingly, in printed texts in Canada from the 1830s onward. In the University of Virginia Modern English Collection, we find the progressive passive used quite early (1780s) in American English, in official correspondence by George Washington and Thomas Jefferson. In COHA, the form occurs 15 times in printed texts of the 1830s (6 in fiction, 7 in non-fiction, 2 in magazines). The occurrence of this form in post-colonial Englishes within such a short time, and its use in a variety of written genres, make it unlikely that it was spread across the Atlantic by a small literary group in Britain.

5. Conclusion: Assessment of BCE and structured databases

To return to the question posed in our introduction (for the study of the history of national varieties, can structured databases “outperform” conventional historical corpora?), our conclusion is that databases such as the BCE certainly perform as well as balanced historical corpora in this area. Furthermore, we can point to several distinctive features of the BCE. Because its citations are concise text clippings, the range of authors represented is wider than in any fully balanced corpus. We can also provide proofread, clean citations in a database that is available online. Apart from its obvious lexicographical purpose, the BCE has already been used in dialectology (Dollinger and Schneidemesser 2011), in lexicology (Dollinger and Brinton 2008), and for the diachronic study of grammatical variables (Yerastov 2009). Looking to the future, the digitization of historical reference sources other than the OED and the BCE offers further opportunities for historical corpus study (e.g. the research of Markus, Upton, and Heuberger 2010 using a digitized version of Wright’s English Dialect Dictionary). Such studies point to a possible new direction in historical computational linguistics.

Notes

[1] The BCE will also grow in size as 1.2 million words on paper slips are digitized.

[2] In DCHP parlance (e.g. Dollinger 2006, 2010), “citation” is usually the preferred term over “quotation”. The two terms are interchangeable.

[3] Statistics on the BCE are as of September 2011.

[4] Post-1967 data are almost entirely print – our major source was newspapers, with books next. This distribution is a result of the database searches that were the core of the expansion program. In the future, the option to include more spoken-language citations could be explored.

[5] In the first half of the twentieth century, Curme (1931: 445) still finds these combinations “intolerable”; writing some forty years later, Visser (1973: 2446) feels that the complex forms are so rare that they have “not yet reached the status of a generally recognized idiom”. Denison (1993: 429–430, 1998: 157) suggests that some of these forms appearing in early grammars are “artificial” since they are supplied simply to fill out the paradigm.

[6] Earlier the passival (see below) had met with similar resistance from scholars (e.g. Johnson) (see Bailey 1996: 223; Jespersen 1949: 210).

[7] Visser says that the “passival” (his term; see 1973: 2005) rises in the sixteenth century, gradually increases in the seventeenth century, has high frequency in the eighteenth and nineteenth centuries, and perceptibly declines in the twentieth century (1973: 2007–2009). However, other scholars (e.g. Smitterberg 2005: 130; Beal 2004: 81) date its decline earlier, to the mid-nineteenth century. Hundt (2004b) sees the progressive passive and passival, though slightly different, as functionally equivalent (85–92). She argues that the passival still exists (see also Curme 1931: 444; Kranich 2010: 116), but is marked and specialized (105). Leech et al. (2009: 138) find no examples in their corpora of Present-day English.

[8] According to Denison (see especially 2000: 131–132), precursors of the progressive passive include main-verb be plus a being built-type phrase functioning as a participial or gerundial appositive absolute, with resultative rather than durative meaning, as in There is a good opera … now being acted and money is very near being all exhausted. Be is reanalyzed as an auxiliary as the construction shifts from the passive of be to the progressive of the verb of the past participle.

[9] One must be cautious with one’s analysis, as Denison (2000: 134) warns: “of all the auxiliaries, progressive BE is the one where the semantic difference between a full-verb use and auxiliary use is least perceptible, giving us wide latitude in dating a reanalysis”.

[10] A 1756 example of is being hung in the OED (s.v. scrag) is rejected by Denison (1993: 439). A 1669 example of is (being denied) in the OED (s.v. cock-and-bull) is likewise not included in traditional accounts.

[11] The frequency of progressive passives in ICE-Canada (117 per million words) is not as high as that found in the BNC, likely because spoken data make up a high proportion (60%) of the ICE corpus (see http://ice-corpora.net/ice/design.htm). The progressive passive appears to be less common in spoken than in written genres (see above). Nonetheless, its frequency here is still higher than that found in the US English of COCA.

[12] Cf. also Leech et al. (2009: 124), who found that in twentieth-century British and US English the use of the progressive in general rose only in the present tense.

Sources

BCE: Bank of Canadian English. 2006–. Stefan Dollinger, Laurel J. Brinton, and Margery Fee (eds.). http://dchp.ca/Bank-2/login.php. For further information, see http://faculty.arts.ubc.ca/sdollinger/bce.htm.

Canada’s Heritage from 1844 – The Globe and Mail. 2011. ProQuest LLC. http://www.proquest.com/en-US/catalogs/databases/detail/canada_heritage.shtml

Canadian Newsstand. 1977–present. http://www.proquest.com/en-US/catalogs/databases/detail/canadian_newsstand.shtml

COCA: The Corpus of Contemporary American English (COCA): 425+ million words, 1990–2011. 2008–. Mark Davies (compiler). Available online at http://corpus.byu.edu/coca/

COHA: The Corpus of Historical American English (COHA): 400+ million words, 1810–2009. 2010–. Mark Davies (compiler). Available online at http://corpus.byu.edu/coha

Dictionary of Newfoundland English Online. 1999. George Morley Story, William J. Kirwin, and John David Allison Widdowson (eds.). 2nd edn. with supplement. Toronto: University of Toronto Press. Available online at http://www.heritage.nf.ca/dictionary/

DCHP-1 Online: A Dictionary of Canadianisms on Historical Principles Online. 2011. Stefan Dollinger (editor-in-chief). Based on Avis et al. (1967). With the assistance of Laurel J. Brinton and Margery Fee. http://dchp.ca/DCHP-1/ (if not found in open access, contact the editor for research password access)

DCHP-2: A Dictionary of Canadianisms on Historical Principles, 2nd ed. online. In progress. Stefan Dollinger, Laurel J. Brinton, and Margery Fee (eds.). http://dchp.ca/DCHP-2/. See http://faculty.arts.ubc.ca/sdollinger/dchp2.htm#wha.

Early Canadiana Online. 1998–2010. canadiana.org. Available online at http://www.canadiana.ca/en/eco

ICE-Canada: International Corpus of English – Canada. 2010. John Newman (compiler). http://ice-corpora.net/ice/icecan.htm

OBC: The Proceedings of the Old Bailey Corpus – London’s Central Criminal Court, 1674–1913. Tim Hitchcock and Robert Shoemaker (co-directors). Available online at http://www.oldbaileyonline.org/

OED: Oxford English Dictionary. 2000–. 3rd ed. online (in progress). John Simpson (general editor). Oxford: Oxford University Press. http://www.oed.com

Strathy: The Strathy Corpus of Canadian English. http://www.queensu.ca/strathy/projects.html

U of V: Modern English Collection. 2008. Electronic Text Center, University of Virginia. Available online at http://etext.lib.virginia.edu/modeng/modeng0.browse.html

Wright’s English Dialect Dictionary [beta version]. 2009. Manfred Markus and Reinhard Heuberger (eds.). http://www.uibk.ac.at/anglistik/projects/speed/

References

Arnaud, René. 1998. “The development of the progressive in 19th century English: A quantitative survey”. Language Variation and Change 10: 123–152.

Avis, Walter, Charles Crate, Patrick Drysdale, Douglas Leechman, Martin H. Scargill, & Charles J. Lovell. 1967. A Dictionary of Canadianisms on Historical Principles. Toronto: W. J. Gage. [DCHP-1]

Bailey, Richard W. 1996. Nineteenth-Century English. Ann Arbor: University of Michigan Press.

Barber, Katherine. 2004. Canadian Oxford Dictionary. 2nd edn. Toronto: Oxford University Press.

Beal, Joan C. 2004. English in Modern Times. London: Arnold.

Curme, George O. 1931. Syntax. Vol. III of A Grammar of the English Language. Boston: D. C. Heath. (Reprint, Verbatim, 1977.)

Denison, David. 1993. English Historical Syntax. London & New York: Longman.

Denison, David. 1998. “Syntax”. The Cambridge History of the English Language, ed. by Suzanne Romaine, vol. IV, 1776–1997, 92–329. Cambridge: Cambridge University Press.

Denison, David. 2000. “Combining English auxiliaries”. Pathways of Change: Grammaticalization in English, ed. by Olga Fischer, Anette Rosenbach, & Dieter Stein, 111–147. Amsterdam & Philadelphia: John Benjamins.

Dollinger, Stefan. 2006. “Towards a fully revised and extended edition of the Dictionary of Canadianisms on Historical Principles (DCHP-2): Background, challenges, prospects”. Historical Sociolinguistics/Sociohistorical Linguistics 6. 9 Aug 2011. http://www.let.leidenuniv.nl/hsl_shl/DCHP-2/DCHP-2/DCHP-2.htm.

Dollinger, Stefan. 2008. New-Dialect Formation in Canada: Evidence from the English Modal Auxiliaries. Amsterdam & Philadelphia: John Benjamins.

Dollinger, Stefan. 2010. “A new historical dictionary of Canadian English as a linguistic database tool. Or, making a virtue out of necessity”. Current Projects in Historical Lexicography, ed. by John Considine, 99–111. Newcastle upon Tyne: Cambridge Scholars Publishing.

Dollinger, Stefan & Laurel J. Brinton. 2008. “Canadian English lexis: Historical and variationist perspectives”. Anglistik: International Journal of English Studies (Special Issue “Focus on Canadian English”, ed. by Matthias Meyer) 19(2): 243–264.

Dollinger, Stefan & Luanne von Schneidemesser. 2011. “Canadianism, Americanism, North Americanism? A Comparison of DARE and DCHP”. American Speech 86(2): 115–151.

Fee, Margery. 2008. “French Borrowings in Quebec English”. Anglistik: International Journal of English Studies (Special Issue “Focus on Canadian English”, ed. by Matthias Meyer) 19(2): 173–188.

Grant, Pamela. 2010. “English usage in contemporary Quebec: Reflections of the local”. Canadian English: A Linguistic Reader, ed. by Elaine Gold & Janice McAlpine, 177–197. (Occ. paper 6.) Kingston, ON: Strathy Language Unit, Queen’s University.

Hoffmann, Sebastian. 2004. “Using the OED quotations database as a corpus – a linguistic appraisal”. ICAME Journal 28: 17–30. 24 Nov 2011. http://icame.uib.no/ij28/index.html

Hundt, Marianne. 2004a. “Animacy, agentivity, and the spread of the progressive in Modern English”. English Language and Linguistics 8(1): 47–69.

Hundt, Marianne. 2004b. “The passival and the progressive passive: A case study of layering in the English aspect and voice systems”. Corpus Approaches to Grammaticalization in English, ed. by Hans Lindquist & Christian Mair, 79–120. Amsterdam & Philadelphia: John Benjamins.

Jespersen, Otto. 1949. A Modern English Grammar on Historical Principles. Part IV Syntax, Third Volume Time and Tense. Copenhagen: Ejner Munksgaard/ London: George Allen & Unwin.

Jucker, Andreas H. 1994. “New dimensions in vocabulary studies: Review article of the Oxford English Dictionary (2nd edition) on CD-ROM”. Literary and Linguistic Computing 9(2): 149–154.

Kranich, Svenja. 2010. The Progressive in Modern English: A Corpus-based Study of Grammaticalization and Related Changes. Amsterdam: Rodopi.

Leech, Geoffrey, Marianne Hundt, Christian Mair, & Nicholas Smith. 2009. Change in Contemporary English: A Grammatical Study. Cambridge: Cambridge University Press.

Mair, Christian. 2006a. “Tracking ongoing grammatical change and recent diversification in present-day standard English: The complementary role of small and large corpora”. The Changing Face of Corpus Linguistics, ed. by Antoinette Renouf & Andrew Kehoe, 355–376. Amsterdam: Rodopi.

Mair, Christian. 2006b. Twentieth-century English: History, Variation and Standardization. Cambridge: Cambridge University Press.

Markus, Manfred, Clive Upton, & Reinhard Heuberger, eds. 2010. Joseph Wright’s English Dialect Dictionary and Beyond: Studies in Late Modern English Dialectology. Berne: Lang.

Pratt, Lynda & David Denison. 2000. “The language of the Southey–Coleridge Circle”. Language Sciences 22: 401–422.

Pratt, Terry Kenneth. 1988. Dictionary of Prince Edward Island English. Toronto: University of Toronto Press.

Sheidlower, Jesse. 2011. “How quotation paragraphs in historical dictionaries work: The Oxford English Dictionary”. Contours of English and English Language Studies, ed. by Michael Adams & Anne Curzan, 191–212. Ann Arbor: University of Michigan Press.

Smitterberg, Erik. 2005. The Progressive in 19th-Century English: A Process of Integration. Amsterdam and New York: Rodopi.

Story, George Morley, William J. Kirwin, & John David Allison Widdowson, eds. 1999 (3rd edn.; 2nd edn. 1990; 1st edn. 1982). Dictionary of Newfoundland English. Toronto: University of Toronto Press.

Tagliamonte, Sali A. & Alexandra D’Arcy. 2007. “The modals of obligation/necessity in Canadian perspective”. English World-Wide 28(1): 47–87.

Talman, J. J. & R. Talman. 1977. “The Canadas 1763–1812”. Literary History of Canada: Canadian Literature in English, ed. by Carl F. Klinck, vol. I, 97–105. Toronto: University of Toronto Press.

Visser, Frederic Theodor. 1973. An Historical Syntax of the English Language. Part III, Second Half. Syntactical Units with Two or with More Verbs. Leiden: E. J. Brill.

Yerastov, Yuri. 2009. “Transitive be perfect in the history of English in light of modern dialectal evidence”. 6th Studies in the History of English Linguistics Conference, Banff, AB, Canada. 1 May 2009.