Variability in verb complementation: Determinants of grammatical variation in indigenized L2 varieties of English

Raquel P. Romasanta
University of Vigo


Verb complementation is one of the areas where variability and change in indigenized L2 varieties of English are frequently observed. As such, it has been addressed from a semantic and pragmatic point of view, and this study does likewise. Here the focus is the complementation profile of the verb regret, a verb which has historically shown non-categorical variation between declarative finite that/zero-complement clauses and gerunds (e.g. I regret that I said that / saying that). The database comprises four different varieties of English (American English, British English, Hong Kong English, and Nigerian English) as represented in the Corpus of Global Web-Based English (GloWbE).

An analysis of the distribution of the two available patterns in the English varieties and the different substrate languages (Cantonese in Hong Kong English, and Hausa, Igbo, Yoruba and French in Nigerian English) suggests that both cognitive effects derived from language contact situations and second language acquisition processes (mainly increased isomorphism and transparency) and influence of substrate languages serve as possible explanations for the higher proportions of declarative finite that/zero-complement clauses in the L2 varieties here. The binary logistic regression analysis of other intra- and extra-linguistic factors drawn from the literature shows that the choice of declarative finite that/zero-complement clauses is determined by factors such as inanimate and non-coreferential subjects, presence of negative markers, passive voice, action verbs, text type General, simultaneous temporal relation, and an increase in the number of words in the complement clause and in the intervening material between the two clauses.


The diachronic evolution of the English complementation system has received a great deal of attention (Warner 1982; Fischer 1988; Rohdenburg 1995, 2006; Fanego 1996a, 1996b, 2004a, 2007; Rudanko 1999, 2000; Vosberg 2003; De Smet 2013; Rickman & Rudanko 2018); some studies in the field of World Englishes (WEs), especially those concerned with ditransitive verbs and the competition between infinitives and gerunds, have also considered it an area of innovation and change (Olavarria de Ersson & Shaw 2003; Mukherjee & Hoffmann 2006; Mukherjee & Schilk 2008; Mukherjee & Gries 2009; Schilk et al. 2012, 2013; Bernaisch 2013; Nam et al. 2013; Deshors 2015; Gries & Bernaisch 2016; Deshors & Gries 2016). For example, Schneider (2007: 86) states that “a classic example [of innovation in varieties in stage 3, nativization] is the complementation patterns which verbs and also adjectives typically enter.” These studies focus mainly on the semantic and syntactic factors that influence the alternation between different complementation patterns, as I do in Section 3.2. However, they have not looked in detail at the potential effects of the processes derived from the language contact situation (such as the impact of the substrate languages on the speaker’s use of English; see Section 2.2 for a more detailed analysis of the complementation systems of the substrate languages), nor have they focused on the fact that these varieties of English emerged and developed as L2s and may as such be subject to Second Language Acquisition (SLA) effects as well. Some linguistic features which are attested in many geographically distant varieties of English cannot be explained in terms of language contact and may therefore be the result of the L2 learning processes (Mesthrie & Bhatt 2008: 159).

One of these processes is the preference or tendency towards simplicity in the sense of “a propensity for transparency” (also known as isomorphism or iconicity; Steger & Schneider 2012: 156), which is defined as the one-to-one mapping of form and meaning. Regarding the complementation system, it has been shown that indigenized L2 varieties of English show a preference for finite patterns over non-finite structures, that is, a preference for more explicit forms (Steger & Schneider 2012: 172; cf. also Thomason 2008; Schneider 2012, 2013). Finite complement clauses are said to be more transparent and isomorphic than the non-finite alternative since they are explicitly marked for “tense and agreement, modality, and a complementizer”, and thus are easier to process (Givón 1985: 200; Steger & Schneider 2012: 165). This tendency towards isomorphism and the use of more explicit forms, proper to SLA situations, is also captured by the Complexity Principle (Rohdenburg 1996, 2006), which is claimed to be universal to language and states that,

in the case of more or less explicit constructional options, the more explicit one(s) will tend to be preferred in cognitively more complex environments (Rohdenburg 1996: 151, 2006: 147)

Rohdenburg (1996, 2006) argues that these “cognitively more complex environments” include, e.g., the presence of negative markers, the use of the passive voice, and the presence of intervening material between the main clause and the complement clause (CC), among others.

The present study attempts to integrate cognitive factors pertaining to the language contact settings in which these languages develop (which involve the potential influence of substrate languages and SLA processes) in the description of the variation found in the complementation profile of the retrospective verb regret in present-day English. Together with cognitive factors, I will also conduct a regression analysis of a number of syntactic and semantic factors shown to impact the alternation between the available complementation patterns (e.g. complexity of the complement clause, intervening material, subject animacy of the complement clause, voice of the verb in the complement clause; cf. Cuyckens et al. 2014). The variability witnessed in the diachronic evolution of the complementation of regret as shown in at least two studies, those of Heyvaert and Cuyckens (2010) and Cuyckens et al. (2014), makes this verb a very good candidate for research in WEs, as explained further below.

The database for this analysis covers four varieties of English, two L1 or inner circle varieties, American English (AmE) and British English (BrE), chosen as reference varieties, and two L2 or outer circle varieties, Hong Kong English (HKE), and Nigerian English (NigE). These two outer circle varieties were singled out because (i) they are in stage 3 within Schneider’s Dynamic Model (Schneider 2007), the stage in which most grammatical innovations are expected to occur (Schneider 2007: 86), and (ii) they are two historically and geographically unrelated varieties with different substrate languages. The language studied here is that of the internet (Corpus of Global Web-Based English (GloWbE); Davies 2013).

The verb regret exhibits a categorical alternation between gerundial (-ing) and to-infinitival sentential complements: the gerund has an anterior or retrospective meaning, that is, it “refers to a preceding event or occasion coming to mind at the time indicated by the main verb” (Quirk et al. 1985: 1193), while the infinitive has a prospective meaning, in that it “indicates that the action or event takes place after (and as a result of) the mental process indicated by the verb has begun” (Quirk et al. 1985: 1193; cf. examples (1) and (2) respectively). [1] In terms of the expression of prospective meaning (see example (2) below), no variation is detected in my corpus sample: the prospective meaning is always expressed by to-infinitive complement clauses. However, there is alternation between declarative finite that/zero-complement clauses and gerunds with a retrospective meaning, as might be expected, since it is shown to be “less categorical” in Cuyckens et al. (2014: 182). In other words, both constructions seem to be freely interchangeable, as in the two examples in (1).

(1) I regret telling you that John stole it. [“I regret that I told you that John stole it” or “…that I am now telling you…”] (Quirk et al. 1985: 1193)
(2) I regret to tell you that John stole it. [“I regret that I am about to tell you that John stole it”] (Quirk et al. 1985: 1193)

From a diachronic perspective, Heyvaert and Cuyckens (2010) examine the verb regret together with resent, be sorry, admit and agree, from Early Modern British English to present-day British English, and find an increase in the use of gerunds since the Late Modern British English period to the 1990s (gerunds range from 23.2% in the period 1710–1920 to 57.6% in the 1990s). Cuyckens et al. (2014), on the other hand, consider regret together with remember and deny in Late Modern British English and, by contrast, find a slight decrease in the use of gerunds during this period (gerunds range from 33.3% in 1710–1780 to 28.1% in 1781–1920). The present-day English data presented here will reveal whether the current trend is towards an increasing or decreasing use of gerunds.

From a synchronic perspective, on the other hand, only one study is available (cf. Romasanta 2017). In this article, the author studies the present-day complementation profile of regret in five varieties of English (AmE, BrE, JamE (Jamaican English), HKE, and NigE) with a special focus on the distribution of finite and non-finite patterns and the possible influence of substrate languages and cognitive processes derived from the SLA and language contact situation in which the non-native varieties of English emerged. In general terms, the main conclusions in this study are that (i) there is a higher proportion of non-finite patterns than finite patterns in all the varieties studied, but especially so in the native varieties (the proportion of non-finite patterns in each variety is as follows: 68.69% in AmE/BrE, 65.4% in JamE, 61.1% in HKE, and 53.4% in NigE), and (ii) both substrate influence, and hyperclarity and isomorphism, seem to be important factors determining the relatively lower proportions of non-finite patterns in the non-native varieties. In view of these findings, the present article will, as explained above, dig deeper into these differences between inner and outer circle varieties by including new semantic and syntactic factors that may determine the choice between finite and non-finite forms.

The outline of this paper is as follows. Section 2 describes the methodology and the varieties chosen for the study, including their substrate languages. Section 3 deals with the data analysis and is divided into three subsections: subsection 3.1 offers a brief overview of the data analyzed and a comparison with previous studies; subsection 3.2 discusses the statistical analysis and results of the intra- and extra-linguistic predictors influencing the speaker’s choice between the available patterns; and subsection 3.3 analyzes the possible impact of the cognitive effects previously mentioned (essentially isomorphism and transparency) as well as that of the substrate languages. Finally, Section 4 offers a summary of the findings.


The data for this study, as noted in the introductory section, is taken from the GloWbE corpus, which comprises some 1.9 billion words from 20 different varieties of English. [2] As already mentioned, the database for the present study covers four varieties of English (AmE, BrE, HKE and NigE). I ran a general search for “regret” as a verb lemma (regret*_v*) using the online interface. For the outer circle varieties, I retrieved all the examples available in the corpus (a total of around 1,700). For the inner circle varieties, because the number of examples available in the corpus was very high, I took a random sample of 2,000 examples for each variety.

After manual pruning of false positives and invalid examples and a preliminary analysis of the complementation patterns, the final number of valid examples that exhibit the alternation between that/zero-complement clauses and gerunds with retrospective and simultaneous meanings is 1,579 for all four varieties. All these tokens were then coded for a number of semantic and syntactic predictors. [3]


The statistical analysis takes into consideration the following factors:

(i) complementation_type

This is the dependent variable, and two values are distinguished: that/zero-complement clauses or gerund (as in examples (3) and (4) respectively). [4]

(3) On these grounds I regret that I am unable to concur with the Court in its present judgment. (GloWbE-AmE, worldcourts.com)
(4) as many people who regret not taking better care of their teeth could tell you! (GloWbE-AmE, dumblittleman.com)

(ii) variety

This predictor has four values: American English (AmE), British English (BrE), Hong Kong English (HKE), and Nigerian English (NigE). The expectations for this predictor are that non-native or L2 varieties of English, that is, HKE and NigE, will show a stronger tendency for that/zero-complement clauses due to the cognitive effects derived from the language contact and SLA settings in which they take place, and the influence of their substrate languages (Thomason 2001, 2008; Steger & Schneider 2012; Schneider 2013).

(iii) text_type

Two text types occur in this corpus: Blogs, accounting for about 60% of the corpus, and General, for the remaining 40% of the corpus, which contains general internet webpages but has also been found to contain some blogs (around 20%; Davies & Fuchs 2015: 2–3). Authors such as Grieve et al. (2010), Davies and Fuchs (2015) and Hoffman (2018) argue that blogs are somewhat more informal and use a more speech-based language. However, Loureiro-Porto (2017: 460) questions this claim, since in her analysis she found no significant differences in terms of orality and informality between the two text types in GloWbE. This issue will be tested, and the values for this predictor are therefore: General or Blogs.

(iv) cc_verbal_meaning

This predictor refers to whether the verb in the complement clause is an action or state verb (cf. examples (5) and (6) respectively). Before the gerund developed its verbal features in the 18th century, it was only used with predicates expressing actions. However, from Late Modern English onwards, both action and state predicates began to take gerundial forms (Heyvaert & Cuyckens 2010: 139). Heyvaert and Cuyckens (2010: 143) found that, even though with the verb regret the gerund was used with both state and action predicates, the preference has always been for action verbs; in their data, action predicates with regret account for 65.8% in the period from 1710–1920 in the CLMET corpus and 84.3% in the 1990s in the COBUILD corpus.

(5) Maybe he does regret hurting you and will never do it again, who knows? (GloWbE-HKE, hongkong.asiaxpat.com)
(6) I only regret that I have but one life to lose for my country. (GloWbE-NigE, nigeriaworld.com)

(v) cc_animacy

This predictor has two possible values: animate (7) or inanimate (8). Heyvaert and Cuyckens (2010: 144) find that that/zero-complement clauses “are clearly in favour of inanimate subjects.” The expectations therefore are that if the subject in the complement clause is inanimate, declarative finite that/zero-complement clauses will be found. On the other hand, since gerunds are usually subject controlled by the main clause and the verb regret in the main clause requires an animate subject, I expect that gerunds will be more frequent if the subject is animate. In (7) the subject of the main clause is animate and so is the subject of the complement clause.

(7) Katie Harrison from Herefordshire regrets missing out on the windfall of state funding. (GloWbE-BrE, guardian.co.uk)
(8) I deeply regret that this incident happened at all. (GloWbE-AmE, huffingtonpost.com)

(vi) coreferentiality

Cuyckens et al.’s article (2014: 191) refers to this predictor as “denotation” and it has two possible outcomes: coreferential, when the subjects of the main clause and the complement clause denote the same entity, as in example (9), or non-coreferential, when the two subjects denote different entities, as in example (10). Heyvaert and Cuyckens (2010: 144) find that that/zero-complement clauses specialize for different, i.e. non-coreferential, subjects; I therefore expect that non-coreferential subjects will continue to favor the use of declarative finite that/zero-complement clauses.

(9) I regret fighting against the future that was Biafra, during the Nigeria-Biafra civil war. (GloWbE-NigE, meniru.blogspot.com)
(10) we regret very much that others have to end in loss like us. (GloWbE-HKE, emoneyspace.com)

(vii) temporal_relation

The temporal relation between the time expressed by the verb in the main clause and that expressed by the verb in the complement clause can be anterior (11) or simultaneous (12) (Quirk et al. 1985: 1183; Huddleston & Pullum 2002: 160, 1243; Noonan 2007: 110–114). Due to the specialization of the gerund to express anterior meanings (Vosberg 2003: 200), I expect that declarative finite that/zero-complement clauses will be favored in simultaneous environments.

(11) I regret going broke. (GloWbE-BrE, jamesaltucher.com)
(12) Suggests that he regrets that the First Amendment at the present time doesn't enable him to ban the film. (GloWbE-AmE, israpundit.com)

The remaining predictors (from viii to xi) are related to the Complexity Principle (Rohdenburg 1996: 151; 2006: 147; cf. Section 1). According to this principle, some factors such as “discontinuous constructions of various kinds, passive constructions, and the length of subjects, objects, and subordinate clauses” (Rohdenburg 1996: 149) increase cognitive processing complexity and therefore tend to take the more explicit construction; in our case, this would be declarative finite that/zero-complement clauses.

(viii) cc_voice

Two outcomes are distinguished: active and passive (examples (13) and (14) below).

(13) She regretted not studying for a qualification in accountancy, a subject she enjoyed, when she was younger - something that would have boosted her career. (GloWbE-HKE, yp.scmp.com)
(14) We regret that a member of the public was injured by tiles falling from a pillar. (GloWbE-BrE, ...xbridgegazette.co.uk)

(ix) cc_negation

This predictor has the following two values: positive and negative (cf. examples (15) and (16) respectively).

(15) Looking back now, I regret complaining about the canteen food during my stint as a cadet at Young Post this summer. (GloWbE-HKE, yp.scmp.com)
(16) I regret that I haven't kept a running record of this outside of their charts. (GloWbE-AmE, sciencebasedmedicine.org)

(x) cc_words_constituents

This indicates the total number of orthographic words of the postverbal constituents in the complement clause. For instance, in example (17) below there would be 6 words, namely somehow, and into it for 5 minutes.

(xi) intervening_material

This indicates the number of words between the verb regret and the first word of the complement clause, as in example (17).

(17) … and I regret, deep down in my soul, that I somehow got sucked into it for 5 minutes. (GloWbE-HKE, crazybee.net)
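
The two length measures in (x) and (xi) can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where the tokens were coded manually. The function names, the simple regular-expression tokenizer, and the decision to treat the complementizer that as the first word of the complement clause are assumptions made purely for illustration.

```python
import re

# Hypothetical helpers for illustration only; the study's tokens were coded by hand.

def tokenize(text):
    """Split a string into orthographic words, ignoring punctuation."""
    return re.findall(r"[A-Za-z0-9'-]+", text)

def intervening_material(sentence, verb_form, cc_first_word):
    """Count the words between the verb form and the first word of the
    complement clause (both supplied by the annotator)."""
    tokens = tokenize(sentence)
    verb_index = tokens.index(verb_form)
    cc_index = tokens.index(cc_first_word, verb_index + 1)
    return cc_index - verb_index - 1

def cc_words_constituents(postverbal_constituents):
    """Sum the orthographic words of the postverbal constituents,
    which the annotator has identified manually."""
    return sum(len(tokenize(c)) for c in postverbal_constituents)

example_17 = ("and I regret, deep down in my soul, that I somehow got "
              "sucked into it for 5 minutes.")

# "deep down in my soul" separates regret from the assumed clause onset "that"
print(intervening_material(example_17, "regret", "that"))  # 5
# "somehow" plus "into it for 5 minutes", as described under (x)
print(cc_words_constituents(["somehow", "into it for 5 minutes"]))  # 6
```

Passing the first word of the complement clause explicitly keeps the sketch honest about the fact that identifying clause boundaries is an annotation decision, not something the tokenizer can infer.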


The analysis of the data also considers the possible influence of substrate languages, which will be described in this subsection, together with the cognitive effects derived from the language contact and SLA situation in which the L2 varieties under study emerged as potential predictors (cf. Section 1). However, these two potential predictors or explanations are not included in the statistical analysis, but rather are considered in a more descriptive manner (cf. Section 3.3).

The major substrate language in Hong Kong is Cantonese. According to Matthews and Yip (1994: 174, 293) “there is no infinitive form in Cantonese, and arguably no distinction between finite and non-finite verbs.” They argue that subordination in Cantonese is constructed through parataxis, that is, the juxtaposition of two clauses (cf. examples (18) and (19) below).

(18) Ngô jîu jow kôy láy tónk nêy jow hai [5]
I call he come with you make shoe
“I’ll ask him to come and make shoes for you” (Killingley 1993: 44)
(19) Síu Yìuh wah mh séung làih wóh
Little Yiu say not want come PRT
“Yiu says she doesn’t want to come” (Matthews & Yip 1994: 308)

Regarding Nigeria, there are three major indigenous substrate languages, Hausa, Igbo, and Yoruba, as well as one language which is exogenous to the country, French (Ogunmodimu 2015: 156).

  1. Hausa expresses complementation either with that/zero-complement clauses using the complementizer cêwā, see example (20), or with to-infinitives (Newman 2000: 97–98; Jaggar 2001: 545–591). No gerund forms are available in this language.
(20) mun daukā mat cêwā kā yàrda
1pl.PF assume that 2m.PF agree
“We assume that you agree” (Jaggar 2001: 579)
  2. Igbo takes the complementizer to form that/zero-complement clauses (cf. example (21); Emenanjo 1987: 88; 2015: 337). As for the non-finite patterns, gerunds are considered nominals and “never contain any inflectional affixes” (Emenanjo 2015: 222), and hence they cannot be compared to English gerunds which have acquired verbal features over the centuries (cf. Fanego 1996c, among others).
(21) Ha mààrà Àda ālụọla dī
“They know that Ada has a husband” (Emenanjo 2015: 337)
  3. Yoruba forms complementation with that/zero-complement clauses using the complementizers and (cf. examples (22) and (23); Adesola 2015: 11–12; Sheehan & van der Wal 2016: 355). As for non-finite clauses, only the infinitive is possible with the marker láti (Adesola 2015: 8; Sheehan & van der Wal 2016: 352).
(22) Olú so Adé rí bàbá òun
Olu say that Ade see father him
“Olu said that Ade saw his father” (Adesola 2015: 11)
(23) Olú gbà Adé rí bàbá òun
Olu accept that Ade see father him
“Olu agreed that Ade should see his father” (Adesola 2015: 12)
  4. French can only form complement clauses through the complementizer que, as in example (24), or with the infinitive. Gerund forms are not available (Hansen 2016: 60, 151).
(24) Je veux absolument que tu viennes!
I want absolutely that you come
“I absolutely want you to come!”  (Hansen 2016: 60)


3.1 Overview of results and diachronic evolution of the verb regret

Figure 1 shows the evolution of the distribution of that/zero-complement clauses and gerunds with the verb regret from 1418 to present-day British English, as based on Heyvaert and Cuyckens (2010) and my own data from the GloWbE. As can be seen, in the corpora analyzed by Heyvaert and Cuyckens (2010) for the 15th century to the end of the 17th century, there are no attestations of the use of this verb. [6] From the 18th century to the beginning of the 20th century, the preference was for that/zero-complement clauses (76.8%), and at the end of the 20th century, in the 1990s, this preference changed towards a slight predominance of gerunds (57.6%). The data from the present study shows that this predominance of the gerund construction has increased and continues in the early 21st century (69.1% of the attestations in the GloWbE corpus are gerunds).

Figure 1. Evolution of the complementation system of the verb REGRET from Early Modern British English to present-day British English.

The diachronic increase in the use of gerunds at the expense of that/zero-complement clauses has been widely studied together with other notable shifts in the English complementation system, a phenomenon which has come to be known as the Great Complement Shift (cf. Warner 1982; Fischer 1988, 1989; Fanego 1990, 1992, 1996a, 1996b, 1996c, 1998, 2004a, 2004b, 2010, 2016; Rohdenburg 1995, 2006, 2014; Rudanko 1998, 2000, 2011; Miller 2002; Los 2005; Vosberg 2006; De Smet 2008, 2009, 2010, 2013, 2014; among others). This increase in the use of the gerund is partly due to it acquiring full verbal properties “from Middle English onwards” (Fanego 1996c: 72). Fanego (1996c: 72) describes these verbal features as follows:

  1. it became capable of governing an object or a predicative complement;
  2. it could be modified by adverbial adjuncts restricted to co-occurring only with verbs;
  3. it showed tense and voice distinctions;
  4. it could take a subject in a case other than the genitive.

From the group of retrospective verbs to which the verb regret belongs, the most widely studied is the verb remember (Fanego 1996c, 1996d, 2004a, 2007; Mair 2006; De Smet & Cuyckens 2007; Rohdenburg 2016). Given that remember was the first retrospective verb to take infinitival and gerundial sentential complements, Fanego suggests that “it must be seen as the central member of the class, and as the one that eventually came to set the pattern for all the others” (1996c: 74; cf. also Fanego 2007). Table 1 below shows the evolution of the distribution of that/zero-complement clauses and gerunds from 1710 to present-day English (2012–2013) of both verbs, regret and remember. As can be seen, the combination remember + gerund shows an increase in use during the Late Modern English period, from 49.1% to 76.0% (Cuyckens et al. 2014: 192), and this predominance has been maintained into present-day English, with 74.4% (García Castro 2018). [7] By contrast, it seems that the verb regret did not begin to experience this increase in the use of gerunds until the 1990s (ranging from around 30% in Late Modern English to 57.6% in the 1990s). In present-day English this percentage is higher (69.1%) and thus is similar to that for remember. The data seems to confirm Fanego’s (1996c) suggestion that the verb remember may have “set the pattern” for the other retrospective verbs, in this case regret.

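| Period (source) | regret: that (n) | regret: that (%) | regret: -ing (n) | regret: -ing (%) | remember: that (n) | remember: that (%) | remember: -ing (n) | remember: -ing (%) |
|---|---|---|---|---|---|---|---|---|
| 1710–1780 (Cuyckens et al. 2014) | 8 | 66.7 | 4 | 33.3 | 462 | 50.9 | 446 | 49.1 |
| 1781–1920 (Cuyckens et al. 2014) | 192 | 71.9 | 75 | 28.1 | 696 | 24.0 | 2,206 | 76.0 |
| 1990s (Heyvaert & Cuyckens 2010) | 108 | 42.4 | 147 | 57.6 | NA | NA | NA | NA |
| 2012–2013 | 180 | 30.9 | 403 | 69.1 | 138 | 25.6 | 402 | 74.4 |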

Table 1. Evolution of the complementation profiles of the verbs regret and remember in British English from 1710 to 2013.
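
The percentages in Table 1 follow directly from the raw counts. As a small worked check, the sketch below recomputes the share of gerunds with regret for the three historical periods; the function name is my own and is introduced here for illustration only.

```python
def gerund_share(n_that, n_ing):
    """Percentage of gerunds among all that/zero-clause and gerund tokens."""
    return round(100 * n_ing / (n_that + n_ing), 1)

# Counts for regret taken from Table 1
periods = {
    "1710-1780": (8, 4),
    "1781-1920": (192, 75),
    "1990s": (108, 147),
}

for period, (n_that, n_ing) in periods.items():
    print(period, gerund_share(n_that, n_ing))
```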


Table 2 below summarizes the predictors included in the regression analysis, as well as an initial overview of their distribution and the coding process. Figure 2, below, sets out the different charts and histograms for the distribution of each predictor.

| Predictor | Distribution | Coding |
|---|---|---|
| variety | N = 583 BrE (37%); N = 525 AmE (33%); N = 138 HKE (9%); N = 333 NigE (21%) | Categorical: BrE = reference level |
| text_type | N = 853 General (54%); N = 726 Blogs (46%) | Binary: General = 1; Blogs = 0 |
| cc_verbal_meaning | N = 1325 action (84%); N = 254 state (16%) | Binary: action = 1; state = 0 |
| cc_animacy | N = 181 inanimate (11%); N = 1398 animate (89%) | Binary: inanimate = 1; animate = 0 |
| coreferentiality | N = 345 non-coreferential (22%); N = 1234 coreferential (78%) | Binary: non-coreferential = 1; coreferential = 0 |
| temporal_relation | N = 250 simultaneous (16%); N = 1329 anterior (84%) | Binary: simultaneous = 1; anterior = 0 |
| cc_voice | N = 90 passive (6%); N = 1489 active (94%) | Binary: passive = 1; active = 0 |
| cc_negation | N = 475 negative (30%); N = 1104 positive (70%) | Binary: negative = 1; positive = 0 |
| cc_words_constituents | Mean = 6.3; Min = 0; Max = 81 | Quantitative, logarithmically transformed with natural base; values of zero were assigned a score of -1 (log_cc_words_constituents) |
| intervening_material | Mean = 0.1; Min = 0; Max = 9 | Quantitative, logarithmically transformed with natural base; values of zero were assigned a score of -1 (log_intervening_material) |

Table 2. Descriptive statistics of the predictors and their coding.
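
The coding of the two quantitative predictors can be sketched as follows. This is a minimal illustration of the transformation described in Table 2 (natural logarithm, with zero counts assigned a score of -1), written in Python rather than in the R environment used for the analysis; the function name is hypothetical.

```python
import math

def log_transform(word_count):
    """Natural-log transform of a word count; zero counts, for which the
    logarithm is undefined, receive the score -1 (as stated in Table 2)."""
    if word_count == 0:
        return -1.0
    return math.log(word_count)

# A count of one maps to 0, so the zero score of -1 stays below every observed value
for count in (0, 1, 9, 81):
    print(count, round(log_transform(count), 2))
```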

Figure 2. Distribution of the predictors.

The data was analyzed with a binary logistic regression model using the “glm” function in R (Gelman & Hill 2007). [8] This model is applied for the purposes of effect estimation and hypothesis testing, rather than attempting to predict the variation as a whole (Harrell 2015: 98–99). With this in mind, I included all the predictors known to play a role in the variation defined in Section 2.1, and calculated predicted percentages. In order to reflect the expected patterns, the predictors are coded so that number 1 favors that/zero-complement clauses and should return a positive estimate; that is, positive coefficients indicate consistency with theoretical expectations.
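
To make concrete what a binary logistic regression estimates, the following is a minimal pure-Python sketch for a single binary predictor, fitted by Newton-Raphson. It is not the model reported here, which was fitted with R's glm and includes all predictors. The token counts are invented toy figures, and the function and variable names are my own.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * x
            # Negative Hessian (information matrix)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented toy data: x = 1 for non-coreferential, 0 for coreferential;
# y = 1 for a that/zero-complement clause, 0 for a gerund.
toy_x = [0] * 100 + [1] * 50
toy_y = [1] * 20 + [0] * 80 + [1] * 40 + [0] * 10

intercept, slope = fit_logistic(toy_x, toy_y)
print(round(intercept, 3), round(slope, 3))
```

With one binary predictor the estimates have a closed form: the intercept is the log odds of a that/zero-complement clause in the group coded 0, and the coefficient is the difference in log odds between the two groups. A positive coefficient therefore means that the level coded 1 favors the finite pattern.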

Table 3 presents the results from the binary logistic regression model. The value in the column “coefficient” indicates the strength of each predictor on the log odds scale. Positive numbers represent an increase in the probability of producing a declarative finite that/zero-complement clause, while negative numbers represent a decrease in the probability. The larger the number, the stronger the effect of the specific predictor. The “standard error” refers to the accuracy of the estimate, that is, the level of uncertainty about the coefficient. The third column contains the “p-value” of each predictor which indicates the statistical significance; p-values smaller than 0.05 indicate statistical significance. The final column refers to the “odds ratio” (OR), which gives similar information to that of the coefficients. An OR above 1 indicates that that/zero-complement clauses are more likely to be used, while an OR between 0 and 1 indicates that gerund complement clauses are more likely to occur. An OR of 1 indicates that a predictor has no effect.

| Predictor | coefficient | std. error | p-value | signif. | OR |
|---|---|---|---|---|---|
| (Intercept) | -3.22 | 0.29 | < 2e-16 | *** | 0.04 |
| variety (default: BrE) | | | | | |
| variety: HKE | | | | ** | 2.26 |
| variety: NigE | | | | | |
| variety: AmE | | | | | |
| text_type: General | 0.19 | 0.17 | 0.25 | | 1.21 |
| cc_verbal_meaning: action | 0.42 | 0.23 | 0.06 | | 1.52 |
| cc_animacy: inanimate | 2.19 | 0.80 | 0.01 | ** | 8.97 |
| coreferentiality: non_coreferential | 4.02 | 0.34 | < 2e-16 | *** | 55.52 |
| temporal_relation: simultaneous | 1.73 | 0.23 | 1.86e-13 | *** | 5.64 |
| cc_voice: passive | 0.74 | 0.51 | 0.15 | | 2.09 |
| cc_negation: negative | 0.86 | 0.17 | 3.86e-07 | *** | 2.37 |
| log_cc_words_constituents | 0.59 | 0.08 | 4.56e-12 | *** | 1.80 |
| log_intervening_material | 0.13 | 0.20 | 0.50 | | 1.14 |

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05

Table 3. Binary logistic regression model.
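
The relation between the “coefficient” and “OR” columns can be checked with a short sketch: the odds ratio is the exponential of the coefficient. The values below are copied from Table 3; since the published coefficients are rounded to two decimals, the recomputed odds ratios can only be expected to match the reported ones approximately.

```python
import math

# (coefficient, reported OR) pairs copied from Table 3
table_3 = {
    "(Intercept)": (-3.22, 0.04),
    "text_type: General": (0.19, 1.21),
    "cc_verbal_meaning: action": (0.42, 1.52),
    "cc_animacy: inanimate": (2.19, 8.97),
    "coreferentiality: non_coreferential": (4.02, 55.52),
    "temporal_relation: simultaneous": (1.73, 5.64),
    "cc_voice: passive": (0.74, 2.09),
    "cc_negation: negative": (0.86, 2.37),
    "log_cc_words_constituents": (0.59, 1.80),
    "log_intervening_material": (0.13, 1.14),
}

def odds_ratio(coefficient):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)

for name, (coefficient, reported) in table_3.items():
    print(f"{name}: recomputed {odds_ratio(coefficient):.2f}, reported {reported}")
```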

From the results obtained with the regression analysis summarized in Table 3, the predictors that significantly determine the variation are variety (HKE), cc_animacy, coreferentiality, temporal_relation, cc_negation, and log_cc_words_constituents. On the other hand, the alternation between declarative finite that/zero-complement clauses and gerunds does not appear to be conditioned by text_type, cc_verbal_meaning, cc_voice, and log_intervening_material.

As for the direction of the effects, the signs (positive numbers) of the first column (“coefficient”) as well as the OR (OR > 1) are all consistent with previous theories and hypotheses discussed in Section 2.1. That is, action verbs, inanimate subjects in the complement clause, non-coreferential subjects, simultaneous temporal relation, passives, negative particles, and an increase in the number of words of the complement clause and in the intervening material, all increase the likelihood of having a that/zero-complement clause.
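
How a positive coefficient raises the probability of a that/zero-complement clause can be shown by converting log odds into probabilities with the inverse logit. The sketch below uses the intercept and the coreferentiality coefficient from Table 3. It assumes a baseline in which every binary predictor is at its 0 level, the variety is BrE, and both log-transformed predictors equal 0. These illustrative values are therefore not the predicted percentages reported in Table 4, whose computation is not reproduced here.

```python
import math

def inverse_logit(log_odds):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

INTERCEPT = -3.22               # Table 3
COEF_NON_COREFERENTIAL = 4.02   # Table 3

# Assumed baseline: all other predictors at 0 (see the lead-in)
p_coreferential = inverse_logit(INTERCEPT)
p_non_coreferential = inverse_logit(INTERCEPT + COEF_NON_COREFERENTIAL)

print(round(p_coreferential, 3), round(p_non_coreferential, 3))
```

Because the inverse logit is strictly increasing, any predictor with a positive coefficient moves the predicted probability towards the finite pattern, whatever the values of the other predictors.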

Table 4 below shows the predicted percentages for each predictor, and Figure 3 is a graphic representation of these predicted percentages and the effect of each predictor in the model. [9] As expected, the stronger predictors are those that significantly determine the variation. The strongest predictor seems to be coreferentiality between the subjects in the main clause and complement clause. With non-coreferential subjects, the proportion of declarative finite that/zero-complement clauses is 65 percentage points higher than with coreferential subjects. The next predictor, animacy of the subject in the complement clause, shows a difference of 44 percentage points between animate and inanimate subjects; inanimate subjects show a higher tendency for declarative finite that/zero-complement clauses. The complexity of the complement clause in number of words, and the presence of negative markers in the complement clause, have an effect strength of 41 and 21 percentage points respectively, both with a higher tendency for the finite patterns, as is to be expected in light of the Complexity Principle (Rohdenburg 1996, 2006). Finally, the last two predictors that significantly determine variation are the temporal relation between the main clause and the complement clause, with a percentage point difference of 39, and the predictor variety, which has a percentage point difference of 20. It should be noted here that HKE in fact shows a stronger tendency towards declarative finite that/zero-complement clauses, at 63%, as compared with the L1 varieties (BrE with 43% and AmE with 47%). This stronger tendency for finite patterns is expected and may be explained by the influence of the substrate language and by cognitive effects, as discussed in detail in the following subsection. By contrast, this is not the case in NigE, where the tendency towards declarative finite that/zero-complement clauses, at 46%, is in fact slightly lower than that of AmE (47%).

Categorical variables
Comparison                                                         Percentage point difference
Non-coreferential (96%) vs. coreferential (31%)                    65%
Inanimate (88%) vs. animate (44%)                                  44%
Simultaneous (8%) vs. anterior (41%)                               39%
Negative (35%) vs. positive (56%)                                  21%
Variety: BrE (43%), HKE (63%), NigE (46%), AmE (47%)               20% (max – min = 63 – 43)
Passive (66%) vs. active (48%)                                     18%
Action (41%) vs. state (52%)                                       11%
General (52%) vs. Blogs (47%)                                      5%

Continuous variables
Comparison                                                         Percentage point difference
Words in complement clause: value (0) (35%) vs. value (3) (76%)    41%
Intervening material: value (-1) (50%) vs. value (1) (56%)         6%

Table 4. Effects of the predictors in the model in percentage points.
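
The way a percentage point difference of this kind is obtained from a logistic regression model can be sketched as follows. This is a minimal illustration in Python, not the R code used for the analysis; the intercept, the coefficients, and the average condition are hypothetical values, so the output does not correspond to any row of Table 4.

```python
import math


def inverse_logit(log_odds: float) -> float:
    """Map log-odds onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_percentage(intercept, coefficients, condition):
    """Predicted percentage of that/zero-complement clauses for one condition."""
    log_odds = intercept + sum(
        coefficients[name] * value for name, value in condition.items()
    )
    return 100.0 * inverse_logit(log_odds)


def percentage_point_difference(intercept, coefficients, average, predictor, low, high):
    """Move one predictor from low to high; the others stay at the average condition."""
    low_condition = {**average, predictor: low}
    high_condition = {**average, predictor: high}
    return (
        predicted_percentage(intercept, coefficients, high_condition)
        - predicted_percentage(intercept, coefficients, low_condition)
    )


# Hypothetical values, for illustration only (not the fitted model).
intercept = -1.0
coefficients = {"non_coreferential": 3.0, "inanimate": 2.0}
average = {"non_coreferential": 0.5, "inanimate": 0.5}

difference = percentage_point_difference(
    intercept, coefficients, average, "non_coreferential", low=0, high=1
)
print(f"{difference:.1f} percentage points")
```

Because the logistic function is non-linear, the size of the difference depends on the average condition at which the other predictors are held, which is why that condition has to be defined first.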

Figure 3. Effects of the predictors in the model.

As for the other predictors, even though they do not significantly determine the variation, the strength and direction of their effects also agree with previous theories and hypotheses. Firstly, the use of the passive voice and an increase in the number of words in the intervening material, although not statistically significant, show percentage point differences of 18 and 6 respectively. These environments show a stronger tendency towards the use of declarative finite that/zero-complement clauses, which is also in accordance with the Complexity Principle (Rohdenburg 1996, 2006). Secondly, the predictor cc_verbal_meaning has a percentage point difference of 11 between the two meanings (action and state), with state verbs having a stronger tendency for the use of finite patterns. If we look at the distribution of this predictor within the two available patterns in Figure 4, we see that gerunds have a clear preference for action verbs (90.9%), which confirms similar observations made in previous studies (Heyvaert & Cuyckens 2010: 143).

Figure 4. Distribution of that/zero-complement clauses and gerunds with action and state verbs.

Finally, the predictor with the weakest effect on the variation is text_type, with a difference of 5 percentage points. Although the two text types differ slightly in their effect, the difference is not statistically significant, which confirms our hypothesis that there are no significant differences between these two text types, as Loureiro-Porto (2017) and others have also argued (cf. Section 2.1).


Figure 5 below shows the distribution of gerunds and that/zero-complement clauses with the verb regret in the four varieties: AmE, BrE, HKE, and NigE. As can be seen, gerunds predominate in all the varieties. However, the preference for this pattern is lower in the L2s (around 58% in HKE and 55% in NigE) than in the L1s (slightly higher than 69% in both cases). [10]

Figure 5. Distribution of gerunds and that/zero-complement clauses with the verb REGRET (cf. Appendix, Table 7).

This lower frequency in the use of gerunds in the L2 varieties of English can be explained by different factors:

  1. As Kortmann (2006: 615) mentions, “the study of syntactic variation in non-standard varieties offers at the same time a look at the past and the future”. In this case, the lower frequency of gerunds may be a historical trait inherited from the parent variety, British English. Both Hong Kong and Nigeria became British colonies in the mid-19th century (1841 and 1861 respectively) and, as discussed in the previous section (3.1), the preference in British English at that time was for the use of declarative finite that/zero-complement clauses (around 70%). Therefore, these L2 varieties may have maintained the older distribution of the two patterns to some extent. This maintenance of the older distribution of finite and non-finite patterns has been discussed in previous literature as “colonial lag”, a term coined by Marckwardt (1958), and, more recently, as “extraterritorial conservatism”, a more neutral term (Hundt 2009: 32).
  2. The higher proportion of that/zero-complement clauses in the L2 varieties may also be due to a certain unfamiliarity with the verb regret, which is a low frequency verb (in the GloWbE corpus, the verb regret has a frequency of 17.5 per million words in BrE, and this frequency decreases to 13.32 in HKE). As Rohdenburg (1996: 160) points out, “less familiar verbs tend to involve greater processing burden, then the increased use of that with more formal verbs could perhaps be attributed -in part at least- to the complexity principle.”
  3. The difference in the distribution of that/zero-complement clauses in L1 and L2 varieties of English may be due to the cognitive effects derived from language contact and the SLA situations in which the L2 varieties emerge. The cognitive effect at work here has been referred to as transparency (Slobin 1980; Steger & Schneider 2012), maximization of transparency (Williams 1987), isomorphism (Steger & Schneider 2012; Schneider 2012; Green 2017), simplicity (Schneider 2012), and increased explicitness (Schneider 2013), all of which allude to the one-to-one mapping of form and meaning. In the complementation system, this would translate as a tendency to use declarative finite that/zero-complement clauses instead of gerunds, which is what we find in the L2 varieties.
  4. The difference in the distribution may also be due to the influence of substrate languages. Table 5 summarizes the five main substrate languages in Hong Kong and Nigeria (cf. Section 2.2 above for a more detailed description). On the one hand, Hausa, Igbo, Yoruba, and French (substrate languages of Nigeria) only have finite that/zero-complement clauses, which would explain the higher proportions of this structure in NigE as compared to the L1 varieties, in that it is the construction that speakers already know from their native languages. On the other hand, Cantonese does not have any type of clausal complementation. Since speakers do not have access to these types of complementation from the repertoires of their native languages, they may be more predisposed to acquire the uses and distribution of the L1 varieties, which could explain why in HKE the use of the gerund is slightly higher than in NigE. However, subordinate clauses headed by that are learned one year earlier than non-finite complements, according to the established cline of clause development in first language acquisition (cf. Table 8 in the appendix; Green 2017: 173), because, among other reasons, they are more grammatically integrated in the main clause. This may explain why the proportion of gerunds in HKE is lower than in the L1 varieties.
          Hong Kong    Nigeria
          Cantonese    Hausa    Igbo    Yoruba    French
that      -            ✓        ✓       ✓         ✓
-ing      -            -        -       -         -

Table 5. Summary of substrate languages.


This paper has addressed the complementation system of regret in different L1 (American and British English) and L2 varieties of English (Hong Kong English and Nigerian English), looking at the factors that influence the choice between declarative finite that/zero-complement clauses and gerunds in the language of the internet.

The analysis of the diachronic evolution of the complementation system of regret in British English shows that declarative finite that/zero-complement clauses used to be dominant until the beginning of the 20th century. From this period onwards, there has been a steady increase in the use of gerunds at the expense of finite that/zero-complement clauses, reaching in present-day English a distribution of 69.1% of gerunds and 30.9% of that/zero-complement clauses. This shift in the distribution of the two available complementation patterns is in accordance with the change experienced by the verb remember, the central member of the group of retrospective verbs, in the 18th century (cf. Section 3.1).

The statistical analysis of the semantic and syntactic predictors of the variation shows that the predictors variety (HKE), cc_animacy, coreferentiality, temporal_relation, cc_negation, and log_cc_words_constituents significantly determine the variation between declarative finite that/zero-complement clauses and gerunds. If we consider the direction of the effects of each predictor, the analysis also shows that the data are in line with previous theories and hypotheses: (i) L2 varieties of English, especially HKE, show a higher tendency for the use of finite patterns; (ii) the difference between the available text types in the GloWbE corpus (General and Blogs) is not significant, as has been argued in previous studies (e.g. Loureiro-Porto 2017); (iii) gerunds show a clear preference for action verbs (90.9%), as argued by Heyvaert and Cuyckens (2010: 143); (iv) inanimate subjects in the complement clause, non-coreferential subjects in the two clauses, and simultaneous temporal relation all show a higher proportion of finite patterns (cf. Quirk et al. 1985; Noonan 2007; Heyvaert & Cuyckens 2010); and (v) the passive voice, the presence of negative particles in the complement clause, and an increase in the number of words in the complement clause and the intervening material all increase the cognitive complexity of the structure and favor the use of finite patterns (cf. Rohdenburg 1996, 2006; cf. Section 3.2).

As for the comparison between the different varieties of English, the L2 varieties of English show a higher proportion of declarative finite that/zero-complement clauses as compared to the L1 varieties. This difference was tentatively explained in terms of (a combination of) four factors: i) it might be a historical trait inherited from British English at the time of colonization; ii) it might be caused by the relative unfamiliarity of the verb regret, since it is a low-frequency verb; iii) it might be the result of the influence of the cognitive effects derived from the language contact and SLA situation in which the L2 varieties of English have emerged (such as isomorphism and transparency); and iv) it might be explained in terms of transfer from the substrate languages (cf. Section 3.3).

In sum, the analysis of the complementation profile of regret confirms that the English complementation system is in fact an area of variability in WEs, as mentioned in previous studies (cf. Section 1), which deserves more scholarly attention. Ideally, as I attempted to do here, studies should try to combine statistical analyses of measurable syntactic and semantic factors with descriptive accounts of potential historical and cognitive factors, as a means of achieving a more holistic approach to the development of World Englishes.


I am very grateful to Lukas Sönning for his great help with the statistical analysis, to Elena Seoane for her comments on an earlier draft of this paper, and to two anonymous reviewers for their valuable comments. For funding, my gratitude goes to the Spanish Ministry of Economy and Competitiveness (grant FFI2017-82162-P) and the University of Vigo.


[1] The retrospective and prospective labels are taken from De Smet (2010).

[2] Potential problems with the GloWbE corpus have been reported in previous works (see for example Davies & Fuchs 2015; Mukherjee 2015; Hoffmann 2018). Some of the drawbacks highlighted are the inclusion of texts from the comments sections of newspapers, which makes it impossible to ascertain the country of origin of the writers despite the careful selection of webpages through their domains (.lk for Sri Lanka and .sg for Singapore, for example; Davies & Fuchs 2015: 26; Hoffmann 2018: 179), and the difficulty of knowing the particular variety used by the writer, that is, acrolectal, mesolectal, or basilectal (Mukherjee 2015: 35). However, previous research using the ICE corpus and GloWbE obtains similar results (see for example Heller & Röthlisberger 2015). The use of GloWbE here is necessary due to its size: an exploratory search in the British component of the International Corpus of English (ICE, Greenbaum 1996) retrieved only 14 examples.

[3] False positives are examples in which regret is used as a noun or adjective, and examples that are not internet sources. The presence of such examples led me to manually analyze precision and recall. The rates for the four varieties were above 90% in both precision and recall, with the exception of British English, in which recall was 89.2% (cf. Table 6 in the Appendix). Invalid examples, on the other hand, are repeated, incomplete, and unintelligible examples.

[4] As the reviewers of this article pointed out, that- and zero-complement clauses differ in terms of their cognitive complexity and could be treated as two separate variants. However, the low frequency of use of zero-complement clauses in the corpus (only 87 examples were retrieved for all the varieties under study) made it necessary to merge these two variants of finite clauses under one connecting label, that is, that/zero-complement clause. Moreover, other scholars studying the competition between finite and non-finite patterns also group these variants together (e.g. Rohdenburg 1996, 1999, 2015; Cuyckens et al. 2014; Cuyckens & D’hoedt 2015).

[5] The original transcription in Killingley (1993: 44) is “ngo3 giw3 koey3 lay5 tung5 ney3 jow4 haay4”, which we transformed by using a script available on GitHub.

[6] According to the OED, the first attestation of the use of regret as a verb in English is in the poem “Pearl” in the late 14th century; Art þou my perle þat I haf playned, Regretted by myn one, on nyȝte?

[7] The present-day data for the verb remember (GloWbE 2012–2013) is a modified version of the data in García Castro (2018), a study in which a random sample of 3,000 instances is taken as a means of distinguishing between declarative finite that/zero-complement clauses and non-finite complement clauses (to-infinitives and gerunds). Hence, the to-infinitives had to be excluded for the present study.

[8] The full R command is: glm(that ~ variety_factor + general + action + inanimate + non_coreferential + simultaneous + passive + negative + log_words_constituents + log_intervening_material, data=REGRET, family=binomial('logit'))

[9] Predicted percentages are calculated by defining an average condition and then setting each predictor in turn to a low and a high value while the others are held at that average. The difference between the two resulting predicted percentages is the percentage point difference reported in Table 4.

[10] I am aware that some speakers of NigE and HKE may be L1 speakers of these varieties, since they are in phases 4 and 3 of Schneider’s (2007) Dynamic Model and some members of the youngest generations may have learnt English as an L1. Since these cases are not the general rule, I continue to use the generalization L2 speakers for these.


GloWbE = Corpus of Global Web-based English. 2013. Compiled by Mark Davies. https://corpus.byu.edu/glowbe/

ICE = International Corpus of English. http://ice-corpora.net/ice/index.html

OED = Oxford English Dictionary (online edition). http://www.oed.com/public/online/about-oed-online#ViewingSecondEdition

Script used for transforming Killingley’s transcription: https://github.com/kfcd/pingyam/blob/master/pingyambiu


                     Verbs retrieved of total    % recall    regret*_v* vs. false positives    % precision
American English     111 of 117                  94.9        5466/304                          94.7
British English      99 of 111                   89.2        5153/298                          95.4
Hong Kong English    90 of 93                    96.8        485/40                            92.4
Nigerian English     105 of 111                  94.6        1082/44                           96.1

Table 6. Precision and recall.
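
The rates in Table 6 can be recomputed from the raw counts with a short sketch (in Python, not part of the original analysis). Recall is the number of verbs retrieved divided by the total. The precision formula is an assumption on my part: the first figure in the fourth column is read as the number of valid hits, so that precision = valid hits / (valid hits + false positives).

```python
# Counts copied from Table 6:
# (verbs retrieved, total verbs, valid hits, false positives).
TABLE_6 = {
    "AmE": (111, 117, 5466, 304),
    "BrE": (99, 111, 5153, 298),
    "HKE": (90, 93, 485, 40),
    "NigE": (105, 111, 1082, 44),
}


def recall(retrieved: int, total: int) -> float:
    """Percentage of the relevant verb forms that the query retrieved."""
    return 100.0 * retrieved / total


def precision(valid_hits: int, false_positives: int) -> float:
    """Percentage of hits that are valid.

    Assumption: the first figure counts valid hits only, so the
    denominator is valid hits plus false positives.
    """
    return 100.0 * valid_hits / (valid_hits + false_positives)


for variety, (retrieved, total, valid, false_pos) in TABLE_6.items():
    print(
        f"{variety}: recall = {recall(retrieved, total):.1f}%, "
        f"precision = {precision(valid, false_pos):.1f}%"
    )
```

Under this assumption the sketch reproduces the recall values for all four varieties and the precision values for AmE, HKE, and NigE; for BrE it yields 94.5 rather than the 95.4 given in the table.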

             AmE             BrE             HKE             NigE
             n      %        n      %        n      %        n      %
that/zero    160    30.5     180    30.9     58     42.0     149    44.7
-ing         365    69.5     403    69.1     80     58.0     184    55.3
Total        525    100      583    100      138    100      333    100

Table 7. Distribution of gerunds and that/zero-complement clauses with the verb regret in American, British, Hong Kong, and Nigerian Englishes.
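
The percentages in Table 7 follow directly from the raw counts, as the following minimal Python sketch shows (an illustration added for transparency, not part of the original analysis; the function name is hypothetical).

```python
# Raw counts copied from Table 7.
TABLE_7 = {
    "AmE": {"that/zero": 160, "-ing": 365},
    "BrE": {"that/zero": 180, "-ing": 403},
    "HKE": {"that/zero": 58, "-ing": 80},
    "NigE": {"that/zero": 149, "-ing": 184},
}


def distribution(counts: dict) -> dict:
    """Share of each complementation pattern, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {pattern: round(100.0 * n / total, 1) for pattern, n in counts.items()}


for variety, counts in TABLE_7.items():
    print(variety, distribution(counts))
```

The gap between the L1 and L2 varieties discussed in Section 3.3 corresponds to the difference between the -ing shares of roughly 69% (AmE, BrE) and 55 to 58% (NigE, HKE).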

Age     Clause structure
2       Paratactic, no connector
2;5     Coordinate connectives: and, and then
2;10    Subordinates headed by: if, when, while, so, because, before
3       Subordinates headed by: how, where, that
4       Non-finite complements (subjectless and non-catenative types)
5-6     Higher consistent frequency of complex clause (e.g. infinitival complements)

Table 8. First language acquisition clause development (Green 2017: 173).

University of Helsinki