Studies in Variation, Contacts and Change in English
Verb complementation is one of the areas where variability and change in indigenized L2 varieties of English are frequently observed. As such, it has been addressed from a semantic and pragmatic point of view, and this study does likewise. Here the focus is the complementation profile of the verb regret, a verb which has historically shown non-categorical variation between declarative finite that/zero-complement clauses and gerunds (e.g. I regret that I said that / saying that). The database comprises four different varieties of English (American English, British English, Hong Kong English, and Nigerian English) as represented in the Corpus of Global Web-Based English (GloWbE).
An analysis of the distribution of the two available patterns in the English varieties and the different substrate languages (Cantonese in Hong Kong English, and Hausa, Igbo, Yoruba and French in Nigerian English) suggests that both cognitive effects derived from language contact situations and second language acquisition processes (mainly increased isomorphism and transparency) and influence of substrate languages serve as possible explanations for the higher proportions of declarative finite that/zero-complement clauses in the L2 varieties here. The binary logistic regression analysis of other intra- and extra-linguistic factors drawn from the literature shows that the choice of declarative finite that/zero-complement clauses is determined by factors such as inanimate and non-coreferential subjects, presence of negative markers, passive voice, action verbs, text type General, simultaneous temporal relation, and an increase in the number of words in the complement clause and in the intervening material between the two clauses.
The diachronic evolution of the English complementation system has received a great deal of attention (Warner 1982; Fischer 1988; Rohdenburg 1995, 2006; Fanego 1996a, 1996b, 2004a, 2007; Rudanko 1999, 2000; Vosberg 2003; De Smet 2013; Rickman & Rudanko 2018); some studies in the field of World Englishes (WEs), especially those concerned with ditransitive verbs and the competition between infinitives and gerunds, have also considered it an area of innovation and change (Olavarria de Ersson & Shaw 2003; Mukherjee & Hoffmann 2006; Mukherjee & Schilk 2008; Mukherjee & Gries 2009; Schilk et al. 2012, 2013; Bernaisch 2013; Nam et al. 2013; Deshors 2015; Gries & Bernaisch 2016; Deshors & Gries 2016). For example, Schneider (2007: 86) states that, “a classic example [of innovation in varieties in stage 3, nativization] is the complementation patterns which verbs and also adjectives typically enter.” These studies focus mainly on the semantic and syntactic factors that influence the alternation between different complementation patterns, as I do in Section 3.2. However, they have not looked in detail at the potential effects of the processes derived from the language contact situation (such as the impact of the substrate languages in the speaker’s use of English, see Section 2.2 for a more detailed analysis of the complementation systems of the substrate languages) nor have they focused on the fact that these varieties of English emerged and developed as L2s and may as such be subject to Second Language Acquisition (SLA) effects as well. Some linguistic features which are attested in many geographically distant varieties of English cannot be explained in terms of language-contact and may therefore be the result of the L2 learning processes (Mesthrie & Bhatt 2008: 159). 
One of these processes is the preference or tendency towards simplicity in the sense of “a propensity for transparency” (also known as isomorphism or iconicity; Steger & Schneider 2012: 156), which is defined as the one-to-one mapping of form and meaning. Regarding the complementation system, it has been proved that indigenized L2 varieties of English show a preference for finite patterns over non-finite structures, that is, a preference for more explicit forms (Steger & Schneider 2012: 172; cf. also Thomason 2008; Schneider 2012, 2013). Finite complement clauses are said to be more transparent and isomorphic than the non-finite alternative since they are explicitly marked for “tense and agreement, modality, and a complementizer”, and thus are easier to process (Givón 1985: 200; Steger & Schneider 2012: 165). This tendency towards isomorphism and the use of more explicit forms, proper to SLA situations, is also captured by the Complexity Principle (Rohdenburg 1996, 2006), which is claimed to be universal to language and states that,
Rohdenburg (1996, 2006) argues that these “cognitively more complex environments” include, e.g., the presence of negative markers, the use of the passive voice, and the presence of intervening material between the main clause and the complement clause (CC), among others.
The present study attempts to integrate cognitive factors pertaining to the language contact settings in which these languages develop (which involve the potential influence of substrate languages and SLA processes) in the description of the variation found in the complementation profile of the retrospective verb regret in present-day English. Together with cognitive factors, I will also conduct a regression analysis of a number of syntactic and semantic factors shown to impact the alternation between the available complementation patterns (e.g. complexity of the complement clause, intervening material, subject animacy of the complement clause, voice of the verb in the complement clause; cf. Cuyckens et al. 2014). The variability witnessed in the diachronic evolution of the complementation of regret as shown in at least two studies, those of Heyvaert and Cuyckens (2010) and Cuyckens et al. (2014), makes this verb a very good candidate for research in WEs, as explained further below.
The database for this analysis covers four varieties of English, two L1 or inner circle varieties, American English (AmE) and British English (BrE), chosen as reference varieties, and two L2 or outer circle varieties, Hong Kong English (HKE), and Nigerian English (NigE). These two outer circle varieties were singled out because (i) they are in stage 3 within Schneider’s Dynamic Model (Schneider 2007), the stage in which most grammatical innovations are expected to occur (Schneider 2007: 86), and (ii) they are two historically and geographically unrelated varieties with different substrate languages. The language studied here is that of the internet (Corpus of Global Web-Based English (GloWbE); Davies 2013).
regret exhibits a categorical alternation between gerundial (-ing) and to-infinitival sentential complements: the gerund has an anterior or retrospective meaning, that is, it “refers to a preceding event or occasion coming to mind at the time indicated by the main verb” (Quirk et al. 1985: 1193), while the infinitive has a prospective meaning, in that it “indicates that the action or event takes place after (and as a result of) the mental process indicated by the verb has begun” (Quirk et al. 1985: 1193; cf. examples (1) and (2) respectively).  In terms of the expression of prospective meaning (see example (2) below), no variation is detected in my corpus sample: the prospective meaning is always expressed by to-infinitive complement clauses. However, there is alternation between declarative finite that/zero-complement clauses and gerunds with a retrospective meaning, as might be expected, since it is shown to be “less categorical” in Cuyckens et al. (2014: 182). In other words, both constructions seem to be freely interchangeable, as in the two examples in (1).
|(1)||I regret telling you that John stole it. [“I regret that I told you that John stole it” or “…that I am now telling you…”] (Quirk et al. 1985: 1193)|
|(2)||I regret to tell you that John stole it. [“I regret that I am about to tell you that John stole it”] (Quirk et al. 1985: 1193)|
From a diachronic perspective, Heyvaert and Cuyckens (2010) examine the verb regret together with resent, be sorry, admit and agree, from Early Modern British English to present-day British English, and find an increase in the use of gerunds from the Late Modern British English period to the 1990s (gerunds rise from 23.2% in the period 1710–1920 to 57.6% in the 1990s). Cuyckens et al. (2014), on the other hand, consider regret together with remember and deny in Late Modern British English and, by contrast, find a slight decrease in the use of gerunds during this period (gerunds fall from 33.3% in 1710–1780 to 28.1% in 1781–1920). The present-day English data presented here will reveal whether the current trend is towards an increasing or decreasing use of gerunds.
From a synchronic perspective, on the other hand, only one study is available (cf. Romasanta 2017). In this article, the author studies the present-day complementation profile of regret in five varieties of English (AmE, BrE, JamE (Jamaican English), HKE, and NigE) with a special focus on the distribution of finite and non-finite patterns and the possible influence of substrate languages and cognitive processes derived from the SLA and language contact situation in which the non-native varieties of English emerged. In general terms, the main conclusions in this study are that (i) there is a higher proportion of non-finite patterns than finite patterns in all the varieties studied, but especially so in the native varieties (the proportion of non-finite patterns in each variety is as follows: 68.69% in AmE/BrE, 65.4% in JamE, 61.1% in HKE, and 53.4% in NigE), and (ii) both substrate influence, and hyperclarity and isomorphism, seem to be important factors determining the relatively lower proportions of non-finite patterns in the non-native varieties. In view of these findings, the present article will, as explained above, dig deeper into these differences between inner and outer circle varieties by including new semantic and syntactic factors that may determine the choice between finite and non-finite forms.
The outline of this paper is as follows. Section 2 describes the methodology and the varieties chosen for the study, including their substrate languages. Section 3 deals with the data analysis and is divided into three subsections: subsection 3.1 offers a brief overview of the data analyzed and a comparison with previous studies; subsection 3.2 discusses the statistical analysis and results of the intra- and extra-linguistic predictors influencing the speaker’s choice between the available patterns; and subsection 3.3 analyzes the possible impact of the cognitive effects previously mentioned (essentially isomorphism and transparency) as well as that of the substrate languages. Finally, Section 4 offers a summary of the findings.
The data for this study, as noted in the introductory section, is taken from the GloWbE corpus, which comprises some 1.9 billion words from 20 different varieties of English. As already mentioned, the database for the present study covers four varieties of English (AmE, BrE, HKE and NigE). I ran a general search for “regret” as a verb lemma (regret*_v*) using the online interface. For the outer circle varieties, I retrieved all the examples available in the corpus (a total of around 1,700). For the inner circle varieties, because the number of examples available in the corpus was very high, I took a random sample of 2,000 examples for each variety.
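The wildcard query quoted above can be illustrated with a small sketch. It only emulates the pattern logic on a handful of made-up word_TAG tokens; the tag labels are illustrative assumptions, and the code says nothing about how the GloWbE interface itself evaluates queries.

```python
from fnmatch import fnmatchcase

# Hypothetical "word_TAG" tokens; the tag labels are illustrative assumptions,
# not taken from the corpus documentation.
tokens = [
    "regret_vv0", "regrets_vvz", "regretted_vvd", "regretting_vvg",
    "regret_nn1", "regrettable_jj",
]

QUERY = "regret*_v*"  # the search string quoted in the text


def matches_query(token: str, query: str = QUERY) -> bool:
    """Return True if a word_TAG token matches the wildcard query."""
    return fnmatchcase(token.lower(), query)


# Keep only the verbal forms of regret; nouns and adjectives are filtered out.
verb_hits = [t for t in tokens if matches_query(t)]
print(verb_hits)
```

Under these assumptions the pattern keeps the inflected verb forms and discards the noun and the adjective, which is why manual pruning is still needed only for mistagged or otherwise invalid hits.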
After manual pruning of false positives and invalid examples and a preliminary analysis of the complementation patterns, the final number of valid examples that exhibit the alternation between that/zero-complement clauses and gerunds with retrospective and simultaneous meanings is 1,579 for all four varieties. All these tokens were then coded for a number of semantic and syntactic predictors. 
The statistical analysis takes into consideration the following factors:
The complementation pattern is the dependent variable, and two values are distinguished: that/zero-complement clauses or gerund (as in examples (3) and (4) respectively).
|(3)||On these grounds I regret that I am unable to concur with the Court in its present judgment. (GloWbE-AmE, worldcourts.com)|
|(4)||as many people who regret not taking better care of their teeth could tell you! (GloWbE-AmE, dumblittleman.com)|
The predictor variety has four values: American English (AmE), British English (BrE), Hong Kong English (HKE), and Nigerian English (NigE). The expectations for this predictor are that non-native or L2 varieties of English, that is, HKE and NigE, will show a stronger tendency for that/zero-complement clauses due to the cognitive effects derived from the language contact and SLA settings in which they emerged, and the influence of their substrate languages (Thomason 2001, 2008; Steger & Schneider 2012; Schneider 2013).
Two text types occur in this corpus: Blogs, accounting for about 60% of the corpus, and General, for the remaining 40% of the corpus, which contains general internet webpages but has also been found to contain some blogs (around 20%; Davies & Fuchs 2015: 2–3). Authors such as Grieve et al. (2010), Davies and Fuchs (2015) and Hoffman (2018) argue that blogs are somewhat more informal and closer to speech than general webpages. However, Loureiro-Porto (2017: 460) questions this claim, since in her analysis she found no significant differences in terms of orality and informality between the two text types in GloWbE. This issue will be tested, and the values for this predictor are therefore: General or Blogs.
This predictor refers to whether the verb in the complement clause is an action or state verb (cf. examples (5) and (6) respectively). Before the gerund developed its verbal features in the 18th century, it was only used with predicates expressing actions. However, from Late Modern English onwards, both action and state predicates began to take gerundial forms (Heyvaert & Cuyckens 2010: 139). Heyvaert and Cuyckens (2010: 143) found that, even though with the verb regret the gerund was used with both state and action predicates, the preference has always been for action verbs; in their data, action predicates with regret account for 65.8% in the period from 1710–1920 in the CLMET corpus and 84.3% in the 1990s in the COBUILD corpus.
|(5)||Maybe he does regret hurting you and will never do it again, who knows? (GloWbE-HKE, hongkong.asiaxpat.com)|
|(6)||I only regret that I have but one life to lose for my country. (GloWbE-NigE, nigeriaworld.com)|
The animacy of the subject of the complement clause has two possible values: animate (7) or inanimate (8). Heyvaert and Cuyckens (2010: 144) find that that/zero-complement clauses “are clearly in favour of inanimate subjects.” The expectations therefore are that if the subject in the complement clause is inanimate, declarative finite that/zero-complement clauses will be found. On the other hand, since gerunds are usually subject controlled by the main clause and the verb regret in the main clause requires an animate subject, I expect that gerunds will be more frequent if the subject is animate. In (7) the subject of the main clause is animate and so is the subject of the complement clause.
|(7)||Katie Harrison from Herefordshire regrets missing out on the windfall of state funding. (GloWbE-BrE, guardian.co.uk)|
Cuyckens et al.’s article (2014: 191) refers to this predictor as “denotation” and it has two possible outcomes: coreferential, when the subjects of the main clause and the complement clause denote the same entity, as in example (9), or non-coreferential, when the two subjects denote different entities, as in example (10). Heyvaert and Cuyckens (2010: 144) find a specialization of that/zero-complement clauses for different, i.e. non-coreferential, subjects; I therefore expect that non-coreferential subjects will continue to favor the use of declarative finite that/zero-complement clauses.
|(9)||I regret fighting against the future that was Biafra, during the Nigeria-Biafra civil war. (GloWbE-NigE, meniru.blogspot.com)|
The temporal relation between the time expressed by the verb in the main clause and that expressed by the verb in the complement clause can be anterior (11) or simultaneous (12) (Quirk et al. 1985: 1183; Huddleston & Pullum 2002: 160, 1243; Noonan 2007: 110–114). Due to the specialization of the gerund to express anterior meanings (Vosberg 2003: 200), I expect that declarative finite that/zero-complement clauses will be favored in simultaneous environments.
|(12)||Suggests that he regrets that the First Amendment at the present time doesn't enable him to ban the film. (GloWbE-AmE, israpundit.com)|
The remaining predictors (from viii to xi) are related to the Complexity Principle (Rohdenburg 1996: 151; 2006: 147; cf. Section 1). According to this principle, some factors such as “discontinuous constructions of various kinds, passive constructions, and the length of subjects, objects, and subordinate clauses” (Rohdenburg 1996: 149) increase cognitive processing complexity and therefore tend to take the more explicit construction; in our case, this would be declarative finite that/zero-complement clauses.
For the voice of the verb in the complement clause, two outcomes are distinguished: active and passive (examples (13) and (14) below).
|(13)||She regretted not studying for a qualification in accountancy, a subject she enjoyed, when she was younger - something that would have boosted her career. (GloWbE-HKE, yp.scmp.com)|
|(14)||We regret that a member of the public was injured by tiles falling from a pillar. (GloWbE-BrE, ...xbridgegazette.co.uk)|
The polarity of the complement clause has the following two values: positive and negative (cf. examples (15) and (16) respectively).
|(15)||Looking back now, I regret complaining about the canteen food during my stint as a cadet at Young Post this summer. (GloWbE-HKE, yp.scmp.com)|
|(16)||I regret that I haven't kept a running record of this outside of their charts. (GloWbE-AmE, sciencebasedmedicine.org)|
The complexity of the complement clause is measured as the total number of orthographic words of the postverbal constituents in the complement clause. For instance, in example (17) below there would be 6 words, namely somehow, and into it for 5 minutes.
Intervening material is measured as the number of words between the verb regret and the first word of the complement clause, as in example (17).
|(17)||… and I regret, deep down in my soul, that I somehow got sucked into it for 5 minutes. (GloWbE-HKE, crazybee.net)|
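The two word counts can be made concrete with example (17). The sketch below assumes the analyst has already segmented the sentence by hand; the segment strings and the helper name count_words are illustrative, and the count for the intervening material follows from one reading of the definition above rather than from a figure given in the text.

```python
import re


def count_words(segment: str) -> int:
    """Count orthographic words: whitespace-separated strings that contain
    at least one letter or digit (so stray punctuation is not counted)."""
    return sum(1 for tok in segment.split() if re.search(r"\w", tok))


# Manual segmentation of example (17); the boundaries are the analyst's decision.
intervening = "deep down in my soul"               # between regret and "that"
constituents = ["somehow", "into it for 5 minutes"]  # as listed in the text

intervening_material = count_words(intervening)
cc_words_constituents = sum(count_words(c) for c in constituents)

print(intervening_material, cc_words_constituents)
```

On this segmentation the complement clause contributes 6 words, matching the figure stated for example (17).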
The analysis of the data also considers the possible influence of substrate languages, which will be described in this subsection, together with the cognitive effects derived from the language contact and SLA situation in which the L2 varieties under study emerged as potential predictors (cf. Section 1). However, these two potential predictors or explanations are not included in the statistical analysis, but rather are considered in a more descriptive manner (cf. Section 3.3).
The major substrate language in Hong Kong is Cantonese. According to Matthews and Yip (1994: 174, 293) “there is no infinitive form in Cantonese, and arguably no distinction between finite and non-finite verbs.” They argue that subordination in Cantonese is constructed through parataxis, that is, the juxtaposition of two clauses (cf. examples (18) and (19) below).
|(18)||Ngô jîu jow kôy láy tónk nêy jow hai |
|I call he come with you make shoe|
|“I’ll ask him to come and make shoes for you” (Killingley 1993: 44)|
|(19)||Síu Yìuh wah mh séung làih wóh|
|Little Yiu say not want come PRT|
|“Yiu says she doesn’t want to come” (Matthews & Yip 1994: 308)|
Regarding Nigeria, there are three major indigenous substrate languages, Hausa, Igbo, and Yoruba, as well as one language which is exogenous to the country, French (Ogunmodimu 2015: 156).
|(20)||mun daukā mat cêwā kā yàrda|
|1pl.PF assume that 2m.PF agree|
|“We assume that you agree” (Jaggar 2001: 579)|
|(21)||Ha mààrà nà Àda ālụọla dī|
|“They know that Ada has a husband” (Emenanjo 2015: 337)|
|(22)||Olú so pé Adé rí bàbá òun|
|Olu say that Ade see father him|
|“Olu said that Ade saw his father” (Adesola 2015: 11)|
|(23)||Olú gbà kí Adé rí bàbá òun|
|Olu accept that Ade see father him|
|“Olu agreed that Ade should see his father” (Adesola 2015: 12)|
|(24)||Je veux absolument que tu viennes!|
|I want absolutely that you come|
|“I absolutely want you to come!” (Hansen 2016: 60)|
Figure 1 shows the evolution of the distribution of that/zero-complement clauses and gerunds with the verb regret from 1418 to present-day British English, as based on Heyvaert and Cuyckens (2010) and my own data from the GloWbE. As can be seen, in the corpora analyzed by Heyvaert and Cuyckens (2010) for the 15th century to the end of the 17th century, there are no attestations of the use of this verb. From the 18th century to the beginning of the 20th century, the preference was for that/zero-complement clauses (76.8%), and at the end of the 20th century, in the 1990s, this preference changed towards a slight predominance of gerunds (57.6%). The data from the present study shows that this predominance of the gerund construction has increased further in the early 21st century (69.1% of the attestations in the GloWbE corpus are gerunds).
The diachronic increase in the use of gerunds at the expense of that/zero-complement clauses has been widely studied together with other notable shifts in the English complementation system, a phenomenon which has come to be known as the Great Complement Shift (cf. Warner 1982; Fischer 1988, 1989; Fanego 1990, 1992, 1996a, 1996b, 1996c, 1998, 2004a, 2004b, 2010, 2016; Rohdenburg 1995, 2006, 2014; Rudanko 1998, 2000, 2011; Miller 2002; Los 2005; Vosberg 2006; De Smet 2008, 2009, 2010, 2013, 2014; among others). This increase in the use of the gerund is partly due to it acquiring full verbal properties “from Middle English onwards” (Fanego 1996c: 72). Fanego (1996c: 72) describes these verbal features as follows:
From the group of retrospective verbs to which the verb regret belongs, the most widely studied is the verb remember (Fanego 1996c, 1996d, 2004a, 2007; Mair 2006; De Smet & Cuyckens 2007; Rohdenburg 2016). Given that remember was the first retrospective verb to take infinitival and gerundial sentential complements, Fanego suggests that “it must be seen as the central member of the class, and as the one that eventually came to set the pattern for all the others” (1996c: 74; cf. also Fanego 2007). Table 1 below shows the evolution of the distribution of that/zero-complement clauses and gerunds from 1710 to present-day English (2012–2013) of both verbs, regret and remember. As can be seen, the combination remember + gerund shows an increase in use during the Late Modern English Period, from 49.1% to 76.0% (Cuyckens et al. 2014: 192), and this predominance has been maintained into present-day English, with 74.4% (García Castro 2018). By contrast, it seems that the verb regret did not begin to experience this increase in the use of gerunds until the 1990s (ranging from around 30% in Late Modern English to 57.6% in the 1990s). In present-day English this percentage is higher (69.1%) and thus is similar to that for remember. The data seems to confirm Fanego’s (1996c) suggestion that the verb remember may have “set the pattern” for the other retrospective verbs, in this case regret.
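|Period||regret: that/zero (N)||%||regret: gerund (N)||%||remember: that/zero (N)||%||remember: gerund (N)||%|
|1710–1780 (Cuyckens et al. 2014)||8||66.7||4||33.3||462||50.9||446||49.1|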
|1781–1920 (Cuyckens et al. 2014)||192||71.9||75||28.1||696||24.0||2,206||76.0|
|1990s (Heyvaert & Cuyckens 2010)||108||42.4||147||57.6||NA||NA||NA||NA|
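The percentages in Table 1 follow directly from the raw counts, which can be checked with a short sketch. The assignment of the count columns to regret and remember is inferred here from the percentages cited in the running text, so it should be read as an assumption; the helper name shares is hypothetical.

```python
def shares(that_zero: int, gerund: int) -> tuple[float, float]:
    """Return the percentage of that/zero-clauses and of gerunds, to one decimal."""
    total = that_zero + gerund
    return (round(100 * that_zero / total, 1), round(100 * gerund / total, 1))


# (that/zero count, gerund count) per period, read off the table rows above
regret_counts = {
    "1710-1780": (8, 4),
    "1781-1920": (192, 75),
    "1990s": (108, 147),
}
remember_counts = {
    "1710-1780": (462, 446),
    "1781-1920": (696, 2206),
}

for verb, table in (("regret", regret_counts), ("remember", remember_counts)):
    for period, (that_zero, gerund) in table.items():
        print(verb, period, shares(that_zero, gerund))
```

Note how small the earliest regret sample is (12 tokens), so the 33.3% gerund share for 1710–1780 rests on only 4 attestations.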
Table 2 below summarizes the predictors included in the regression analysis, as well as an initial overview of their distribution and the coding process. Figure 2, below, sets out the different charts and histograms for the distribution of each predictor.
|Predictor||Distribution||Coding|
|variety||N = 583 BrE (37%); N = 525 AmE (33%); N = 138 HKE (9%); N = 333 NigE (21%)||Categorical: BrE = reference level|
|text_type||N = 853 General (54%); N = 726 Blogs (46%)||Binary: General = 1; Blogs = 0|
|cc_verbal_meaning||N = 1325 action (84%); N = 254 state (16%)||Binary: action = 1; state = 0|
|cc_animacy||N = 181 inanimate (11%); N = 1398 animate (89%)||Binary: inanimate = 1; animate = 0|
|coreferentiality||N = 345 non-coreferential (22%); N = 1234 coreferential (78%)||Binary: non-coreferential = 1; coreferential = 0|
|temporal_relation||N = 250 simultaneous (16%); N = 1329 anterior (84%)||Binary: simultaneous = 1; anterior = 0|
|cc_voice||N = 90 passive (6%); N = 1489 active (94%)||Binary: passive = 1; active = 0|
|cc_negation||N = 475 negative (30%); N = 1104 positive (70%)||Binary: negative = 1; positive = 0|
|cc_words_constituents||Mean = 6.3; Min = 0; Max = 81||Quantitative, logarithmically transformed with natural base; values of zero were assigned a score of -1 (log_cc_words_constituents)|
|intervening_material||Mean = 0.1; Min = 0; Max = 9||Quantitative, logarithmically transformed with natural base; values of zero were assigned a score of -1 (log_intervening_material)|
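The coding scheme in Table 2 can be sketched as a small encoding step. This is an illustration under stated assumptions, not the procedure actually used: the function names and the example token are invented, and the categorical predictor variety is left out because it would need dummy coding against the BrE reference level.

```python
import math


def log_or_minus_one(count: int) -> float:
    """Natural log of a word count; zero is mapped to -1, since log(0) is undefined."""
    return -1.0 if count == 0 else math.log(count)


def encode_token(token: dict) -> dict:
    """Turn one hand-coded token into the numeric predictors listed in Table 2."""
    return {
        "text_type": int(token["text_type"] == "General"),
        "cc_verbal_meaning": int(token["cc_verbal_meaning"] == "action"),
        "cc_animacy": int(token["cc_animacy"] == "inanimate"),
        "coreferentiality": int(token["coreferentiality"] == "non-coreferential"),
        "temporal_relation": int(token["temporal_relation"] == "simultaneous"),
        "cc_voice": int(token["cc_voice"] == "passive"),
        "cc_negation": int(token["cc_negation"] == "negative"),
        "log_cc_words_constituents": log_or_minus_one(token["cc_words_constituents"]),
        "log_intervening_material": log_or_minus_one(token["intervening_material"]),
    }


# Hypothetical token, invented for illustration (not taken from the corpus)
example = {
    "text_type": "Blogs",
    "cc_verbal_meaning": "action",
    "cc_animacy": "animate",
    "coreferentiality": "coreferential",
    "temporal_relation": "anterior",
    "cc_voice": "active",
    "cc_negation": "negative",
    "cc_words_constituents": 6,
    "intervening_material": 0,
}
encoded = encode_token(example)
print(encoded)
```

One consequence of this transformation is that a raw count of 1 becomes 0 and a raw count of 0 becomes -1, so the transformed scale still separates "no words" from "one word".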
The data was analyzed with a binary logistic regression model using the “glm” function in R (Gelman & Hill 2007). This model is applied for the purposes of effect estimation and hypothesis testing, rather than attempting to predict the variation as a whole (Harrell 2015: 98–99). With this in mind, I included all the predictors known to play a role in the variation defined in Section 2.1, and calculated predicted percentages. In order to reflect the expected patterns, the predictors are coded so that number 1 favors that/zero-complement clauses and should return a positive estimate; that is, positive coefficients indicate consistency with theoretical expectations.
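For reference, a binary logistic regression of this kind has the following general form. The notation is a standard textbook formulation supplied here for illustration, with P standing for the probability of a that/zero-complement clause, x_j for the coded predictors, and beta_j for their coefficients; none of these symbols appear in the original analysis.

```latex
\[
\operatorname{logit}(P) \;=\; \ln\frac{P}{1-P} \;=\; \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
P \;=\; \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}} .
\]
```

With predictors coded so that 1 is the value expected to favor that/zero-complement clauses, a positive beta_j raises the log odds, and hence the probability, of the finite pattern.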
Table 3 presents the results from the binary logistic regression model. The value in the column “coefficient” indicates the strength of each predictor on the log odds scale. Positive numbers represent an increase in the probability of producing a declarative finite that/zero-complement clause, while negative numbers represent a decrease in the probability. The larger the number, the stronger the effect of the specific predictor. The “standard error” refers to the accuracy of the estimate, that is, the level of uncertainty about the coefficient. The third column contains the “p-value” of each predictor which indicates the statistical significance; p-values smaller than 0.05 indicate statistical significance. The final column refers to the “odds ratio” (OR), which gives similar information to that of the coefficients. An OR above 1 indicates that that/zero-complement clauses are more likely to be used, while an OR between 0 and 1 indicates that gerund complement clauses are more likely to occur. An OR of 1 indicates that a predictor has no effect.
|Predictor||Coefficient||Standard error||p-value||Significance||Odds ratio|
|(Intercept)||-3.22||0.29||< 2e-16||***||0.04|
|variety (default: BrE)|
|coreferentiality: non_coreferential||4.02||0.34||< 2e-16||***||55.52|
|Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05|
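The relation between the coefficient and odds ratio columns can be checked with a brief sketch: the odds ratio is the exponential of the coefficient. The function names are hypothetical, and only the two rows reproduced above are used.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)


def inverse_logit(log_odds: float) -> float:
    """Convert log odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


intercept = -3.22          # (Intercept) row
non_coreferential = 4.02   # coreferentiality: non_coreferential row

print(round(odds_ratio(intercept), 2))
print(round(odds_ratio(non_coreferential), 2))
# Probability of a that/zero-clause when all other coded predictors equal 0
print(round(inverse_logit(intercept + non_coreferential), 2))
```

Exponentiating the rounded coefficient 4.02 gives roughly 55.7 rather than the tabulated 55.52; the small gap presumably arises because the table's odds ratio was computed from the unrounded coefficient.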
From the results obtained with the regression analysis summarized in Table 3, the predictors that significantly determine the variation are variety (HKE), cc_animacy, coreferentiality, temporal_relation, cc_negation, and log_cc_words_constituents. On the other hand, the alternation between declarative finite that/zero-complement clauses and gerunds does not appear to be conditioned by text_type, cc_verbal_meaning, cc_voice, and log_intervening_material.
As for the direction of the effects, the signs (positive numbers) of the first column (“coefficient”) as well as the OR (OR > 1) are all consistent with previous theories and hypotheses discussed in Section 2.1. That is, action verbs, inanimate subjects in the complement clause, non-coreferential subjects, simultaneous temporal relation, passives, negative particles, and an increase in the number of words of the complement clause and in the intervening material, all increase the likelihood of having a that/zero-complement clause.
Table 4 below shows the predicted percentages for each predictor, and Figure 3 is a graphic representation of these predicted percentages and the effect of each predictor in the model. As expected, the stronger predictors are those that significantly determine the variation. The strongest predictor seems to be coreferentiality between the subjects in the main clause and complement clause: non-coreferential subjects show a proportion of declarative finite that/zero-complement clauses that is 65 percentage points higher than that of coreferential subjects. The next predictor, animacy of the subject in the complement clause, shows a difference of 44 percentage points between animate and inanimate subjects; inanimate subjects show a higher tendency for declarative finite that/zero-complement clauses. The complexity of the complement clause in number of words, and the presence of negative markers in the complement clause, have an effect strength of 41 and 21 percentage points respectively, both with a higher tendency for the finite patterns, as is to be expected in light of the Complexity Principle (Rohdenburg 1996, 2006). Finally, the last two predictors that significantly determine variation are the temporal relation between the main clause and the complement clause, with a difference of 39 percentage points, and the predictor variety, with a difference of 20 percentage points. It should be noted here that HKE in fact shows a stronger tendency towards declarative finite that/zero-complement clauses, with a predicted percentage of 63%, as compared with the L1 varieties (43% in BrE and 47% in AmE). This stronger tendency for finite patterns is expected and may be explained by the influence of the substrate language and by cognitive effects, as discussed in detail in the following subsection.
By contrast, this is not the case in NigE, where the predicted percentage of declarative finite that/zero-complement clauses, at 46%, is in fact slightly lower than that of AmE (47%).
|Comparison||Percentage point difference|
|Non-coreferential (96%) vs. coreferential (31%)||65%|
|Inanimate (88%) vs. animate (44%)||44%|
|Simultaneous (80%) vs. anterior (41%)||39%|
|Negative (56%) vs. positive (35%)||21%|
|Variety: BrE (43%), HKE (63%), NigE (46%), AmE (47%)||max – min = 63 – 43 = 20%|
|Passive (66%) vs. active (48%)||18%|
|Action (41%) vs. state (52%)||11%|
|General (52%) vs. Blogs (47%)||5%|
|log_cc_words_constituents: value (0) (35%) vs. value (3) (76%)||41%|
|log_intervening_material: value (-1) (50%) vs. value (1) (56%)||6%|
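The effect strengths in Table 4 are simple differences between predicted percentages, as the following sketch shows for a subset of the rows. The dictionary layout and the function name effect_strength are illustrative choices; the percentages are copied from the table, and for variety the difference is taken between the highest and lowest level.

```python
# Predicted percentages of that/zero-complement clauses, copied from Table 4
predicted = {
    "coreferentiality": {"non-coreferential": 96, "coreferential": 31},
    "cc_animacy": {"inanimate": 88, "animate": 44},
    "variety": {"BrE": 43, "HKE": 63, "NigE": 46, "AmE": 47},
    "cc_voice": {"passive": 66, "active": 48},
    "text_type": {"General": 52, "Blogs": 47},
}


def effect_strength(levels: dict) -> int:
    """Percentage-point gap between the highest and lowest predicted level."""
    return max(levels.values()) - min(levels.values())


# Rank predictors from strongest to weakest effect
ranked = sorted(predicted, key=lambda name: effect_strength(predicted[name]), reverse=True)
for name in ranked:
    print(name, effect_strength(predicted[name]))
```

Because the measure is a raw gap in percentage points, it describes the size of an effect but not its statistical significance, which is why cc_voice can show a sizeable gap without reaching significance in the regression.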
As for the other predictors, even though they do not significantly determine the variation, the strength and direction of their effects also agree with previous theories and hypotheses. Firstly, both the use of the passive voice and an increase in the number of words in the intervening material, although not statistically significant, show percentage point differences of 18 and 6 respectively. These environments show a stronger tendency towards the use of declarative finite that/zero-complement clauses, which is also in accordance with the Complexity Principle (Rohdenburg 1996, 2006). Secondly, the predictor cc_verbal_meaning has a percentage point difference between the two meanings (action and state) of 11, with state verbs having a stronger tendency for the use of finite patterns. If we look at the distribution of this predictor within the two available patterns in Figure 4, we see that gerunds have a clear preference for action verbs (90.9%), which confirms similar observations made in previous studies (Heyvaert & Cuyckens 2010: 143).
Finally, the predictor with the weakest effect on the variation is text_type with a 5 percentage point difference. Even though we see a difference in the effect of both text types, this is not statistically significant, as Loureiro-Porto (2017) and others have also argued, which confirms our hypothesis that there are no significant differences between these two text types (cf. Section 2.1).
Figure 5 below shows the distribution of gerunds and that/zero-complement clauses with the verb regret in the four varieties: AmE, BrE, HKE, and NigE. As can be seen, gerunds predominate in all the varieties. However, the preference for this pattern is lower in the L2s (around 58% in HKE and 55% in NigE) than in the L1s (slightly higher than 69% in both cases). 
This lower frequency in the use of gerunds in the L2 varieties of English can be explained by different factors:
This paper has addressed the complementation system of regret in different L1 (American and British English) and L2 varieties of English (Hong Kong English and Nigerian English), looking at the factors that influence the choice between declarative finite that/zero-complement clauses and gerunds in the language of the internet.
The analysis of the diachronic evolution of the complementation system of regret in British English shows that declarative finite that/zero-complement clauses used to be dominant until the beginning of the 20th century. From this period onwards, there has been a steady increase in the use of gerunds at the expense of finite that/zero-complement clauses, reaching in present-day English a distribution of 69.1% of gerunds and 30.9% of that/zero-complement clauses. This shift in the distribution of the two available complementation patterns is in accordance with the change experienced by the verb remember, the central member of the group of retrospective verbs, in the 18th century (cf. Section 3.1).
The statistical analysis of the semantic and syntactic predictors of the variation shows that the predictors variety (HKE), cc_animacy, coreferentiality, temporal_relation, cc_negation, and log_cc_words_constituents significantly determine the variation between declarative finite that/zero-complement clauses and gerunds. If we consider the direction of the effects of each predictor, the analysis also showed that the data is in line with previous theories and hypotheses here: (i) L2 varieties of English, especially HKE, show a higher tendency for the use of finite patterns; (ii) the difference between the available text types in the GloWbE corpus (General and Blogs) is not significant, as has been argued in previous studies (e.g. Loureiro-Porto 2017); (iii) gerunds show a clear preference for action verbs (90.9%) as argued by Heyvaert and Cuyckens (2010: 143); (iv) inanimate subjects in the complement clause, non-coreferential subjects between the two clauses, and simultaneous temporal relation, all show a higher proportion of finite patterns (cf. Quirk et al. 1985; Noonan 2007; Heyvaert & Cuyckens 2010); and (v) the passive voice, the presence of negative particles in the complement clause, an increase in the number of words in the complement clause and the intervening material, all increase the cognitive complexity of the structure and favor the use of finite patterns (cf. Rohdenburg 1996, 2006; cf. Section 3.2).
As for the comparison between the different varieties of English, the L2 varieties of English show a higher proportion of declarative finite that/zero-complement clauses as compared to the L1 varieties. This difference was tentatively explained in terms of (a combination of) four factors: i) it might be a historical trait inherited from British English at the time of colonization; ii) it might be caused by the relative unfamiliarity of the verb regret, since it is a low-frequency verb; iii) it might be the result of the influence of the cognitive effects derived from the language contact and SLA situation in which the L2 varieties of English have emerged (such as isomorphism and transparency); and iv) it might be explained in terms of transfer from the substrate languages (cf. Section 3.3).
In sum, the analysis of the complementation profile of regret confirms that the English complementation system is in fact an area of variability in WEs, as mentioned in previous studies (cf. Section 1), which deserves more scholarly attention. Ideally, as I attempted to do here, studies should try to combine statistical analyses of measurable syntactic and semantic factors with descriptive accounts of potential historical and cognitive factors, as a means of achieving a more holistic approach to the development of World Englishes.
I am very grateful to Lukas Sönning for his great help with the statistical analysis, to Elena Seoane for her comments on an earlier draft of this paper, and to two anonymous reviewers for their valuable comments. For funding, my gratitude goes to the Spanish Ministry of Economy and Competitiveness (grant FFI2017-82162-P) and the University of Vigo.
Potential problems with the GloWbE corpus have been reported in previous works (see for example Davies & Fuchs 2015; Mukherjee 2015; Hoffmann 2018). Some of the drawbacks highlighted are the incorporation of texts from the comments sections of newspapers, which makes it impossible to ascertain the country of origin of the writers despite the careful selection of webpages through their domains (.lk for Sri Lanka and .sg for Singapore, for example; Davies & Fuchs 2015: 26; Hoffmann 2018: 179), and the difficulty of knowing the particular variety used by the writer, that is, acrolectal, mesolectal, or basilectal (Mukherjee 2015: 35). However, previous research using the ICE corpus and GloWbE has obtained similar results (see for example Heller & Röthlisberger 2015). The use of GloWbE here is necessary due to its size: an exploratory search in the British component of the International Corpus of English (ICE, Greenbaum 1996) retrieved only 14 examples.
False positives are examples in which regret is used as a noun or adjective, and examples that do not come from internet sources. The presence of such examples led me to manually analyze precision and recall. The rates for the four varieties were above 90% in both precision and recall, with the exception of British English, in which recall was 89.2% (cf. Table 6 in the Appendix). Invalid examples, on the other hand, are repeated, incomplete, or unintelligible examples.
As the reviewers of this article pointed out, that- and zero-complement clauses differ in terms of their cognitive complexity and could be treated as two separate variants. However, the low frequency of use of zero-complement clauses in the corpus (only 87 examples were retrieved for all the varieties under study) made it necessary to merge these two variants of finite clauses under one connecting label, that is, that/zero-complement clause. Moreover, other scholars studying the competition between finite and non-finite patterns also group these variants together (e.g. Rohdenburg 1996, 1999, 2015; Cuyckens et al. 2014; Cuyckens & D’hoedt 2015).
According to the OED, the first attestation of the use of regret as a verb in English is in the poem “Pearl” in the late 14th century: Art þou my perle þat I haf playned, Regretted by myn one, on nyȝte?
The present-day data for the verb remember (GloWbE 2012–2013) is a modified version of the data in García Castro (2018), a study in which a random sample of 3,000 instances is taken as a means of distinguishing between declarative finite that/zero-complement clauses and non-finite complement clauses (to-infinitives and gerunds). Hence, the to-infinitives had to be excluded for the present study.
 The full R command is:
glm(that ~ variety_factor + general + action + inanimate + non_coreferential + simultaneous + passive + negative + log_words_constituents + log_intervening_material, data=REGRET, family=binomial('logit'))
Predicted percentages are calculated by defining an average condition and then setting each predictor in turn to a low and a high value. The difference between the two resulting predictions is the percentage point difference reported for that predictor.
I am aware that some speakers of NigE and HKE may be L1 speakers of these varieties, since they are in phases 4 and 3 of Schneider’s (2007) Dynamic Model and some members of the youngest generations may have learnt English as an L1. Since these cases are not the general rule, I continue to use the generalization L2 speakers for these.
GloWbE = Corpus of Global Web-based English. 2013. Compiled by Mark Davies. https://corpus.byu.edu/glowbe/
ICE = International Corpus of English. http://ice-corpora.net/ice/index.html
OED = Oxford English Dictionary (online edition). http://www.oed.com/public/online/about-oed-online#ViewingSecondEdition
Script used for transforming Killingley’s transcription: https://github.com/kfcd/pingyam/blob/master/pingyambiu
Adesola, O. 2015. Yoruba: A Grammar Sketch: Version 1.0. Afranaph Project. Rutgers, The State University of New Jersey. https://www.africananaphora.rutgers.edu/yoruba-casemenu-148
Bernaisch, Tobias. 2013. “The verb-complementational profile of offer in Sri Lankan English”. Corpus Linguistics and Variation in English: Focus on Non-Native Englishes (Studies in Variation, Contacts and Change in English 13), ed. by Magnus Huber & Joybrato Mukherjee. Helsinki: VARIENG. http://www.helsinki.fi/varieng/series/volumes/13/bernaisch/
Cuyckens, Hubert & Frauke D’hoedt. 2015. “Variability in clausal verb complementation: The case of admit”. Perspectives on Complementation: Structure, Variation and Boundaries, ed. by Mikko Höglund, Paul Rickman, Juhani Rudanko & Jukka Havu, 77–100. Houndmills: Palgrave Macmillan.
Cuyckens, Hubert, Frauke D’hoedt & Benedikt Szmrecsanyi. 2014. “Variability in verb complementation in Late Modern English: Finite vs. non-finite patterns”. Late Modern English Syntax, ed. by Marianne Hundt, 182–204. Cambridge: Cambridge University Press.
De Smet, Hendrik. 2014. “Constrained confusion: The gerund/participle distinction in Late Modern English”. Late Modern English Syntax, ed. by Marianne Hundt, 224–238. Cambridge: Cambridge University Press.
De Smet, Hendrik & Hubert Cuyckens. 2007. “Diachronic aspects of complementation: Constructions, entrenchment, and the matching problem”. Studies in the History of the English Language III: Managing Chaos: Strategies for Identifying Change in English, ed. by Christopher M. Cain & Geoffrey Russom, 187–214. Berlin: Mouton de Gruyter.
Deshors, Sandra C. & Stefan Th. Gries. 2016. “Profiling verb complementation constructions across New Englishes: A two-step random forest analysis of ing vs. to complements”. International Journal of Corpus Linguistics 21(2): 192–218.
Fanego, Teresa. 2007. “Drift and development of sentential complements in British and American English from 1700 to the present day”. ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English, ed. by Javier Pérez-Guerra, Dolores González-Álvarez, Jorge Luis Bueno-Alonso & Esperanza Rama-Martínez, 161–235. Bern: Peter Lang.
Fanego, Teresa. 2010. “Variation in sentential complements in eighteenth- and nineteenth-century English: A processing-based explanation”. Eighteenth-century English, ed. by Raymond Hickey, 200–220. Cambridge: Cambridge University Press.
Fischer, Olga. 1988. “The Rise of the for NP to V construction: An explanation”. A Historic Tongue: Studies in English Linguistics in Memory of Barbara Strang, ed. by Graham Nixon & John Honey, 67–88. London: Routledge.
Grieve, Jack, Douglas Biber, Eric Friginal & Tatiana Nekrasova. 2010. “Variation among blog text types: A multi-dimensional analysis”. Genres on the Web: Corpus Studies and Computational Models, ed. by Alexander Mehler, Serge Sharoff & Marina Santini, 303–322. New York: Springer-Verlag.
Heller, Benedikt & Melanie Röthlisberger. 2015. “Big data on trial. Researching syntactic alternations in GloWbE and ICE”. Paper presented at From Data to Evidence. Big Data, Rich Data, Uncharted Data, University of Helsinki, 19–22 October 2015.
Heyvaert, Liesbet & Hubert Cuyckens. 2010. “Finite and gerundive complementation in Modern and Present-day English: Semantics, variation and change”. Historical Cognitive Linguistics, ed. by Margaret E. Winters, Heli Tissari & Kathryn Allan, 132–160. Berlin: De Gruyter Mouton.
Hoffmann, Sebastian. 2018. “I would like to request for your attention: On the diachrony of prepositional verbs in Singapore English”. Changing Structures: Studies in Constructions and Complementation, ed. by Mark Kaunisto, Mikko Höglund & Paul Rickman, 171–196. Amsterdam & Philadelphia: John Benjamins.
Hundt, Marianne. 2009. “Colonial lag, colonial innovation or simply language change?” One Language, Two Grammars? Differences Between British and American English, ed. by Günter Rohdenburg & Julia Schlüter, 13–37. Cambridge: Cambridge University Press.
Mair, Christian. 2006. “Nonfinite complement clauses in the nineteenth century: The case of remember”. Nineteenth-century English: Stability and Change, ed. by Merja Kytö, Mats Rydén & Erik Smitterberg, 215–228. Cambridge: Cambridge University Press.
Mukherjee, Joybrato. 2015. “Response to Mark Davies and Robert Fuchs: Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE)”. English World-Wide 36(1): 34–37.
Mukherjee, Joybrato & Stefan Th. Gries. 2009. “Collostructional nativisation in New Englishes: Verb-construction associations in the International Corpus of English”. English World-Wide 30(1): 27–51.
Mukherjee, Joybrato & Marco Schilk. 2008. “Verb-complementation profiles across varieties of English”. The Dynamics of Linguistic Variation: Corpus Evidence on English Past and Present, ed. by Terttu Nevalainen, Irma Taavitsainen, Päivi Pahta & Minna Korhonen, 163–181. Amsterdam: John Benjamins.
Rohdenburg, Günter. 1999. “Clausal complementation and cognitive complexity in English”. Anglistentag Erfurt 1998, ed. by Fritz-Wilhelm Neumann & Sabine Schülting, 101–112. Trier: Wissenschaftlicher Verlag.
Rohdenburg, Günter. 2006. “The role of functional constraints in the evolution of the English complementation system”. Syntax, Style and Grammatical Norms: English from 1500–2000, ed. by Christiane Dalton-Puffer, Nikolaus Ritt, Herbert Schendl & Dieter Kastovsky, 143–166. Frankfurt: Peter Lang.
Rohdenburg, Günter. 2015. “The embedded negation constraint and the choice between more or less explicit clausal structures in English”. Perspectives on Complementation: Structure, Variation and Boundaries, ed. by Mikko Höglund, Paul Rickman, Juhani Rudanko & Jukka Havu, 101–127. Houndmills: Palgrave Macmillan.
Rudanko, Juhani. 1999. Diachronic Studies of English Complementation Patterns: Eighteenth Century Evidence in Tracing the Development of Verbs and Adjectives Selecting Prepositions and Complement Clauses. Lanham, MD: University Press of America.
Schilk, Marco, Tobias Bernaisch & Joybrato Mukherjee. 2012. “Mapping unity and diversity in South Asian English lexicogrammar”. Mapping Unity and Diversity World-Wide: Corpus-based Studies of New Englishes, ed. by Marianne Hundt & Ulrike Gut, 137–166. Amsterdam: John Benjamins.
Schilk, Marco, Joybrato Mukherjee, Christopher F.H. Nam & Sach Mukherjee. 2013. “Complementation of ditransitive verbs in South Asian Englishes: A multifactorial analysis”. Corpus Linguistics and Linguistic Theory 9(2): 187–225.
Schneider, Edgar W. 2012. “Contact-induced change in English worldwide”. The Oxford Handbook of the History of English, ed. by Terttu Nevalainen & Elizabeth C. Traugott, 572–581. Oxford: Oxford University Press.
Schneider, Edgar W. 2013. “English as a contact language: The ‘New Englishes’”. English as a Contact Language (Studies in English Language), ed. by Daniel Schreier & Marianne Hundt, 131–148. Cambridge: Cambridge University Press.
Sheehan, Michelle & Jenneke van der Wal. 2016. “Do we need abstract case?” Proceedings of the 33rd West Coast Conference on Formal Linguistics, ed. by Kyeong-min Kim, Pocholo Umbal, Trevor Block, Queenie Chan, Tanie Cheng, Kelli Finney, Mara Katz, Sophie Nickel-Thompson & Lisa Shorten, 351–360. Somerville, MA: Cascadilla Proceedings Project.
Slobin, Dan. 1980. “The repeated path between transparency and opacity in language”. Signed and Spoken Language: Biological Constraints on Linguistic Form, ed. by Ursula Bellugi & M. Studdert-Kennedy, 229–243. Weinheim: Verlag Chemie.
Steger, Maria & Edgar Schneider. 2012. “Complexity as a function of iconicity: The case of complement clause constructions in New Englishes”. Linguistic Complexity: Second Language Acquisition, Indigenization, Contact, ed. by Bernd Kortmann & Benedikt Szmrecsanyi, 156–191. Berlin: De Gruyter Mouton.
Vosberg, Uwe. 2003. “Cognitive complexity and the establishment of -ing constructions with retrospective verbs in Modern English”. Insights into Late Modern English, ed. by Marina Dossena & Charles Jones, 197–220. Berlin: Peter Lang.
|Variety||Verbs retrieved of total||% recall||regret*_v* vs. false positives||% precision|
|American English||111 of 117||94.9||5466/304||94.7|
|British English||99 of 111||89.2||5153/298||95.4|
|Hong Kong English||90 of 93||96.8||485/40||92.4|
|Nigerian English||105 of 111||94.6||1082/44||96.1|
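The rates in the table above can be recomputed from the raw counts. The following Python sketch assumes that recall is the share of verbs retrieved out of the total, and that the two figures in the fourth column are valid verbal hits and false positives, so that precision is the number of valid hits divided by the sum of both. This reading of the column and all function names are assumptions. Under this reading, the recall values and three of the four precision values in the table are reproduced, whereas the British English counts yield roughly 94.5% rather than the tabulated 95.4%.

```python
# Counts copied from the table above.
# Each entry: (verbs retrieved, total verbs, valid hits, false positives).
# Reading the fourth column as "valid hits / false positives" is an assumption.
COUNTS = {
    "American English": (111, 117, 5466, 304),
    "British English": (99, 111, 5153, 298),
    "Hong Kong English": (90, 93, 485, 40),
    "Nigerian English": (105, 111, 1082, 44),
}


def recall(retrieved: int, total: int) -> float:
    """Percentage of the relevant verbs that the query retrieved."""
    return 100.0 * retrieved / total


def precision(valid: int, false_positives: int) -> float:
    """Percentage of retrieved hits that are valid verbal uses."""
    return 100.0 * valid / (valid + false_positives)


def rates() -> dict:
    """Return {variety: (recall, precision)}, rounded to one decimal."""
    return {
        variety: (round(recall(r, t), 1), round(precision(v, fp), 1))
        for variety, (r, t, v, fp) in COUNTS.items()
    }


if __name__ == "__main__":
    for variety, (rec, prec) in rates().items():
        print(f"{variety}: recall {rec}%, precision {prec}%")
```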
|2||Paratactic, no connector|
|2; 5||Coordinate connectives: and, and then|
|2; 10||Subordinates headed by: if, when, while, so, because, before|
|3||Subordinates headed by: how, where, that|
|4||Non-finite complements (subjectless and non-catenative types)|
|5-6||Higher consistent frequency of complex clause (e.g. infinitival complements)|