Congratulations, You WON!!! Exploring trends in Big Data marketing communication

Joe McVeigh
Department of Language and Communications Studies, University of Jyväskylä
VARIENG, University of Helsinki


Big Data collection and analysis is exploding throughout every field of research. Now that many forms of communication have gone electronic, the opportunity to study them is greater than it has ever been. With this data it is also possible to correlate linguistic data with metadata to uncover interesting patterns. Big Data research in linguistics is therefore not constrained to just word counts, but depends on the discipline and goals of the study. Linguistic analyses of corpora can have benefits outside the field of linguistics, such as in marketing, where there is a substantial economic value placed on language. Linguistic descriptions of marketing data therefore have a commercial appeal since they can be applied directly to the creation of future texts. This paper researches a corpus of 2,021 email marketing subject lines (9,881 tokens), which were together sent over 84 million times. The subject lines are coupled in the analysis with each email’s average open rate, which is a standard success metric widely used by email marketers in evaluating subject lines. The analysis shows how email marketing subject lines are similar to other types of CMC and other types of marketing. The subject lines exhibit interesting linguistic features, especially in the use of non-standard variations and exclamation points. These variations, as well as the parts of speech of the subject lines, are investigated to show which of them correlate with the success of each subject line. This research also addresses the gap in CMC research on email marketing and email subject lines, both of which have been almost entirely ignored in linguistic research.

1. Introduction

The success and proliferation of email marketing, or advertising via email messages, offers linguists a new area of computer-mediated communication to study. Computer-mediated communication (CMC) refers to “predominantly text-based human-human interaction mediated by networked computers or mobile telephony” (Herring 2007) and it includes all types of email. Email marketing is a new genre for CMC research because even though it has been a popular form of marketing for over a decade, it remains relatively understudied by linguists in comparison to personal and workplace email communication. Email marketing, however, is used by over 90% of businesses in the UK (DMA 2014) and almost 70% of US internet users prefer email as the method to communicate with businesses (Hanington 2015). As a text type that has a financial value placed on it by its creators, email marketing offers finer-grained metrics to be used in an analysis than ever before (Jenkins 2009). These metrics show statistics such as how many people received a certain email marketing text, how many people opened it and how many followed the email through to the company’s or product’s website.

Email marketers spend a lot of time crafting their messages to particular audiences and they use every available linguistic resource to stand out from the crowd in potential customers’ inboxes. In writing subject lines, marketers do not have the ability to use italics, bold face, images, video, or other multimodal resources – unformatted written text is all they have to accomplish their goal of getting customers to open the email. This has led to the marketers using quite inventive forms of non-standard variations, which can be seen in this study. It has also led to marketers researching which linguistic features will be more successful. Rather than starting on the part-of-speech level though, the research usually isolates specific linguistic features that are in successful subject lines. Assumptions are then made based on which part of speech these words belong to. These assumptions include claims that success comes from using “verb[s] that demand action” in a subject line (MacArthur 2016), or from using “superlative elements” such as adjectives, adverbs and exclamation points (Phrasee 2015), or from adding customers’ names to subject lines (Neel 2013).

Although marketing emails have received some attention from scholars (for example, Cheung 2008), and although there is no shortage of marketing guides which claim what successful subject lines should include, there is still no account of what subject lines actually look like linguistically. They have been almost entirely ignored by researchers.

This article uses a corpus of 2,021 email marketing subject lines which were sent over 84 million times to analyze certain basic parts of speech and common linguistic features of email marketing. [1] Each subject line is tested for whether it includes a noun, verb, adjective, adverb, pronoun, exclamation point, a non-standard variation, the name of a product, a customer’s name, or a numeral. The variables of parts of speech and linguistic features are paired with open rates for each subject line to see if any of them correlate with success. Some of the variables are then combined to offer a more detailed picture of the linguistic nature of email subject lines. The results of this study show that some of the claims made by marketing guide books about how to write subject lines can be supported. Furthermore, the results add to the field of CMC studies by providing data on a text type that has so far not received enough attention. The results also show the similarities and differences between email marketing and other forms of CMC. Subject lines in email marketing, in contrast to those in other forms of email communication, are considered to be just as important as email body texts.

The study presented in this paper starts from the linguistic feature level to see whether any of the features actually correlate with success. The research question is whether the methods of corpus linguistics can be applied to email marketing to show which linguistic features are common in successful email marketing. If some linguistic features can be shown to correspond with a positive success metric, such as average open rate, then it may be possible to produce a template which incorporates the linguistic data and metadata to allow marketers to produce texts that have a better chance at success. Templates are desirable in email marketing because marketers know that recipients scan their emails. So a familiar format, layout and/or language is welcome to both marketer (by making the creation of the texts faster and easier) and recipient (by making the reading/scanning faster and easier). The emails that professional email marketers send (and the ones analyzed here) are not spam, but messages that recipients opted in to receive. A template could even be expected by recipients, i.e. they want all of the emails from one company to look and read the same way every time.

The results show, however, that many of the linguistic features discussed in email marketing literature do not correlate with success. But finding no correlation between linguistic features and success is just as interesting as finding a correlation since it raises questions about the actual effect of language in email marketing and how much time and financial resources marketers should spend on considering linguistic features when writing their texts.

2. Background

2.1 Linguistic research on marketing and email

Linguistic research into marketing goes back half a century to Leech (1966) and the genre has been analyzed from many different linguistic perspectives. Research has shown that marketing is a dynamic and innovative genre that borrows lexico-grammatical and discourse features from many other genres (Bhatia 2004). Cook (1994: 34) suggests that marketing texts “borrow so many features of other discourses that they are in danger of having no separable identity of their own”. But marketing is said to use every resource of language to invite creative and subtle readings from its users (Goddard 1998: 3) and linguists have claimed that the attempt to give pleasure is a prototypical feature of marketing texts (for example, Cook 1994). Scholars have found that the language of marketing is repetitive in both its linguistic structure and word use in order to enhance textual cohesion (Myers 1994; Cook 1994). More recent studies have shown that certain lexical bundles in email marketing are hyper-repetitive across texts, most likely because the marketers are working from a template (McVeigh forthcoming). Linguistic research has also shown that marketing attempts to appear personal in nature to potential customers (Myers 1994; Goddard 1998). In a sense, marketing tries to target certain types of people who would be likely to buy the product being sold. Lowrey (2007) has found that the linguistic complexity of marketing increases when marketing texts are more targeted at people who are motivated to buy the product being offered. Thus the readability level or length of the marketing texts will increase when marketers are surer that the customers being targeted are already interested in the product because these customers will devote more time to understanding the text.

The genre of email has also received a great amount of attention from linguists, especially in the last twenty-five years, although the technology is older than that. Studies on email communication have investigated the occurrence of “flaming” or conflictual messages, including the use of exclamation points and all caps (Turnage 2008); the linguistic nature of greetings and closings (Waldvogel 2007); the effect of politeness and grammar in email messages (Jessmer & Anderson 2001); deception in CMC and email (Hancock & Gonzales 2013; Keila & Skillicorn 2005); email hoaxes (Heyd 2013); the use of emoticons in workplace email (Skovholt et al. 2014); the use of exclamation points (Waseleski 2006); and the role of small talk in workplace email (Hössjer 2013). This is by no means an exhaustive list of the research done on email communication, but it serves to show how broad the research on email is and how many different angles it has been approached from.

Baron (1998, 2000, 2003), who was one of the first researchers to analyze email, focused on the way that email mimics or adopts features that are normally associated with speech and suggested that the line between written and spoken language gets blurred in email communication. Dürscheid & Frehner (2013: 37), who claim that email is the most important form of CMC, challenge Baron’s claims by pointing out that in the genre of email there is “a wide array of texts types, ranging from informal to formal, as well as a great variety of situational factors” which will influence the linguistic style of each email text. Early research into CMC, including email, has also been done by Herring (2001, 2007), who applied methods of discourse analysis to develop a versatile framework that could be used to analyze current and emerging forms of CMC. The importance of Herring’s contributions to the linguistic research on email communication cannot be overstated and the computer-mediated discourse analysis framework that she developed provides a practical definition of email (in comparison to other forms of CMC). According to Herring’s (2007) framework, email is asynchronous and one-way because the communication partners do not need to both be logged in and they are not able to see how or whether the other person is typing a message.

While Dürscheid & Frehner (2013) acknowledge Androutsopoulos’ (2006) concern that it is difficult to identify linguistic features that are typical of email communication because the settings and purposes of email outweigh any common linguistic forms, they nevertheless outline several lexical and grammatical characteristics of email communication. The structural features that they mention include lexical abbreviation, syntactic reduction, and minimal use of capitalization. In outlining these features, Dürscheid & Frehner (2013) stress that none of these features are new or unique to email communication, but that they seem to be more common in emails than in other texts.

Despite email being the most important type of CMC, and despite the fact that linguistic research on email has been going on for almost thirty years, it is rather peculiar that there are so few studies which focus on email marketing. Even studies which characterize the register of email, such as Frehner (2008), fail to mention email marketing as a subgenre of email communication. Dürscheid & Frehner (2013) mention that email is just as important in business communication as it has always been, even if it has lost some of its hold on CMC to other services, such as instant messaging and social media, but they fail to mention the dominance of email in marketing. The literature on CMC would therefore benefit from a linguistic analysis of email marketing, which is one of the main ways that email is being used today.

Another striking gap in the research on email is the lack of studies which focus on or even mention email subject lines. Subject lines are so essential in email communication that most email clients (Outlook, Gmail, Yahoo!, etc.) will prompt users to write something in the subject line field if they try to send an email without a subject line. Yet subject lines are barely mentioned in studies on email communication, even when the subject lines are included in the examples (Skovholt & Svennevig 2006; Graham 2007; Waldvogel 2007; Severinson Eklundh 2010; Hössjer 2013). The absence of commentary on email subject lines can perhaps be explained by the fact that the emails which have been studied are of the type that would be opened no matter what the subject line read. That is to say that when someone receives an email from a friend or family member, or when an employee receives an email from their employer or co-worker, it does not really matter what the subject line reads. But even in personal and business emails, subject lines serve an important purpose by informing receivers of the reference object of the email body text (Dürscheid & Frehner 2013). Graham (2007) showed that subject lines were so important to one e-community that they expressly encouraged the accurate marking of the content of the email bodies in the subject line and that failure to do so was seen as impolite. So a study on the linguistic features of email subject lines would also benefit the research literature, especially since subject lines are even more important in email marketing than they are in other types of email communication.

2.2 Marketing research on email

Email marketing is broadly defined as any email that is sent from a company to a potential customer (Marketing Terms n.d.). This broad definition covers both emails that are more like newsletters, which update the customer on current events, and emails which are written to sell a product. Email marketing differs from personal and professional communication made via email in that it places a tangible price on communication. Marketing guides warn about the dangers of boring one’s potential customers (Lewis 2002). Jenkins (2009: x) says that email users “are growing increasingly selective and short on patience when choosing which emails to read” and that these users can “punish” email marketers by either ignoring their emails and/or marking their emails as spam, which has the potential to block access to other potential customers. Compared to business and personal communication then, it is possible to question the extent to which we all ignore certain messages, but certainly none of us mark emails from our colleagues or friends as spam. For email marketers, this is a very real possibility.

This raises the importance of subject lines in email marketing since they actually have to entice readers to open the emails (Chaffey 2011). Marketing emails which are not opened are considered failures, so marketers place a strong emphasis on subject lines. Jenkins (2009: 104) claims that subject lines are “arguably the most influential piece of copy when it comes to getting subscribers to open and read emails”, while Bly (2002: 192) says that subject lines are “probably the second most important line of the message” (behind the From line). The linguistic research which showed that marketers strive to create personalized messages (Myers 1994; Goddard 1998) is especially relevant to subject lines. Email software allows marketers to automatically append the names of recipients to subject lines, so that each person who receives the email will see their name in the subject line. The email medium can therefore most directly target specific types of people compared to other types of marketing based on a variety of demographic factors, such as gender, age, and profession (Lewis 2002; Jenkins 2009; McVeigh forthcoming).

Marketers are obviously concerned with monitoring the success of their campaigns and subject lines are a very important way of doing this. While traditional forms of marketing, such as newspaper, radio and television ads, are potentially able to reach millions of customers, there is no truly reliable way to know exactly how many people saw, heard or read the marketing text. Today, however, technology allows marketers to craft specific messages for specific audiences, as well as to test which campaigns did better with certain audiences.

In judging the success of marketing emails, their most straightforward success metric is the open rate (OR), which is calculated as the number of emails opened divided by the number of emails delivered (Alchemy Worx 2008). OR in this case does not mean that the email client automatically opened an email in order to show some of the body text in a preview pane. Instead OR refers to those emails which were actively opened, i.e. clicked on by recipients. OR is the most important metric of success when judging subject lines because it most directly shows whether a subject line achieved its goal, i.e. whether it enticed a recipient to open the email. It is especially applicable to marketing emails which are designed to be read rather than to sell a product. When emails do not attempt to sell a product, their purpose is to maintain a familiar relationship with customers, or in other words to maintain a positive image of the company in the customer’s mind. Examples of these types of email are those sent by Oxford Dictionaries or the Smithsonian to their subscribers. These emails primarily include blurbs and links to articles on their websites, but the emails may also have links to pages where the email recipient can purchase a product (such as a dictionary or jewelry from the Smithsonian store).

This lack of research on email marketing is especially glaring when we consider how important email marketing is and has been to businesses. According to one report, the amount of consumer email continues to grow because of “its use for notifications (e.g. for online sales) rather than simply as an interpersonal communication tool” (Radicati Group 2015). Email marketing is also seen as more effective than any other type of digital marketing, including social media, websites and search engine optimization (Ascend2 2014). Moreover, US businesses are expected to spend $2.5 billion on email marketing in 2016 (DMA 2014). Yet in study after study email marketing gets only a passing mention, if it is discussed at all.

3. Material and methods

This paper analyzes a corpus of 2,021 email subject lines. Altogether, the subject lines were sent 84,386,667 times and opened by potential customers 9,183,118 times. The subject lines come from Alchemy Worx, an email marketing company which, among other services, manages and deploys email marketing texts for other companies. The high send rates of email marketing texts mean that many companies do not have the IT resources to send the emails themselves (Jenkins 2009), so hiring another company to send them is a common practice. Due to privacy concerns, the specific company which created the email marketing texts cannot be named, but all of the subject lines come from the same company. The emails in this corpus are not spam (unsolicited email) and they were not sent without prior consent from the recipients. Consent in email marketing works in a number of ways, but it often happens when customers opt in to receive email messages from a company while they are in the process of purchasing a product. People opt in to receive these emails because they want to know about future sales the company has on offer. Rather than regularly browsing to each company’s specific website, customers can just check their inbox. This allows the company to grow and curate a list of consenting customer email addresses, which can then be used to market future products. When the customer opts in to receive emails, the company can match their email address to other information that the customer gives, such as their name, address, and what type of product they purchased. This is how companies are able to target their email marketing to customers based on demographic data. For example, if a company is selling merchandise for the Philadelphia Phillies baseball team, they would naturally send marketing emails to the customers on their email list who are located in the Philadelphia area.
Both marketers and customers are aware of this relationship: both of them know that there must be something on offer in future emails, i.e. customers will not receive emails about products which are not new or not on sale, and the offers will be targeted at customers in some respect, i.e. marketers will decide which customers to email based on the information that customers have given them.

The subject lines were sent during or after 2012. Their lengths range from 1 to 75 characters, although only 47 of the subject lines have seven characters or fewer. The subject lines are not all unique. Since the same subject line was in some cases sent in multiple campaigns, there are subject lines in the corpus which are linguistically the same, but their metadata is different. The metadata for the subject lines can be seen in Table 1. Only 27 subject lines have an OR over 20% and only 2 have an OR over 50%.

Subject lines                    2,021
Emails sent                      84,386,667
Emails opened                    9,183,118
Lengths in characters            1–75
Average length in characters     24
Range of open rates              5–61%
Average open rate for corpus     12.38%
Industry average open rate       13.15%

Table 1. Metadata on the subject lines in the corpus.
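
As a quick check on how the figures in Table 1 relate to one another, the open rate can be computed from the corpus totals. The following is a minimal sketch only: the function name `open_rate` is an assumption made for illustration and is not part of any tool used in this study, and the number of emails sent is used as a stand-in for the number delivered. The pooled rate that it yields (roughly 10.9%) is lower than the 12.38% average reported in the table. That would be expected if the table's figure is the mean of the individual subject lines' open rates rather than total opens divided by total sends, but that reading is an assumption.

```python
def open_rate(opened: int, delivered: int) -> float:
    """Open rate as a fraction: emails opened divided by emails delivered."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return opened / delivered


# Corpus totals from Table 1 (sends stand in for deliveries here).
pooled = open_rate(9_183_118, 84_386_667)
print(f"Pooled open rate: {pooled:.2%}")
```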

The subject lines were part-of-speech tagged using the CLAWS7 tagset via the Wmatrix processing environment (Rayson 2009). Wmatrix automatically annotated the words in the subject lines for their parts of speech using the CLAWS software. For example, the subject line Congratulations , You WON! would be annotated Congratulations_NN2 ,_, You_PPY WON_VVD !_! where the tag _NN2 means that the word it is attached to is a plural common noun, the tag _PPY means the word is a 2nd person personal pronoun, etc. This automatic tagging is the most common form of corpus annotation and it facilitates research by allowing the subject lines to be analyzed based on their parts of speech alone. The CLAWS7 tagset has over 160 part-of-speech tags and an accuracy rate of 96–98% (Rayson 2003). [2]

Each subject line was treated separately and tested for whether or not it included one of the parts of speech and some other features which have been discussed in the literature on CMC and email marketing. The presence of these features was treated as a binary variable, i.e. I did not count the total number of each feature, but rather recorded whether each subject line included a feature or not. As there have not been any studies on the linguistic features of email subject lines, this study analyzes the general parts of speech. Thus, subject lines were not analyzed for whether they included a plural noun, but whether they included any noun at all. This is also the case for the other parts-of-speech variables in this study. The categories are noun, verb, pronoun, adjective, and adverb. [3]
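
The step from tagged text to binary variables can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, and the mapping from CLAWS7 tags to the five general categories (by the first letter of the tag, plus the possessive tag APPGE counted as a pronoun) is my simplification for the example, not a mapping taken from the study.

```python
# Assumed mapping from the first letter of a CLAWS7 tag to a general category.
COARSE_BY_INITIAL = {
    "N": "noun",
    "V": "verb",
    "J": "adjective",
    "R": "adverb",
    "P": "pronoun",
}
CATEGORIES = ("noun", "verb", "pronoun", "adjective", "adverb")


def parse_tagged(line):
    """Split 'word_TAG' tokens into (word, tag) pairs on the last underscore."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:  # token carries no tag; skip it
            continue
        pairs.append((word, tag))
    return pairs


def coarse_category(tag):
    """Return the general category for a tag, or None if it has none."""
    if tag == "APPGE":  # assumption: possessives such as "our" count as pronouns
        return "pronoun"
    return COARSE_BY_INITIAL.get(tag[:1])


def pos_features(line):
    """Binary variables: does the line contain at least one item per category?"""
    present = {coarse_category(tag) for _, tag in parse_tagged(line)}
    return {category: category in present for category in CATEGORIES}


features = pos_features("Congratulations_NN2 ,_, You_PPY WON_VVD !_!")
print(features)
```

Because each variable records presence only, a subject line with three nouns and a subject line with one noun receive the same value for the noun category.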

Another category which was examined is exclamation point since various marketing guides report somewhat contradictory information on punctuation. Whereas some see the use of exclamation points as a positive that will help an email’s open rate (MailChimp n.d.), others see exclamation points as being overused (Lewis 2002) and as a spam trigger which may send the email directly into a receiver’s junk folder (MailChimp 2016), therefore hurting the open rate. Another marketing guide (Bly & Kelly 2009: 20, 26) claims that using exclamation points suggests “overexcitement” and that these punctuation marks “impart a sense of desperation, instability, or bossiness”. In discussing CMC, Turnage (2008: 53) claims that exclamation points can “cause a message to be considered a flame”, which is to say that using one may indicate hostility and/or aggression. On the other hand, Waseleski (2006: 1013) has shown that exclamation points in CMC “function most often to indicate friendliness and to emphasize intended statements of fact”.

Each subject line was also categorized for whether it included a non-standard variation. Frehner (2008: 43) claims in a linguistic analysis of personal emails that “mistakes in spelling, punctuation and syntax are largely tolerated”, but this has perhaps been drawn into question by more recent research (see Boland & Queen 2016, Queen & Boland 2015). Bly & Kelly (2009) claim that such mistakes will have a negative impact on marketing emails and they repeat the often heard notion that the use of all caps is perceived as YELLING in CMC. Nevertheless, Frehner (2008: 47) says that paying less attention to spelling errors and typos is “very typical of the language of email and computer-mediated communication in general” and that email writers do not care about “carefully composed and beautifully adorned sentences” because of the speed and spontaneity which is inherent in composing an email. Frehner (2008) cites Baron’s (2000) claim that email users prefer not editing their texts at all before sending them. The features which are included in the non-standard variation category are the use of all caps or no caps, variant spacing and spelling, variant punctuation, variant symbol usage and/or formation, variant capitalization, multiple exclamation points, and the use of hashtags. Although some of the subject lines in the corpus use a combination of these variations, due to the space limitations of this article they were categorized only based on whether or not they included a variation. Puns and plays on words, which are evident in subject lines like Oh Ship!, were not considered non-standard variations. Subject lines which were categorized as including a customer’s name (see below) were considered to have a capital letter even if the rest of the letters were not capitalized. Subject lines which have no letters at all were not considered as including a non-standard variation. Subject lines where a number is used as a letter, such as 5ave 55% and 2o% 0ff, were however considered to have a non-standard variation, even though the numeral 5 looks like a capital S when placed next to a lowercase a. Examples of the subject lines which include non-standard variations can be seen in Table 2. [4]

Non-standard variations Examples in subject lines
This Order Is FREE ?
no caps 95% (it’s almost free)
90 % * off * clearance *
Variant spacing C L E A R A N C E - S A L E !
C L E A R A N C E - 90% OFF!
Variant spelling $25 COOPON (Our Biggest EVER)
Variant punctuation *B*O*O*! *
Our Biggest Memorial Day Sale E-V-E-R
Variant symbol usage (* *)/ WOOHOO! 40% off DENIM! (* *)/
50% Off >> A L L << Shoes!
5()% ()ff
L <()> <()> K
Variant capitalization bbbbbBBBLACK is bbbbbbBBBACK
*bLaCk FRiDaY*
Multiple exclamation points Beautiful!!!
Hashtags #Rare

Table 2. Non-standard variations in the email subject lines.
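
Some of the variation types in Table 2 lend themselves to simple pattern matching. The sketch below is a hypothetical illustration covering only four of them (no caps, variant spacing, multiple exclamation points and hashtags). The patterns and names are assumptions made for the example and do not reproduce how the subject lines were actually categorized. Types such as variant spelling or variant symbol usage are not covered and would need manual inspection.

```python
import re

# Hypothetical detectors for four of the variation types in Table 2.
CHECKS = {
    # At least one letter, and no uppercase letter anywhere.
    "no caps": lambda s: any(c.isalpha() for c in s)
    and not any(c.isupper() for c in s),
    # Three or more single letters separated by single spaces.
    "variant spacing": lambda s: re.search(r"(?:\b[A-Za-z] ){2,}[A-Za-z]\b", s)
    is not None,
    # Two or more exclamation points in a row.
    "multiple exclamation points": lambda s: re.search(r"!{2,}", s) is not None,
    # A hash sign directly followed by word characters.
    "hashtag": lambda s: re.search(r"#\w+", s) is not None,
}


def detected_variations(subject):
    """Return the names of the (assumed) variation types found in a subject line."""
    return [name for name, check in CHECKS.items() if check(subject)]


for example in ["Beautiful!!!", "#Rare", "C L E A R A N C E - S A L E !", "Oh Ship!"]:
    print(example, "->", detected_variations(example))
```

A subject line without any letters returns no "no caps" match, in line with the decision above not to treat such subject lines as non-standard.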

The next category in the analysis is customer named. Subject lines in the corpus occasionally include $FIRST_NAME$ at the end of them, which indicates that receivers’ first names were automatically appended to them. This is a practice marketers use to make their emails more personable (Lewis 2002; Neel 2013), but it also has a practical use, as Lewis (2002: 130) claims that “in over 90 percent of tests, including the recipient’s name in the subject line increases response”. Jenkins (2009: 46) also says that adding the names of recipients is something that email marketers “should be doing” in order to make the message more personal. In some subject lines, there is a space between the final word and the punctuation mark (if there is a punctuation mark). In these cases, the closing punctuation mark is usually an exclamation point (in four subject lines it is a question mark). There are also subject lines which begin with a comma, as well as ones where a genitive (’s) is not attached to any noun. Examples of subject lines with these features are in Table 3:

Feature                                  Examples in subject lines
$FIRST_NAME$ in subject line             Final Notice for $FIRST_NAME$!
                                         Half price? Only because we love you, $FIRST_NAME$.
Space before closing punctuation mark    A FREE Gift for !
                                         A $10 iTunes Gift Card for !
Subject line begins with a comma         , You Qualified for A $10 iTunes Gift Card!
                                         , Do Your Kids Wear Uniforms?
                                         , Would You Like to Save 15% OFF Everything?
Unattached genitive ’s                   A Present for ’s Children!

Table 3. Examples of the customer named category.

In all of these cases, it is assumed that the name of the recipient was automatically appended to the subject line. It should be noted that the names of the email recipients were not removed from the subject lines by me, but rather they were never in the subject lines to begin with. Email software allows marketers to automatically append a recipient’s name by adding a code to the subject line which will find the name from a database, so no actual names are typed into the subject lines. Additionally, the company which I received the subject lines from removed names from the emails to protect personal identities. The subject lines which included these features were categorized as customer named. They were cross referenced with the noun category to make sure that each subject line that was recognized as including a customer name was also recognized as including a noun. The CLAWS tags could not be relied on for this variable since there are words in some subject lines which include other examples of the genitive marker, such as Levi’s, St. Patrick’s, and Today’s. So the customer named category was checked to make sure that it did not count these uses of the genitive marker. The other two categories in the analysis are straightforward. The first category is product named and it is for whether the subject line named a specific product or kind of product, as in You Qualified for A $10 iTunes Gift Card! and 50% Off Dresses. The second and final category is numeral and it is simply for whether a subject line included any numeral or combination of numerals from 0 to 9. The numeral category does not include subject lines which have numbers written out.
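
The four cues for the customer named category listed in Table 3, together with the numeral category, can be approximated with a few pattern checks. The sketch below is a hypothetical illustration: the function names and regular expressions are assumptions, not the procedure used for the corpus. The cues over-trigger in some cases (a spaced-out subject line that ends in " !" would be flagged, for instance), which is one reason why the category was checked rather than assigned automatically.

```python
import re

PLACEHOLDER = "$FIRST_NAME$"


def customer_named(subject):
    """Heuristic: does the subject line show a trace of an appended first name?"""
    s = subject.strip()
    if PLACEHOLDER in s:
        return True
    if s.startswith(","):  # name missing before an opening comma
        return True
    if re.search(r"\s[!?.]$", s):  # space before the closing punctuation mark
        return True
    if re.search(r"(?:^|\s)[\u2019']s\b", s):  # genitive 's attached to no noun
        return True
    return False


def has_numeral(subject):
    """True if the subject line contains any digit from 0 to 9."""
    return re.search(r"[0-9]", subject) is not None


for example in ["A FREE Gift for !", "A Present for \u2019s Children!", "50% Off Dresses"]:
    print(example, "->", customer_named(example), has_numeral(example))
```

The genitive check requires the apostrophe to follow whitespace or the start of the line, so attached forms such as Levi’s and Today’s are not counted.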

4. Analysis

4.1 Length

Figure 1. Scatter plot of emails sent to average open rate.

The majority of the subject lines in the corpus (1,974 out of 2,021) are under 50 characters in length. As can be seen from Figure 1, even when the subject lines get longer, the average open rate does not change much. There are a few outliers in the figure. For example, the 26-character subject line Congratulations , You WON! seems to be particularly successful as six of the seven instances of this subject line have an open rate over 40% (marked in red in the scatter plot). This success can be explained by the relatively low number of emails sent for each of these highly successful instances. For example, the exact same subject line has a more normal open rate (albeit still higher than average) when it was sent over a hundred thousand times rather than just a few hundred times (see Table 4).

Text ID | Subject line | Tagged subject line | Open rate | Emails sent
210_255 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 42.97% | 249
210_810 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 42.97% | 249
210_1183 | Congratulations , You Won! | Congratulations_NN2 ,_, You_PPY Won_VVD !_! | 19.22% | 109,972
210_1367 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 52.43% | 492
210_1510 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 48.59% | 249
210_1634 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 43.70% | 389
210_1645 | Congratulations , You WON! | Congratulations_NN2 ,_, You_PPY WON_VVD !_! | 61.21% | 165

Table 4. Figures for the outlying 26-character subject line “Congratulations , You WON!”.
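The effect of send volume on these figures can be illustrated with a small calculation over the rows of Table 4. The sketch below is not part of the original analysis. It simply contrasts a plain mean of the seven open rates with a pooled rate weighted by emails sent; the variable names are illustrative, and the number of opens is estimated from the rounded percentages.

```python
# (open rate in %, emails sent) for the seven instances in Table 4
instances = [
    (42.97, 249),
    (42.97, 249),
    (19.22, 109_972),
    (52.43, 492),
    (48.59, 249),
    (43.70, 389),
    (61.21, 165),
]

# Simple mean: every instance counts equally, however few emails it covers.
unweighted = sum(rate for rate, _ in instances) / len(instances)

# Pooled rate: estimated opens divided by total emails sent.
total_sent = sum(sent for _, sent in instances)
estimated_opens = sum(rate / 100 * sent for rate, sent in instances)
pooled = 100 * estimated_opens / total_sent

print(f"unweighted mean open rate: {unweighted:.2f}%")
print(f"pooled open rate:          {pooled:.2f}%")
```

The plain mean comes out at roughly 44.4%, while the pooled rate is roughly 19.7%, because the single large send accounts for almost all of the emails. This is consistent with the point that the very high open rates belong to small sends.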

4.2 Parts of speech

In comparing the inclusion of parts of speech to the average open rates of the subject lines, it does not seem to matter whether the subject lines include nouns, verbs, pronouns, adjectives or adverbs. Nouns are, perhaps unsurprisingly, by far the most common part of speech in the subject lines: only 92 of the 2,021 subject lines do not include a noun. Some of the subject lines, such as 5+5+5 = WOW!, were tagged as including a formula (tag = _FO), where each numeral 5 could conceivably be considered a noun. So while the number of subject lines that include nouns could be even higher, the percentage would still not equal 100%, as some of the subject lines, such as Beautiful!!!, How Cute! and BOO!, simply do not include any nouns. In contrast to the high rate of nouns, the majority of subject lines (65.61%) do not include a verb.

Various marketing guides (for example, Chaffey 2011) also stress the importance of using pronouns, especially the pronoun you, which Kelly-Holmes (2016: 221) says is “the most prominent pronoun in contemporary marketing”. This is the case with my corpus as well, in which you makes up 40% of all the pronouns used in the subject lines, far more than the next most common pronoun, our, which makes up 13% of all pronouns. The almost 1% higher average OR for subject lines which include a pronoun may therefore suggest that this part of speech is important. Although the difference is slim, with potentially millions of emails sent for one campaign, a 1% higher OR could mean tens of thousands more potential customers viewing the marketing copy. Conversely, it seems that the use of adverbs could have a negative correlation with open rates, as the average OR for subject lines which do not include an adverb is slightly over 1% higher than for those that do. This result is not in line with Lewis’ (2002: 136) claim that adverbs have greater “power than adjectives”. [5] The figures for the parts-of-speech categories for the subject lines can be seen in Table 5.

Variable in subject line | No | Yes
[Table body not preserved: five rows, each labeled Average open rate.]
Table 5. Figures for the number of subject lines which included the parts-of-speech variables in the subject lines.

4.3 Other categories

The other categories pattern in similar ways to the parts-of-speech categories: including an exclamation point in a subject line seems to have a positive correlation with success, while subject lines which include a numeral seem to be less successful than those which do not, in much the same way that those with adverbs were. Of all the variables analyzed, however, the customer named category has the largest divide among average ORs. Appending the customer’s name through automation is associated with the largest positive difference in ORs, at least when each of these categories is considered in isolation. Including a non-standard variation or naming the product seems to have a very slight negative correlation with open rates. The figures for each category can be seen in Table 6.

Variable in subject line | Average open rate (No / Yes) | Examples
exclamation point | [values not preserved] | How Cute!; Flash Sale - Ends Midnight Tonight!
non-standard variation | [values not preserved] | mmmmm Delicious SALE!; 90% OFF - 40 Year Anniversary Sale!
product named | [values not preserved] | 25% OFF Hello Kitty - Today ONLY!; , You Qualified for A $10 iTunes Gift Card!
customer named | [values not preserved] | $FIRST_NAME$, We Picked These for YOU!; Final Notice for $FIRST_NAME$!
numeral | [values not preserved] | 65% OFF - New Arrivals Included!

Table 6. Figures for the number of subject lines which included the non-parts of speech variables in the subject lines.

Frehner (2008: 61) says that “ordinary statements followed by an exclamation mark are rather common” in personal emails and the results here seem similar to that. Of the 1,486 subject lines which include exclamation points, 1,427 have only one exclamation point, while 51 have two exclamation points and 8 have three exclamation points. None of the subject lines with two exclamation points have them in a row; rather, each has a single exclamation point after each of two statements, as in 80% OFF NOW! Spring and Holiday Clearance Sale! and BOO! Scary Savings Event!. There are only two different subject lines with three exclamation points – Beautiful!!! and C L E A R I N G - O U T - S T O C K ! ! !. All eight instances of these subject lines have ORs of over 13% and six of them have ORs over 17%.
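A tally of this kind is straightforward to reproduce. The following sketch is not the script used for the study. It counts exclamation points in a handful of subject lines quoted in this paper and then checks that the reported corpus counts add up; the variable names are illustrative.

```python
from collections import Counter

# Example subject lines quoted in this paper.
subjects = [
    "How Cute!",
    "80% OFF NOW! Spring and Holiday Clearance Sale!",
    "BOO! Scary Savings Event!",
    "Beautiful!!!",
    "C L E A R I N G - O U T - S T O C K ! ! !",
    "50% Off Dresses",
]

# Tally how many subject lines contain 0, 1, 2, ... exclamation points.
tally = Counter(s.count("!") for s in subjects)
print(dict(tally))

# Consistency check on the reported corpus counts.
reported = {1: 1427, 2: 51, 3: 8}
with_exclamation = sum(reported.values())
share = 100 * with_exclamation / 2021
print(with_exclamation, f"{share:.2f}%")
```

The reported counts sum to 1,486, which is about 73.5% of the 2,021 subject lines in the corpus.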

In the non-standard variation category, only 29 of the 963 subject lines with a non-standard variation were written in no caps. This means that only 3% of the non-standard subject lines and 1.4% of the total number of subject lines in the corpus were written in all lower case. This is a much lower frequency than in Frehner’s (2008) corpus of personal email communication, in which 15% of the emails were written in all lower case. In addition, almost half of the subject lines in the corpus are written in standard upper and lower case. This suggests that using minimal or no capitalization is not a feature of email marketing, which is contrary to claims made about email communication in general (Thurlow 2001; Frehner 2008; Dürscheid & Frehner 2013).

4.4 Combining categories: Exclamation point

The exclamation point and customer named categories seem to be especially important in the success of subject lines. These two features, however, are never used on their own; there are no subject lines in the corpus which are just an exclamation point or a customer’s name. It therefore remains to be seen whether subject lines are more successful when these categories are combined with the other categories and with each other.

Interestingly, the use of exclamation points in combination with parts of speech does not have a positive correlation with success in average open rates. For almost every part of speech categorized in this study, the average open rate was highest when the part of speech did not appear in a subject line, but an exclamation point did. So the average open rates are higher for subject lines which have an exclamation point, but do not have either a noun, a verb, an adjective, or an adverb, than they are for those which have an exclamation point and at least one of these parts of speech. The same holds for the comparison of non-standard variations and exclamation points. The only exceptions to this are subject lines which have a pronoun and an exclamation point, where the average open rate is 13.73%, which is higher than it is for the other combinations of pronoun and exclamation point. The figures for these can be seen in Table 7.

Table 7. Figures for the number of subject lines which included the combinations of the exclamation point category and the parts-of-speech categories.
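The comparison of combined categories amounts to grouping subject lines by two yes/no features and averaging the open rate within each group. A minimal sketch of that calculation follows. The records are invented toy values, not corpus data, and the variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (has_pronoun, has_exclamation, open rate in %).
# These values are invented for illustration and are not corpus data.
records = [
    (True, True, 14.0),
    (True, True, 13.0),
    (True, False, 11.0),
    (False, True, 12.5),
    (False, True, 11.5),
    (False, False, 10.0),
]

# Collect open rates per (pronoun, exclamation) combination.
groups = defaultdict(list)
for has_pronoun, has_excl, open_rate in records:
    groups[(has_pronoun, has_excl)].append(open_rate)

# Average open rate and number of subject lines for each combination.
summary = {
    combo: (sum(rates) / len(rates), len(rates))
    for combo, rates in groups.items()
}

for (has_pronoun, has_excl), (mean_rate, n) in sorted(summary.items()):
    print(f"pronoun={has_pronoun!s:5} exclamation={has_excl!s:5} "
          f"n={n} mean OR={mean_rate:.2f}%")
```

Reporting the group size alongside each average matters here, since some combinations may be represented by only a few subject lines.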

4.5 Combining categories: Customer named

The customer named category, which had a positive correlation with successful ORs when the categories were analyzed on their own, is similar to the exclamation point category in that the inclusion of the name of a recipient has a positive effect on ORs when combined with the parts of speech categories. For every subject line that has either a noun, verb, adjective, adverb or pronoun, the average open rate is higher when the subject line also includes a customer’s name. This is even stronger for subject lines which have a customer’s name, but do not have a corresponding part of speech, except for the category pronoun, which has a higher average OR when it is used with a customer’s name than when a customer’s name appears without a pronoun. The figures are shown in Table 8.

Table 8. Figures for the number of subject lines which included the combination of the customer named category and the parts-of-speech categories.

The other categories are not as straightforward when they are used with the customer named category. Subject lines which use at least one exclamation point and name a customer have a higher average OR than subject lines which have neither or only one of these features. The same is true for the non-standard variation category. With the product named category, on the other hand, subject lines which do not have this feature but do name a customer have a higher average OR (14.26%) than those which do name a product but not a customer (11.22%), as well as those which name both a product and customer (13.93%), and those which do neither (11.82%). In comparing these categories to the customer named category, the combinations which have the highest average ORs are non-standard variation + customer named (15.20%) and customer named + no numeral (15.05%). The figures can be seen in Table 9.

Table 9. Figures for the number of subject lines which included the combination of the customer named category and the non-parts of speech categories.

5. Discussion

The most interesting aspect of the analysis is how little the use of parts-of-speech mattered in the success of the subject lines. As mentioned above, marketing guides sometimes perform a lexical analysis of which word types are found in successful subject lines. They then extrapolate their findings to the parts of speech that these word types are examples of and offer advice about which parts of speech should be in subject lines. If nothing else, these kinds of analyses show how invested marketers are in figuring out what works in email marketing. The results presented here, however, show that marketers might not need to spend so much time in crafting their subject lines to include a noun, or a verb, or any other of the parts of speech in the analysis here. Of course, deciding to use a specific part of speech in a subject line is only one of the ways that email marketers try to persuade customers to open an email, but it would seem that the other variables in this analysis are more worthy of a marketer’s time and attention. It is unclear on what grounds marketing guides base their claims about which parts of speech to use when this seems to be an unsuccessful practice, but it could be that the claims are just not specific enough. For example, as the analysis showed, most email marketing subject lines will include one of the major parts of speech. But each of these parts of speech can be further divided into more specific categories. So perhaps claims should not say which major parts of speech to use in successful subject lines (noun, verb, adjective, etc.), but should be more descriptive and say which specific type of these major parts of speech to use (plural common noun, -ing participle of lexical verb, superlative adjective, etc.). Looking only at the major parts of speech does not seem to be enough.

While it was not surprising that most of the subject lines included nouns, it is interesting that over 65% of the subject lines do not include a verb since email marketing guides tend to proclaim the importance of writing subject lines with verbs over other parts of speech (see Lewis 2002, Oberlin 2004, Phrasee 2015). [6] This could have something to do with how subject lines in CMC have been found to be used to contextualize the body of emails and so perhaps verbs are not as good as nouns at accomplishing this. On the other hand, the results show that subject lines which include a verb have a lower average OR than those which do not include a verb, so this study contradicts the prevailing wisdom in email marketing guides. [7]

Cho (2010) claims that the omission of pronouns is characteristic of the linguistic economy that is common in CMC. The omission of certain lexico-grammatical features in CMC comes from temporal constraints that are placed on the writers. Both linguists researching CMC and email marketers agree that communication in email is more fast-paced and brief than in other genres (see, for example, Dürscheid & Frehner 2013; Bly & Kelly 2009; Jenkins 2009; Waldvogel 2007). While the overwhelming majority of subject lines did not include a pronoun, I would be hesitant to argue that email marketers feel the need to be linguistically economical in the same way that writers of other types of email do. Instead, I would say that email marketers are astutely aware of the ways that other types of email communication work and that they are simply adopting the norms of those texts. Borrowing features from other genres is, after all, one of the things that marketing has been shown to do (Cook 1994; Bhatia 2004). [8] In addition, email marketers are aware that if they are not economical then their subject lines will be clipped in the display window of email clients (such as Outlook, Gmail, and Yahoo!). This fact also explains why few of the subject lines exceed 50 characters. On the other hand, pronouns are not necessarily detrimental to the success of a marketing email. As the analysis showed, subject lines which included a pronoun and an exclamation point had a higher average OR than those which had either just one or none of these features. Clearly the use of pronouns in email marketing is not as straightforward as it is in other forms of CMC.

Exclamation points are another interesting variable since the linguistic literature on their use in CMC suggests either flaming or indicating friendliness. Turnage (2008) discusses how using multiple exclamation points or using them in combination with all caps may cause recipients of the message to perceive conflict or flaming on the part of the writer. Waseleski (2006: 1020), on the other hand, shows that exclamation points often function as “markers of friendly interaction” in online discussion groups related to professions. It makes intuitive sense that marketers are not using exclamation points to flame, but perhaps the high frequency of exclamation points in this corpus can be seen as another dimension of excitability, one which will not only indicate a friendly attitude, but also get a potential customer excited enough to open the marketing email. The marketing guides for their part are interestingly split on whether exclamation points should be used. Some see them as signaling “overexcitement” (Bly & Kelly 2009: 20), while others say that “using many exclamation marks to drive home a point can be effective” (Phrasee 2015). And despite marketing guides which advise against using exclamation points, almost 75% of the subject lines (or 1,486) in this corpus use at least one exclamation point. Nevertheless, 1,478 of the subject lines use one exclamation point after a phrase and these subject lines tended to have higher average ORs than ones which did not use exclamation points. Unfortunately, there are only 8 subject lines which used excessive exclamation points after a phrase, so although their average ORs were higher than the overall average, it would be hard to claim anything definitive about them. It is interesting, however, that when exclamation points were combined with pronouns, the average OR was higher than with any other part of speech.

The average open rate was highest for exclamation points when they were combined with the customer named category. This use of a person’s name, however, is an area where linguistic research on CMC is lacking compared to marketing research. All of the marketing guides researched for this study claimed that automatically appending a customer’s name would have a positive impact on email marketing, although this advice is sometimes combined with general advice on using pronouns as a form of personalization. The linguistic research on marketing does show that marketers try to be personal to their potential customers (Myers 1994; Goddard 1998; Cheung 2008) and marketing guides stress that email marketing allows for the greatest possibility of personalizing messages (Jenkins 2009). One guide (Simone 2009) even makes direct reference to the fact that this personalization in email marketing is insincere but shows why it is used by telling marketers, “You’re not trying to fool anyone that this was an individually typed message for that recipient, but you are trying to create the same feeling of personal relationship”. It is interesting, however, that less than 24% of subject lines in this corpus include a customer’s name and that less than 12% include a pronoun, given that these features led to higher average ORs.

Finally, the non-standard variation category needs to be discussed, as this is a feature which has greatly interested linguists researching CMC. Frehner (2008) notes that in CMC research it is difficult to tell whether a misspelling or other non-standard usage is actually a typo or whether it is an intentional choice by the writer. I would argue that nearly all of the language in email marketing – and certainly all of the language in my corpus – is intentional. That is to say, apparent mistakes and typos are intentional, as all of the subject lines would have been proofread and edited before they were sent. So we have to conclude that the number of actual typos or mistakes is vanishingly small. This means that email marketing is different to other forms of CMC and email communication, where the practice is to send messages without editing them at all (Baron 2000; Frehner 2008).

On the other hand, based on the literature on CMC, I would have expected more of the subject lines to include non-standard variations. While many researchers point out that the use of all caps is seen as shouting, and is therefore discouraged (Krohn 2004, Turnage 2008), others note that even though this may be true, the use of all caps is a “legitimate means of emphasis” (Frehner 2008: 50) in CMC and that the feature has already become established, despite what email etiquette may recommend (Dürscheid & Frehner 2013). Thurlow (2001: 288), however, has claimed that CMC makes “minimal to no use of capitalization” and Frehner’s (2008: 51) corpus supports this by showing a clear “tendency to minimize capitalization”. And yet, even though these non-standard variations and seven others were analyzed, only 52.30% of the subject lines included at least one form of non-standard variation. I am therefore hesitant to call this a tendency to use non-standard variations in the same ways that other types of CMC do, especially since only 29 of the subject lines are written in no caps. I would argue that this lack of non-standard variation is a product of the understanding between email marketers and their customers. If non-standard features are used in subject lines to draw readers’ attention and get them to open the email, then perhaps the lack of non-standard features is due to the fact that the readers have opted in to receive the emails. That is, they have already shown an interest in the company/product and so marketers know they do not need to use more cognitively complex features (such as non-standard variations) to entice readers to open the email. The way that email marketing uses standard capitalization is therefore important in two respects. First, it goes against what has been found in research on other types of email communication, but is perhaps quite natural given the circumstances between marketers and customers. Second, it is a clear case of how more research in email marketing is needed in order to gain an accurate account of the linguistic features of email communication. In terms of capitalization, the emails in my corpus seem to be following Bly & Kelly’s (2009: 26) advice that non-standard variations are detrimental to success in email marketing.

6. Conclusion

This paper looked at email marketing subject lines, an aspect of CMC which has been overlooked, and at open rates in email marketing. The results showed that features such as non-standard variations are used differently in email marketing than in other types of email. The results also showed that, contrary to some marketing guides, parts of speech do not seem to be a factor in the success of an email marketing subject line. Future research could include other success metrics and it could match the subject lines with their email body texts. Future research should also describe the linguistic features of email marketing in the same way that other types of email, CMC language and marketing have been described in order to show the similarities and differences between these related genres. The analysis here suggests that linguistic research into email marketing can have practical benefits both for the field of linguistics, by expanding the scope of research into email communication, and for the field of marketing, by offering new ways to analyze marketing data. Indeed, whether or not personal email communication is being replaced by social media, email marketing is only growing more popular and prolific in CMC.

[1] Email marketing is the term that is overwhelmingly favored for this practice. For example, in April 2016, there were 4,975 hits for “email marketing” in the Books section of Amazon.com and only 81 hits for “email advertising” (and even some of those books have “email marketing” in their titles). Similarly, there are 3,578 hits for “email marketing” in the Corpus of Global Web-Based English (Davies 2013), but only 58 hits in the corpus for “email advertising”. Whereas scholars have used the term advertising for other forms of the practice, the two terms are treated as synonymous in this study and the term marketing is used throughout.

[2] The accuracy rate for my subject line corpus was similar and was checked by close readings of the tagged subject lines. For example, open in a subject line like “Open for $5 off!” was sometimes tagged as an adjective rather than a verb. But the error rate for these mistags was about 3% and the mistags were manually corrected before the analysis was done.

[3] In linguistic literature, category names, such as parts of speech, are often written in small caps. This paper follows that practice.

[4] The non-standard variations were by necessity manually recorded, since CLAWS does not have a non-standard variety tag. CLAWS tagged E-V-E-R as a general adverb (_RR), but was less successful with some other subject lines. For example, off was in many cases tagged as a general preposition (_II) rather than an adverb in subject lines such as 90% off and these had to be manually changed. Subject lines like C L E A R A N C E - S A L E ! were manually recorded as including a noun, even though CLAWS tagged each letter in the subject line with _ZZ1, its tag for singular letters of the alphabet.

[5] It of course needs to be mentioned that Lewis’ claim about adverbs and adjectives is 15 years old at the time of this writing. That is a very long time with regard to the conventions of CMC.

[6] It should be noted that the linguistic advice of marketing guides is sometimes unclear. For example, they may recommend using “experiential verbs” (Phrasee 2015) or “power verbs” (Lewis 2002), but they do not offer any definitions of these terms, only a few examples.

[7] In a similar comparison of what language guides recommend and what people actually do, Bjørge (2012) analyzed the practices of ELF speakers in expressing disagreement in business meetings and the advice given by business English textbooks on how to do this. In that study, discrepancies emerged between actual usage and recommended practices.

[8] Marketing emails are a distinct genre from personal emails and from professional (business) emails. The most obvious reason is that marketing emails have a clearly different goal than other types of email. Depending on how these texts are approached, though, they could be called “sub-genres” of the larger email genre, even though research on personal email and professional email tends to treat these two as separate genres. Another reason is that marketing emails are a separate genre in the same way that TV commercials are a distinct genre from TV shows, newspaper advertisements are a distinct genre from news articles, etc. They may appear in the same medium and even have the same audience, but they have different goals and the audience recognizes them for what they are. Finally, the phrase “other genres” here applies to other non-email genres. Other forms of marketing have been shown to borrow from other genres, so it would make sense that email marketing would also borrow from other genres.


