Developing historical corpus pragmatics towards multimodality

Irma Taavitsainen and Carla Suhr
Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki

Historical pragmatics is an empirical field of study focusing on language use and meaning-making practices in past contexts. Its aim is to discover and describe patterns of pragmatic language use, with the ultimate goal of finding explanations both for synchronic variation in past periods and for changes in a diachronic perspective (Taavitsainen 2012). The sociohistorical context is essential for reaching these ambitious goals. Texts can be related to their authors and audiences, writers and readers, and various aspects can be highlighted according to the research questions. Texts can be anchored to background sociohistorical developments such as changes in the media, literacy practices, and underlying intellectual commitments as they reflect on the linguistic features of language use. Another recent definition of historical pragmatics states that it studies “patterns of human interaction (as determined by the conditions of society) of earlier periods, the historical developments of these patterns, and the general principles underlying such developments” (Jucker 2008: 895). This definition is even more explicitly tied to the broad European view of pragmatics, according to which the sociohistorical context is necessarily taken into account in the analysis and there is no need for a separate label “sociopragmatics” (Culpeper 2009: 179). This branch is also called the “perspective view”, as it holds that all language use can be viewed through a pragmatic lens. Intonation is important in conveying meanings in spoken language, and punctuation in written language, especially in historical texts, can likewise alter meanings. Morphology has not been much discussed from the pragmatic angle, but there is potential on this level as well. Syntax has received more attention, and the interface between semantics and pragmatics is increasingly prominent in research.
The concept of context is important in pragmatics, and several different aspects of it have to be taken into account. We can proceed from the narrow linguistic cotext to the contexts of the situation of writing, activity type and genre, from microlevel analysis to the macrolevel of the cultural context of the time and place. The core challenge posed to the historical pragmatician is to examine how meanings are negotiated in various contexts and how the background factors influence these practices in a subtle way (Taavitsainen and Fitzmaurice 2007: 26).

In its short history, the field of historical pragmatics has made significant advances in research methods and data treatment. Our knowledge of usage patterns and their changes in historical varieties of English has increased a great deal. Studies in the field have profited significantly from the compilation of new historical corpora and methodological advances in modern corpus linguistics. Search techniques, corpus linguistic tools, and our understanding of how to apply corpus methodologies have improved to such an extent that today researchers can tackle research questions that could not have been asked until very recently. The first historical corpus, the Helsinki Corpus of English Texts (HC hereafter), was launched in 1991. The two decades since then have brought us to a new phase in the development of historical corpus linguistics, but for historical pragmatics the story is even more radical, as the field itself has emerged after the launch of HC. Scholars were inspired by the new tool from the beginning, and already in the inaugural volume Historical Pragmatics (ed. by Jucker 1995), several contributions were based on HC and its offspring. In the early phases, many scholars were satisfied with descriptions of occurrences, but the goals of historical pragmatics have become more ambitious. These pilot studies paved the way for more comprehensive assessments later, and the present-day success of the historical pragmatic approach is largely due to the availability of historical corpora and new corpus tools that offer possibilities for more demanding research tasks. Often the analysis proceeds from the microlevel of the narrow linguistic context of an utterance to discourse and genre, and all the way to the large context of culture to account for the variability of language. In all cases, contextualisation and qualitative methods are needed to complement frequency studies.

We are at a dynamic phase in corpus development, but developing corpus methodology for pragmatic studies is not easy, and researchers are testing new ways of extending the limits of historical pragmatics in various directions. One such testing ground is provided by this volume, as electronic publishing makes it possible to integrate additional information into the conventional presentation of text. Innovations can be found in links to sources, video clips and pictures, as readers can easily move from one level to another. Resources offered by electronic publishing have not been exploited to the full yet, and there is room for further development. The first steps towards multimodality are taken with this volume, but more radical exploitation of hyperlinking lies in the future.

Multimodality deals with common semiotic principles that operate in and across different modes: e.g. in “monomodal” presentations music is often associated with emotions, and verbal narratives or pictures with action, but it would be equally possible for music to present action and for images to present emotion (Kress and van Leeuwen 2001: 2). Electronic media for publishing research articles could integrate multimodal elements to provide context and enhance insights, even arguments. In this volume we have attempted to make use of the new possibilities and have encouraged authors to innovate with the digital medium, but copyright issues pose problems, as permissions may be costly and difficult to acquire. The following list shows the range of hyperlink contents in the articles:

  1. Hyperlinks to sources and illustrative websites (all articles)
  2. CoRD background information of the corpora (all articles)
  3. Authors’ homepages (all articles)
  4. Video clip of the plenary talk (Jucker et al.)
  5. Query syntax for electronic searches (Jucker et al.)
  6. Longer extracts (Kohnen)
  7. Images or links to websites illustrating the medium of communication. The range of illustrations is limited, as copyright issues apply most to this category. We enclose two “generic” pictures below to illustrate what could be done in principle. The manuscript is a high-quality surgical text by Guy de Chauliac, Takamiya Manuscript 59 in Tokyo (photo Irma Taavitsainen). The early printed book is The Practice of Physic by Lazarus Riverius from 1668 in the Wellcome Medical Library in London (photo Ville Marttila). [1]

At present, an increasing number of corpora provide research opportunities for historical pragmatic research questions, and present-day corpora offer points of comparison in more recent language use. Pragmatic and semantic annotation is being developed, and it is becoming a desideratum for e.g. dialogic studies of interaction. Sociolinguistic parameters are helpful, as they give more precise information about the authors and their audiences by specifying e.g. the author’s age, education and gender, the regional origin of the writing, and circulation and publishing facts, to name some relevant parameters. In an ideal case this information is encoded in a corpus and readily available to the researcher. Anchoring language varieties to their users, whether historical or contemporary, is the core task in sociolinguistic studies, but these facts are also needed for pragmatic assessments, as they are essential for contextualizing language practices in society.

Figure 1. Guy de Chauliac, Takamiya Manuscript 59. Photo Irma Taavitsainen.

General-purpose, multigenre, multifunctional corpora aim to give an overview of the occurrence of linguistic features across a wide variety of genres. HC was the first multigenre, multipurpose historical corpus with accompanying background information. It was compiled in the 1980s and launched for public use in 1991. Parsed versions of the three periods covered by the HC were published about a decade later (YCOE 2003, PPCME2 2000, PPCEME 2004), and a TEI-compliant XML version of the HC was distributed to participants of the Helsinki Corpus Festival in celebration of the corpus’s 20th anniversary. Another multigenre historical corpus is A Representative Corpus of Historical English Registers (ARCHER); it picks up from where the HC ends and extends to the modern period. Recent additions to multigenre and multifunctional corpora are the Corpus of Historical American English (COHA), a new megacorpus of four hundred million running words that focuses on the history of one variety of English, and Google Books, a collection of books printed in English from the seventeenth century onwards. These two new additions are not structured corpora, but they have the advantage of size, which earlier historical corpora lack, and can be used to illustrate megatrends.

The HC has become an indispensable tool for diachronic studies, but the way it is used has changed. At present HC is considered particularly useful as a diagnostic tool giving general information about the occurrence of forms, lexemes and structures in different periods of English. It has become almost a rule that historical linguistic studies take HC as their point of departure. The practice of starting with HC and proceeding to other corpora can be applied to several historical research tasks in sociolinguistics and pragmatics. Updating the format of the corpus into XML opens up the corpus to new users and new ways of using it, all the while demonstrating its continued value and attraction as a tool for linguistic research of past periods.

After the launch of the HC, historical corpus compilation was continued in second-generation specialised historical corpora. These corpora focus on particular genres, domains, or media of writing, sometimes described as “thick” in contrast to their “thin” representation in HC (Kohnen 2007). Genre-specific corpora are constructed for certain purposes and research tasks, and they can be combined to give a more comprehensive picture of usage patterns across registers, for instance. Stratified corpora consist of carefully selected, comparable samples, and they facilitate the interpretation of results by making them more directly comparable to one another. Examples of second-generation corpora compiled in Helsinki are the Corpus of Early English Correspondence (CEEC), which focuses on private letters from the late fifteenth century onwards; the Corpus of Early English Medical Writing (CEEMW), which gives a comprehensive coverage of medical writing from the late medieval period over several centuries; and the Helsinki Corpus of Older Scots (1450–1700), which focuses on a regional variety. The Corpus of English Dialogues (CED) is noteworthy for historical pragmaticians, for it is designed especially for historical pragmatic research questions. A list of historical corpora can be found here.

Dictionaries provide rich data for studies of Old and Middle English (see the appendix), and the electronic Oxford English Dictionary alone covers the long diachrony and can be used as a historical corpus. The Historical Thesaurus of English was published in 2009 and went online in 2010. Thanks to the compilers of these tools, more accurate lines of development are emerging and new avenues of historical semantics and pragmatics are opening up.

Not only is new material made available to researchers in the form of second-generation historical corpora and other tools, but corpus annotation is also becoming more sophisticated. Tagging can be added to identify parts of speech, and parsing to locate grammatical structures (e.g. YCOE, PPCME2, PPCEME, CEEC). Sociolinguistic coding may be added (e.g. CEEC, CED). Semantic and pragmatic annotation of corpora is being developed (e.g. CED), and even layout features of texts can be integrated into corpus annotation (e.g. the Lampeter Corpus indicates switches in typeface). Finally, tools such as Variant Detector (VARD) enable automatic normalization of historical texts, where endemic spelling variation complicates computer-aided analyses of language use.
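
The idea behind spelling normalization can be illustrated with a minimal sketch. It assumes a small hand-made lookup table of variant spellings; the table and the function name are hypothetical and chosen for illustration only, and the sketch does not reproduce how VARD itself works.

```python
import re

# Illustrative variant-to-standard table (an assumption of this sketch);
# a real normalization tool relies on much larger resources and on
# matching techniques that are not attempted here.
VARIANTS = {
    "haue": "have",
    "vpon": "upon",
    "bloud": "blood",
    "physick": "physic",
}

def normalize(text, variants=VARIANTS):
    """Replace each known variant spelling with its standard form."""
    def swap(match):
        word = match.group(0)
        standard = variants.get(word.lower())
        if standard is None:
            return word  # unknown words are left untouched
        # Preserve an initial capital of the original spelling.
        return standard.capitalize() if word[0].isupper() else standard
    return re.sub(r"[A-Za-z]+", swap, text)

print(normalize("Vpon the bloud they haue written"))
# prints "Upon the blood they have written"
```

Once variants are mapped to a single form, one search string retrieves all spellings of a word, which is what makes frequency counts across early texts comparable.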

Almost all of the corpora listed above – and some text collections that are not included in the list – have been used in the studies presented in this volume. The papers gathered in this volume range from theoretical discussion to more traditional case studies, but what they have in common is a novel approach to materials or methodology. The papers have been organized thematically so that they move from the macrolevel of broader issues such as methodologies and concepts as research questions to the microlevel of case studies. The first three contributions deal with the pragmatics and semantics of past lexical items, language forms and language use. The next four articles show how different levels of language use can be viewed through a pragmatic lens. They illustrate the “perspective view” of pragmatics. The last article of the collection tackles a very different corpus linguistic dilemma, and suggests a constructive solution to combining various corpora for historical pragmatic research tasks in the future.

The paper by Andreas H. Jucker, Irma Taavitsainen & Gerold Schneider sets out to test how people’s own views of politeness, P1 or first-order politeness, can be investigated diachronically by metacommunicative expression analysis. This is not an easy task, for, as they note, politeness and impoliteness are manifested on many different levels, and they are situational and individualistic. Their solution is to focus on expressions that are used to talk about politeness in historical texts; their method is thus akin to modern ethnographic methods. Their methodology starts by identifying terms related to the semantic fields of courtesy and politeness, using the Historical Thesaurus of the Oxford English Dictionary as a tool, and then assessing how the meanings and contexts of these terms change over time in the Helsinki Corpus.
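
The quantitative side of such a term-based approach can be sketched as follows. This is only a rough illustration of counting search terms per period and normalizing by text length; the term list, the period labels and the two sample sentences are invented for the sketch and are not taken from the study, which draws its terms from the thesaurus and complements the counts with qualitative reading.

```python
import re
from collections import Counter

# Hypothetical search terms and toy samples (assumptions of this sketch).
TERMS = {"courteous", "courtesy", "polite", "civil"}

SAMPLES = [
    ("ME", "he was a courteous knight and full of courtesy"),
    ("EModE", "a civil and polite answer was expected of a civil man"),
]

def frequencies_per_period(samples, terms, per=1000):
    """Return hits per `per` running words for each period label."""
    hits, totals = Counter(), Counter()
    for period, text in samples:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[period] += len(tokens)
        hits[period] += sum(1 for token in tokens if token in terms)
    return {period: per * hits[period] / totals[period] for period in totals}

for period, freq in frequencies_per_period(SAMPLES, TERMS).items():
    print(f"{period}: {freq:.1f} hits per 1,000 words")
```

Normalizing to a fixed number of running words keeps periods of different sizes comparable; what the terms mean in each hit still has to be established from the context.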

Claudia Claridge addresses the difficulties of studying the concept of hyperbole in Old English, and how they can be overcome. Her paper is an example of how historical pragmaticians are no longer afraid to tackle research questions that present challenges not only with finding suitable methods but also with finding and analyzing materials. Previous studies of hyperbole can be used to identify semantic fields likely to contain exaggeration, but extending the scope of research to the distant past brings up new issues. Nonetheless, Claridge shows that a pragmatician’s emphasis on context and qualitative analysis allows her to draw some conclusions about the ways in which hyperbole was used by the Anglo-Saxons.

Irony is another difficult concept to identify in historical materials, but this is what Graham Williams sets out to do in his paper. Previous studies of Middle English texts have identified irony through close qualitative readings of texts, but Williams seeks to show how corpora can be used to facilitate the study of concepts such as irony and the negotiation of meanings and their reversals in texts. In the Middle English period that his study focuses on, the modern words “irony” and “sarcasm” were not yet used, so his first task was to identify contemporary terms that he could then search for in the Corpus of Middle English Prose and Verse. His analysis shows how appropriate lexical searches of corpora can be used to locate verbal irony in texts, and to extend the scope of the analysis of verbal irony to questions of functions, genres, and sociolinguistic factors.

From new methodologies we move to new materials for research that bring historical pragmatics into close proximity with textual criticism. The article by Beatrix Busse and Ulrich Busse discusses the usefulness of comparing the several early modern editions of some of Shakespeare’s plays to gain insights into the use of pragmatic phenomena such as discourse markers in early modern English. Their article demonstrates how punctuation can alter meanings and how editorial decisions influence our readings of historical texts. They argue that editorial changes of discourse markers reveal clues about language at particular historical moments, and show in their pilot study how such a study could be undertaken.

The following three papers in the collection are case studies of specific linguistic constructions or expressions. The first one, by Jacek Kozlowski, investigates the functional and pragmatic differences of the word pair “quiten”/“aquiten” in Middle English. His findings suggest that the simplex form “quiten” tends to index emotional affect, and could therefore have a more interpersonal function than the complex “aquiten”. However, Kozlowski’s study also makes a methodological contribution: he proposes that studying simplex and complex word pairs in context may be a more fruitful endeavor than focusing on a prefix and the variety of verbs it can be attached to. The article thus offers a historical pragmatic analysis of a morphological feature used to express additional shades of meaning.

Figure 2. The Practice of Physic by Lazarus Riverius from 1668 in the Wellcome Medical Library in London. Photo Ville Marttila.

Gabriella Mazzon makes an in-depth analysis of the discourse marker “I’m afraid”, and traces its development from Middle English to Present-Day English. She discusses how different functions and syntactic positions for the marker emerge over time, and relates her findings to recent theories for the development of discourse markers. Thus Mazzon tests the applicability of theoretical models of pragmatic syntactic developments by a systematic analysis of over a dozen corpora and text collections.

Raisa Oinonen brings us back to the concept of politeness that begins this volume, but from the technical viewpoint of P2 or second-order politeness. She conducts a corpus-linguistic and pragmatic study of a subscription formula that was popular in sixteenth- and seventeenth-century letters. She first analyses the diachronic development of what she calls the “subscription infinitive formula”, and then how considerations of politeness influenced its use. The study considers the politeness of the construction through its lexical items, the social patterns of its use, and where the subscriptions figure on a continuum of positive and negative politeness.

The final paper deals explicitly with structural aspects of corpora and how diverse materials could best be used for historical pragmatic research. Thomas Kohnen deals with a dilemma new to historical linguists: whereas previously there was too little material, today there is almost too much of it. Or rather, the material is spread widely in many different kinds of corpora that represent various domains of language use. He suggests in his paper a way of making the multitude of corpora more comparable by annotating them with metadata about text structure and the position of texts within their discourse communities, all of which may influence the linguistic structures found in texts. This would allow researchers to link parts of various corpora to form more powerful databases for their specific research questions.
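
How shared metadata could make parts of different corpora linkable can be shown with a small sketch. The record fields, corpus names and sample entries below are hypothetical and serve only to illustrate the principle; they do not represent Kohnen’s actual annotation scheme or any existing corpus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextRecord:
    corpus: str     # source corpus (placeholder names below)
    text_id: str    # identifier of the text within its corpus
    genre: str      # hypothetical genre label
    community: str  # hypothetical discourse-community label

# Invented sample records from two placeholder corpora.
RECORDS = [
    TextRecord("CorpusA", "a01", "letter", "merchants"),
    TextRecord("CorpusA", "a02", "sermon", "clergy"),
    TextRecord("CorpusB", "b01", "letter", "merchants"),
    TextRecord("CorpusB", "b02", "letter", "gentry"),
]

def link_subcorpus(records, **criteria):
    """Select records from any corpus whose metadata match all criteria."""
    return [
        record for record in records
        if all(getattr(record, key) == value for key, value in criteria.items())
    ]

for record in link_subcorpus(RECORDS, genre="letter", community="merchants"):
    print(record.corpus, record.text_id)
```

Because the selection depends only on the shared metadata fields, texts from separately compiled corpora end up in the same working set as long as they are described with the same labels.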

As a whole, this volume shows how historical corpus pragmatics has developed over its short history, and points to where the field is headed. There are new materials available to researchers in the form of new corpora, both small, task-specific corpora and large text collections. But historical pragmaticians are also making use of previously existing materials that have not been used for pragmatic studies before. Expanding the range of materials brings new insights into pragmatic analyses of language use, not only because more material gives a more rounded view of a chosen linguistic phenomenon, but because new kinds of materials (whether they be new corpora or existing materials new to pragmatic analyses) encourage new analytical methods and new research questions. The scope of research questions can be extended from specific linguistic forms to semantic concepts. The articles demonstrate the vibrant and dynamic state of the field of historical corpus pragmatics, and promise new avenues of research in the future.

Notes

[1] With the exception of one link (to the Luminarium website), all links are to various parts of the British Library's website.

Bibliography:

Culpeper, Jonathan. 2009. “Historical sociopragmatics: An introduction”. Journal of Historical Pragmatics 10(2): 179–186.

Jucker, Andreas H., ed. 1995. Historical Pragmatics. Pragmatic Developments in the History of English. Amsterdam: Benjamins.

Jucker, Andreas H. 2008. “Historical pragmatics”. Language and Linguistics Compass 2(5): 894–906.

Kohnen, Thomas. 2007. “From Helsinki through the centuries: the design and development of English diachronic corpora”. Towards Multimedia in Corpus Studies (Studies in Variation, Contacts and Change in English, Volume 2), ed. by Päivi Pahta, Irma Taavitsainen, Terttu Nevalainen & Jukka Tyrkkö.

Kress, Gunther & Theo van Leeuwen. 2001. Multimodal discourse: The modes and media of contemporary communication. London: Arnold.

Taavitsainen, Irma. 2012. “Historical pragmatics”. Handbook of Historical Linguistics, ed. by Laurel Brinton & Alex Bergs. Berlin & New York: Mouton de Gruyter.

Taavitsainen, Irma & Susan Fitzmaurice. 2007. “Historical pragmatics: What it is and how to do it?” Methods in Historical Pragmatics, ed. by Susan Fitzmaurice & Irma Taavitsainen. Berlin & New York: Mouton de Gruyter. 11–36.