Introduction
		    Anneli Meurman-Solin and Jukka Tyrkkö 
		
            This volume aims to offer a useful and necessary evaluation of the current state of the art in digital editing, viewing its principles and practices from a corpus linguistic and philological perspective. While it offers a window into the evolution of scholarly perspectives in the Research Unit for Variation, Contacts and Change in English (VARIENG) in Helsinki over the last two decades, it also reports on work by highly experienced corpus compilers in some other research communities. Research progresses in ebbs and flows, and it appears that the time has once again come for paratextual features to be included in linguistic study. Today, the research questions in this branch of study can be operationalised in terms of searchable metadata and detailed annotation and taxonomisation of visual features, available for large-scale diachronic and synchronic studies like never before. At the same time, however, it is prudent to keep in mind that the new methodologies are best used in a theoretically and methodologically well-documented and transparent fashion. The articles in this volume contribute to that end, highlighting some of the principles the authors and editors have come to consider useful and the practical applications of those principles in digital editing. 
            The introduction is structured into four parts: Part I (Meurman-Solin) describes the shared goals of the articles on three major  corpus projects at the VARIENG research unit; Part II (Meurman-Solin) focuses  on the studies which examine a variety of such visual features of historical  manuscripts the annotation of which can provide relevant evidence for  linguistic analysis; Part III (Tyrkkö) discusses paratextual properties of  early printed texts ranging from title-pages to texts representing quite  distinctive genre traditions; and Part IV (Tyrkkö) introduces both established and recently launched projects which  aim to develop new principles and practices for the annotation of the material  features of historical texts in a digitally retrievable way. The shared goal in  observing the material features of historical texts is to identify those which  are significant in the analysis of linguistic features and therefore should be taxonomised  and annotated in databases primarily compiled for language research. 
            The articles of this volume have been written by scholars who have personally compiled corpora containing ‘unconventional data’ (Beal, Corrigan & Moisl 2007) and are therefore thoroughly acquainted with the complex problems that compilers have to solve in the digital editing and annotation of original historical texts. While dealing with the large degree of variation attested in such texts is a challenge in itself, deciding what to include in a digital edition is an equally intriguing problem, seeing that, besides what is traditionally regarded as language, a wide range of non-linguistic features are also immediately observable in examining any original manuscript or early printed text. Information about these features has been provided only sporadically in the existing corpora (cf. however Claridge in this volume). 
            In the twenty-first century, historians have written quite extensively about historical documents. As regards historical correspondence, for example, there is a good deal of recent research on the so-called material features of early letters (e.g., Bannet 2005, Barton & Hall 1999, Daybell 2001, 2012, Daybell & Gordon forthcoming, and Schneider 2005; for information on the material readings of early modern culture, see also Daybell & Hinds 2010). However, these studies rarely address the question of what linguistic significance their findings might have. Thus, Daybell (2012), the most recent work, does not seem to be aware of the abundant research by linguists in recent years, nor does he mention activities among corpus linguists in the field his own research focuses on. Similarly, although historical linguists working on language data derived from early printed books make passing references to the books themselves, few afford any appreciable amount of attention to the production circumstances of those books or to the paratextual features of the printed copy. Thus, despite the recent emergence of the cross-disciplinary field of digital humanities, it is safe to say that at present the two research communities of linguistics and material text studies are not yet in regular contact with each other, even though they draw on the same data. 
            Among linguists, the cross-disciplinary approach has been adopted somewhat more widely, as illustrated by studies such as Fitzmaurice (2002), Nevala (2004), Nevalainen (2001), Nevalainen & Raumolin-Brunberg (1996, 2003), Nurmi, Nevala & Palander-Collin (2009), Sairio (2009), Suhr (2011), and Diemer (2012). Also worth mentioning are new projects such as Paratext on the Page at the University of Turku. 
            The general aim of the volume is to remind historical  linguists of the complexity of historical data (see for example Lass 2004),  part of that complexity perhaps getting lost in the digitisation process. In  the linguistic analysis of historical texts we constantly draw on our knowledge  of the evolution of writing practices in the various grammars and genre  traditions, including how these are reflected in the material features of  writing and printing. The problem is that features reflecting the culture and  practices of writing in the original manuscripts and early printed texts are  either not reproduced at all in the corpora or remain unannotated in a way that  would make this information digitally retrievable. 
            Now that diachronic corpora provide us with large quantities of data, errors resulting from the misinterpretation of insufficiently contextualised linguistic items may be multiplied in the analysis. In other words, if we digitally edit and annotate only the language of texts, and reproduce quite imprecisely, or not at all, non-linguistic features such as layout, script type, and contracted and abbreviated word-forms, the consequences may be serious in both quantitative and qualitative analysis. We think that, in improving diachronic corpora further, a much wider range of material features should be described, taxonomised, and annotated and that sophisticated tools should be developed for retrieving this information as directly linked to linguistic items. 
            Since for example limitations of space or  technicalities related to the reproduction of historical documents as digital  images or the creation of hyperlinks to other online resources do not impose  restrictions on the authorial and editorial process, online publishing in the  e-series Studies in Variation, Contacts and Change in English provides us with  an excellent forum for offering a particularly rich variety of illustrations in  discussing the various topics. We hope that these illustrations will offer  useful examples not only for researchers but also for students in the field of  historical linguistics. 
            
	Part I: On the evolution of three major corpus projects in English historical linguistics
		  The syntheses of three major corpora compiled at the  VARIENG research unit at the University of Helsinki draw on the long-time  experience of their compilers, digitisers, and annotators, who specialise in creating  carefully structured databases representative in long diachrony. The general  goal of the articles is to describe what theories and methods the compilation  principles and practices are based on and how the compilers’ thinking and  technologies have developed over time. The leitmotif in all the articles is the  compilers’ awareness of the protean nature of corpora, continuous evaluation,  change, and expansion being part and parcel of corpus compilation processes. We  hope that the readers will find it useful to study the compilers’ own  assessment of how the original goals have been achieved, why particular changes  have been made, and what they think can be named as particular advantages or  disadvantages of a particular corpus. The readers are also reminded of matters  related to time and economy in corpus compilation projects. 
          The articles provide  information about the following properties of the databases: 
          
          - corpus type (e.g.  multi-genre/single genre; synchronic/diachronic);
  
 - size of the  database;
  
 - time period;
  
 - place or region the  data originates from;
  
 - genre(s) and text  type(s);
  
 - availability  (CD-ROM, online, restricted access, etc.) at present;
  
 - description of  where, by whom, and how the database has been used internationally;
  
 - research literature  based on the database.
  
          The reader is reminded of the fact that more detailed information  about the corpora is provided by the CoRD site http://www.helsinki.fi/varieng/CoRD/corpora/index.html 
          The critical  assessment of each corpus aims to provide information of the following kind: 
          
          - the aim of the  corpus project as defined when the compilation process began;
  
 - comments on this aim  and the way it relates to how the corpus has been used; 
  
 - an evaluation of the  present relevance of the database in general terms; 
  
 - an evaluation of the  representativeness of the database in more detail; a summary of the compilers’  views on where the caveats are in the database; the compilers may also report  on how scholars have assessed the representativeness of the database and what  other data sources in their view complement it appropriately;
  
 - comments on  comparability between the database and other corpora;
  
 - an evaluation of the  quality of the texts from the perspective of text history; 
  
 - an evaluation of the  authenticity of the texts (e.g., non-autograph letters or copies, instead of  original letters, reducing validity as data; in early printed works, the  printers’ policies and practices affecting the language, etc.);
  
 - an assessment of how  the language-external variables coded into the corpus have succeeded in guiding  the corpus users in their interpretation of their linguistic findings (e.g., in  corpora structured by language-external variables, there is the risk of  presenting claims about the conditioning of genre or gender, even though texts  representing a particular genre or female informants as a group may form internally  quite heterogeneous data categories);
  
 - an assessment of how general practices in a particular tradition of writing may affect the data, especially the balance required for a statistically valid account of salient features (e.g., in using correspondence as data, it is necessary to take into account such general practices as the widespread preference for secretary hand in sixteenth-century formal letters written by members of the higher ranks and professionals or the much earlier adoption of italic among circles close to the royal court, since both of these practices have a major influence on the choice of linguistic variants); 
  
 - information about  other corpora which usefully complement the database, especially those that  have been compiled in recent years; 
  
 - information about  forthcoming corpora, those compiled by colleagues in Helsinki or abroad; how these  relate to the earlier ones (e.g., the forthcoming one is directly comparable,  revised, larger, applies new compilation and digitisation principles, has been  improved as regards representativeness by region, fills a gap, is focused on a  particular genre, introduces a new annotation system); 
  
 - views on tasks for  future work in corpus linguistics and philological computing.
  
  
          The above structure for describing a corpus may  provide useful guidelines for introducing other corpus projects. 
          In their article Rissanen and Tyrkkö trace the evolution of the Helsinki Corpus of English Texts (c. 750–1700) from the pioneering diachronic corpus structured by a wide range of language-external variables to its updated XML version, which came out in 2012. Nevala and Nurmi describe the CEEC family of corpora, providing information about the ongoing expansion and annotation of the databases of early English correspondence (1403–1800) and how thorough knowledge of English social history has permitted the research group to create a finely graded taxonomy of geographically, demographically, and socially relevant variables. Taavitsainen and Pahta give a thorough account of the compilation principles and practices of the MEMT family of corpora of medical texts (1315–1800), showing, for example, how studies based on these corpora have permitted the redating of the development of scientific writing and the understanding of how rich and multi-layered the history of medical writing is, for example, as regards genre and register types. 
	Part II: Material features in manuscripts. Correspondence and trial proceedings in focus
         In discussing a range of non-linguistic features of  letter-writing such as materials and tools of letter-writing and the social  significance of layout practices and choice of script type, Daybell (2012: 2)  uses the umbrella concept of “the material rhetorics of the manuscript page”.  His work, based on the examination of over 10,000 manuscript letters (Daybell  2012: 85), highlights the importance of keeping in mind the social practices of  letter-writing that, for example, complicate matters related to such highly  relevant questions as the identification of authorship. The same writer may use  different script types, the social distance between the writer and the  addressee or the level of formality or other circumstances influencing the choice;  variables such as these also influence the layout of letters (Daybell 2012:  86–95) and, most importantly, their language.  
         While historians such as Daybell describe the material  and social circumstances of letter-writing in great detail, their conclusions  are usually not based on a statistical analysis of the findings, nor do they  suggest criteria for taxonomising variation they report on. For example, the  remarks on the positioning of the place and date of writing in letters are  general (see Daybell 2012: 104–105), rather than allow us to trace the  evolution in this particular practice from the position as part of the body of  the text at the end, to a separate position at the end, and finally to the  fixed position on the right at the beginning, a development recorded in the Corpus of Scottish Correspondence (Meurman-Solin b in this volume). Similarly, the weakening of the significance  of social signs and their replacement by conventionalised ways of structuring a  text according to genre-related expectations is usually not recorded by  historians in sufficient detail to permit us to identify patterns of variation  and record the pace and direction of change (cf. Nevala 2004, Sairio &  Nevala in this volume). 
         Beside the provision of statistically significant  evidence, a major difference between research conducted by historians and by  linguists at present is the fact that, among linguists, there is a vivid  interest in developing principles and practices for annotating paralinguistic  features, so that they can be retrieved in computer-assisted research. By  contrast, book historical and other non-linguistic philological scholarship is  often more concerned with discussing paratextual features separately from  texts, which means that the semantic and pragmatic effects of typography and  layout cannot be evaluated in a systematic way. 
         The articles discussing manuscript data in this volume deal with the visual features of trial proceedings (Walker & Kytö) and correspondence (Sairio & Nevala and Meurman-Solin a and b), all four articles drawing on the long-time experience of their authors in the transcription, digitisation, and study of historical manuscripts. Walker and Kytö describe the layout features and visual effects in manuscripts recording depositions presented in church court and criminal court cases, edited digitally and annotated by the Electronic Text Edition of Depositions 1560–1760 (ETED) team. These recorded deponents’ testimonies represent the various regions of England. The authors have written widely on the texts of the ETED corpus (see the references of this article); as regards the recording procedures of trials applied by scribes, see also Huber 2007. 
         Walker and Kytö state that the aim of the ETED project  was “to produce an edition that was faithful to the manuscript texts insofar as  this – within the scope of the project – was technically possible and  meaningful for linguistic study while also enabling the edition to function as  a searchable electronic corpus”. In their article they provide important  information about their principles and practices in the selection of layout and  other visual features they have decided to annotate. The readers will certainly  find it useful to study the two annotation systems, the use of particular  symbols (e.g., a tilde or special fonts used to reproduce characters indicating  abbreviations) and the use of editorial comments in angle brackets, the two  systems complementing each other very nicely. 
         There is less variation in the layout features of church court depositions than in those of the criminal court records, as the former were preserved in bound books or bundles, and they were written down by fewer scribes. Moreover, these scribes presumably benefited from model documents and other instructional material. From the perspective of the goal of the present volume, a particularly interesting finding is that, since depositions are utilitarian records, the range of visual devices (for example, indexing and organising information and highlighting the various components of depositions) is different from that attested in correspondence (Sairio & Nevala and Meurman-Solin a and b in this volume) and pamphlets (Claridge in this volume). The illustrations will permit the comparison of the font changes and the use of large and/or embellished characters in the depositions and in title-pages representing other genres (McConchie and Ratia in this volume). 
         Sairio and Nevala focus on the influence of  letter-writing manuals on layout in a small selection of eighteenth-century  private letters and examine the social dimensions of the practices reflected,  for example, in the use of deferential space and the positioning of such  standard components of a letter as the place and time of writing and the  signature. Even though the manuals have played an important role in the  evolution of this particular genre (e.g., Mitchell & Poster 2007), the  authors record variability in how the rules are applied, pointing out education  and degree of formality as some of the factors conditioning the writing  practices. The norms may be ignored in letters where there is a close  relationship between the writer and the recipient. 
         The findings in Sairio & Nevala can be compared with those in Meurman-Solin’s article on layout in the letters of the Corpus of Scottish Correspondence (CSC). For example, The Art of Letter-Writing (ALW) (1762), as discussed by Sairio & Nevala, holds that “sending greetings in a postscript even to one’s friends might be considered ‘Levity’, or have ‘the Appearance of having almost forgotten them’ (1762: 17)” and shows “disrespect and indifference”, yet this practice is quite frequent in the Scottish letters. In fact, that the conventions in letter-writing change considerably over time is well evidenced by comparing the rules in ALW and the practices in the CSC letters dating from 1500–1715. This is reflected, for example, in leaving a large space between the body of the letter and the signature in numerous sixteenth- and early seventeenth-century Scottish letters, a practice which is much less frequent in late seventeenth- and eighteenth-century letters. The 2007 version of the CSC is shown not to provide sufficient evidence of when and how the structuring of the body of the text into paragraphs evolved, the ongoing extension of the corpus perhaps solving this problem. 
         One of the central concerns in this volume is to  assess and discuss critically which visual features of manuscripts,  traditionally left unannotated, can be shown to be quite relevant in linguistic  analysis, and frequently even indispensable for producing a valid  interpretation of historical grammar and lexis. Meurman-Solin’s study of  features of visual prosody, such as punctuation devices, spacing, and marked  character shapes, in the CSC corpus highlights their importance in the  identification of structures of syntax and discourse. The study illustrates  normalisation and modernisation practices recorded in earlier editions and  compares them with the principles and practices of philological computing.  Modernisation was considered acceptable quite widely due to the fact that  correspondence was frequently published as part of family memoirs or similar  works, and the editors assumed that the great majority of their readers would  be either historians or other people interested in letters as historical  documents. Meurman-Solin shows that, using diplomatically transcribed original  manuscripts as data, also paying close attention to visually observable  devices, permits the scholar to write a new grammar of epistolary prose, in  principle, every time he or she reads a new idiolect. These idiosyncratic  grammars then form the basis for the understanding of how letters were written  in the sixteenth to eighteenth centuries as regards the sequencing of chunks of  discourse, the choice of connective devices, the ordering of communicative  acts, and the positioning of given and new information  (Meurman-Solin 2011, 2013). Further thoughts  on how various manuscript features affecting linguistic analysis can be  translated into categories are presented in Meurman-Solin’s article on  taxonomisation (Meurman-Solin c in this volume). 
         
         Part III: Paratextual properties in early printed title-pages
         The so-called Printing Revolution started in Europe in the mid to late fifteenth century and the rate at which the new medium was adopted was prodigious, as is well attested in book historical literature. In England, the first printing press was set up in London by William Caxton in 1476. Over the sixteenth century the number of printing houses increased quickly, as did the availability of books printed in English. The significance of mass-produced vernacular text to literacy and to the development of English is hard to overestimate. Not only did printing hasten the standardisation of English spelling (see Tyrkkö forthcoming), but it also introduced a rapidly growing segment of society to what written text looks like. One element of this new written standard was the title-page, a textual innovation of the late fifteenth century described by book historians such as Elizabeth Eisenstein (1979: I, 106) as “the most significant new feature associated with the printed book format”. 
         For the philologist, printed texts present a whole new set of variables to consider. The challenges associated with the uncertain genealogies of manuscripts are replaced by the equally complicated production circumstances of the early modern marketplace of books. As discussed in detail by authorities such as McKerrow (1967) and Bland (2010), the early modern printing house was a hive of activity with craftsmen of various descriptions working in concert to produce commercially viable products. Some aspects of the printed book, such as the title-page and the illustrations, fell almost completely into the remit of the printer and publisher (see Shevlin 1999: 57). In recent years, more and more attention has been afforded to aspects of the book-production process that have often gone unnoticed, especially by historical linguists. For example, increasing awareness of the role played by correctors in shaping the language of printed books will undoubtedly affect studies of Early Modern English that are based on printed sources (see Grafton 2011). The role of the author, traditionally seen as unproblematic, is thus paradoxically at once both central and curiously peripheral. For the linguist studying historical varieties of language, the author naturally stands at the centre of interest with all his or her sociocultural, regional, and idiosyncratic characteristics. Against that premise, to know that many of the features seen on the page were in actual fact collaboratively, if not solely, produced by the printing house raises the question whether it is reasonable to interpret the primary data available to us as representing only the author’s use of language. This broader view of textual history challenges us to realise that only a cross-disciplinary methodological approach may permit us to draw conclusions about the language of texts of this kind. 
         While paleographers and philologists working on manuscripts have always been keenly aware of the importance of paratextual features, linguists working on printed primary data have typically tended to focus only on the text itself. This may be due in part to the apparent simplicity of printed text, a notion contested by all the relevant contributions in this volume, or, paradoxically, to the complexities of its production circumstances, which often make the correct attribution of responsibility for specific features difficult. The common adoption of corpus linguistic methodologies in historical linguistics has arguably exacerbated the issue further by divorcing the corpus edition from the original text, preserving only the basic level of linguistic information and discarding all the features that are, or were, considered too difficult or even impossible to replicate on the computer screen. Accordingly, the two articles in Part III discuss the features of early printed title-pages, focusing in particular on the importance of typography and layout to linguistic and philological studies. 
         Even more so than choice of type, layout has been  largely ignored in the linguistic studies of early printed texts. The article  by McConchie begins by noting that while a vast amount of scholarship already  exists on virtually all aspects of the history of the book, collaboration  between bibliographers and historical linguists has not been particularly  far-reaching. To remedy this state of affairs, McConchie discusses the  significance of seemingly minute details of layout and typography using seven  early modern title-pages as evidence and reveals the wealth of information  hidden beneath the most obvious level of interpretation. The central concept  here is the illocutionary force of visual features, which McConchie argues is  evident in the use of a wide range of typographic features. The examples  provided include shifts in type which are shown sometimes to supersede linguistic  integrity, as in the case of proper names split into successive lines which  undergo a type shift, and the use of blank lines and point size to highlight an  author’s name. A digital edition that would omit such visual information would  invariably mislead the scholar. McConchie also raises the issue of emblems and  other graphic representations, showing that they, too, have great potential  significance in the overall analysis of the printed page. 
         One of the primary functions of the early modern title-page was that of an advertisement. To attract potential customers and to communicate clearly the topic of the book, printers would often follow formulaic and genre-specific traditions in designing title-pages. The contribution by Ratia focuses on the relationship between the overt message communicated by a title-page and the genre of the book it represents, and asks whether this relationship was always a straightforward one or whether specific features may have been used to mislead the customer. As primary data for her case study, Ratia looks at a corpus of 15 plague treatises printed in the seventeenth century. Plague treatises form a distinct subgenre of early modern medical writing, and Ratia defines her primary data further by only including books that appear to convey religious overtones on the title-page. Using lexical choice and the size of type as indicators of affective language use, Ratia’s analysis shows that despite similarities, the texts belong to three distinct discursive types, only one of which is genuinely religious. The study also demonstrates how visual and textual prominence was given to specific lexical items for reasons such as emphasis and foregrounding, and that many of these practices were repetitive enough to be regarded as genre-indicating features. For example, while medical key words were afforded visual prominence, religious terms, although present in a secondary role, were clearly downplayed. Ratia’s analysis at once draws attention to the significance of typographic and layout features in the comprehensive analysis of the full linguistic sense of the title-page, and demonstrates how such conclusions would be impossible to reach if the researcher did not have access to the full paratext of the original books. 
         
         Part IV: New approaches to digital editing
         In recent years, the emergent field of digital  humanities has served to bring closer the previously distinct disciplines of corpus  linguistics and digital editing. This has come about in part through shared  computational methodologies and in part by the realisation on both sides that  great benefits can be gained if unified methodologies are employed. The  articles in the fourth and final part of this volume present new approaches in  this area from the linguistic perspective.  
         One of the traditional concerns of manuscript studies has been the interpretation of abbreviations, abundant as they were in medieval scripts. The related matter of how best to handle such abbreviations in modern editions is the topic of Honkapohja’s article, in which the author takes a detailed look at the digital editing of medieval manuscripts. According to Honkapohja, one of the central challenges in the digital editing of manuscripts is that much of the scholarly terminology and, by extension, of the taxonomies it represents, can be traced back to the late nineteenth and early twentieth centuries. The use of these concepts in modern, data-driven corpus linguistic research is inherently problematic. As Honkapohja argues, “This requires ways of representing the data in such a way that it can be quantified and used as reliable evidence, and the traditional paleographical terminology is not necessarily the optimal way for approaching it”. 
         The study addresses the many theoretical and practical challenges posed by manuscript abbreviations. A historical overview of various types of abbreviations is provided first, followed by a review of the taxonomical treatment of abbreviations in standard reference works spanning 200 years and a discussion of the theoretical positions taken by editors and scholars on the topic. In the second half of the paper Honkapohja presents his own XML-based annotation system, based in part on the model developed by the Digital Editions for Corpus Linguistics project (see Honkapohja, Kaislaniemi & Marttila 2009). Honkapohja’s model for annotating the Trinity Seven Planets manuscripts is presented with considerable attention to detail, addressing both of the key issues identified by the author: the representation of the sign of abbreviation and that of the abbreviated content. The model serves both purposes well, thereby extending the applicability of corpus linguistic methods to manuscripts. 
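         The distinction between the sign of abbreviation and the abbreviated content can be illustrated with a small, purely hypothetical example. The sketch below is not Honkapohja's annotation scheme: it assumes TEI-style element names (choice, abbr, expan, am, ex), an invented sample line, and the character "9" as a stand-in for a final -us sign. It uses only the Python standard library to count the signs recorded and to produce an expanded reading.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample line with two abbreviated words (invented for illustration).
# <am> records the sign of abbreviation, <ex> the letters the sign stands for.
SAMPLE = """
<line>
  <choice><abbr>p<am>~</am></abbr><expan>p<ex>er</ex></expan></choice>
  <w>the</w>
  <choice><abbr>plan<am>9</am></abbr><expan>plan<ex>us</ex></expan></choice>
</line>
"""


def abbreviation_signs(root):
    """Count how often each abbreviation sign (<am>) occurs."""
    return Counter(am.text for am in root.iter("am"))


def expanded_reading(root):
    """Join the words of the line, taking the <expan> side of each <choice>."""
    words = []
    for child in root:
        source = child.find("expan") if child.tag == "choice" else child
        words.append("".join(source.itertext()))
    return " ".join(words)


root = ET.fromstring(SAMPLE)
signs = abbreviation_signs(root)      # quantifiable evidence on the signs used
reading = expanded_reading(root)      # searchable text with abbreviations expanded
print(signs)
print(reading)
```

         Keeping the sign and the supplied letters in separate elements means that either can be counted or searched without touching the other, which is the property the paragraph above attributes to a model that serves both purposes.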
         There are two different but closely related issues that researchers have to deal with when it comes to paratextual features. The first of these is annotation itself, which in the most basic sense refers to the descriptive metadata that add new layers of searchable and quantifiable data to the text itself. Examples of largely unproblematic layers of annotation include metadata on the document itself and descriptive metadata on the visual and material aspects of the original artifact. The more controversial issue is taxonomisation, by which we mean the division of descriptive data into groups based on well-argued and clearly communicated categories. In corpus linguistics, the latter task in particular has raised objections on the grounds that pre-determined categories may impose particular interpretations and as such guide research excessively. A classic example of a potentially controversial layer of annotation would be part-of-speech tagging, which necessarily requires that the tags are assigned following a specific taxonomy of word classes. On the other hand, it is important to make a distinction between descriptive and interpretative taxonomies, the former cataloguing distinct features of note, the latter assigning them a semantic, pragmatic, or linguistic significance.
         Meurman-Solin’s article on the taxonomisation of features of visual prosody addresses the topic by presenting a compelling set of arguments for why descriptive taxonomies of visual features are useful. Based on extensive experience, the article presents a context-independent theoretical framework for the underlying principles of taxonomy which can be applied to any corpus-linguistic or digital editing project that includes the visual component of the historical documents in question. According to Meurman-Solin, “The main challenge in annotating visual prosody is how to taxonomise, that is, how to translate a large degree of variation into retrievable variant types without losing information which is relevant from the perspective of the perceived range of research topics”. An important underlying principle of the taxonomical theory Meurman-Solin presents is the claim that “an annotation taxonomy only functions as a tool in data retrieval”. Discussing polarisation, frequency, and membership of a particular discourse or text community as three key dimensions of taxonomy, Meurman-Solin shows that the descriptive system used for visual paratext does not need to, and indeed should not, extend to semantic interpretations of the features’ functions or meanings.
         As with manuscript studies, one of the challenges in large-scale empirical studies of early printed paratext has been the lack of systematic descriptions that would allow scholars to access the data in a searchable and quantifiable form. In book historical scholarship, the most natural approach would be to compile a descriptive database such as the comparative database of typographic features of Dutch handpress books compiled by Proot (2012), using established features such as names of type and layout features as variables (or fields) and physical measurements thereof as values. For the book historian, the main point of interest would typically be whether or not a particular feature appears in a book, while the historical linguist would almost invariably be interested in the relationship between those features and the text itself. As noted at the beginning, such descriptive taxonomies and the methods associated with them have not been adopted by historical linguists to any great extent. The last two articles in this volume address this issue, presenting two approaches to the corpus annotation of printed title-pages from the Early Modern period.
         The Lampeter corpus, compiled by Josef Schmied, Claudia Claridge and Rainer Siemund and released in 1999, was the first linguistic corpus to feature sophisticated annotation of the paratextual features of early printed texts. The Lampeter corpus was annotated in SGML following the guidelines of the Text Encoding Initiative (TEI) and, in part, developed in cooperation with members of TEI. Claridge’s article in this volume discusses the linguistic relevance of the visual elements annotated in the Lampeter corpus. Quoting Moxon’s Mechanick Exercises (1683), famously the earliest account of the inner workings of an early modern printing house, Claridge begins by noting the care and attention that contemporary printers and publishers afforded to the visual presentation of the printed book. Although such features are more accessible than ever before through facsimile images, such images are often not sufficient for real research tasks because the features of interest are neither searchable nor, consequently, easily quantifiable. The article discusses three separate but closely related topics, namely page layout, typography, and word separation. In each case the author explains the practices followed in the Lampeter corpus and the reasons behind them, and gives select examples of how the annotation aids research and helps us preserve potentially significant features which more conventional corpus editions typically omit. For example, Claridge provides statistical evidence on the use of blackletter type in the Lampeter corpus. As Claridge argues, the text-typological and diachronic variation observed not only informs us about the overall use of a particular type, but also suggests how salient words printed in that type would have been to the eye of the contemporary reader.
         The final article in the volume, by Tyrkkö, Marttila and Suhr, introduces the work of the Gatekeepers of Knowledge project and presents a pilot study that focuses on the title-pages of books associated with one prolific seventeenth-century medical author, Nicholas Culpeper (1616–1654). Culpeper was the first English medical writer to become a bestselling author, and his books were printed regularly for several decades after his early death. Taking as a premise that the function of the early modern title-page was to serve as an advertisement, the authors ask whether the many printers who profited from Culpeper’s name, some legitimately and others by less scrupulous means, can be seen to have created a recognisable style that would identify a book as a Culpeper volume.
         Following a brief historical background to Culpeper and the associated printers and publishers, the article discusses two separate issues: the annotation system developed for the Gatekeepers project, and a small sampling of findings based on the annotated data. An adaptation of TEI P5 XML, the annotation scheme includes a number of innovations, including the precise recording of type sizes and spacing in page layout, and the use of identifier elements for persons and places that appear in the documents. As the authors discuss, the analysis required that highly detailed measurements, down to one fifth of a millimetre, be taken of each of the 100 title-pages. The method is extensive in detail and, consequently, time-consuming to carry out in practice, but the findings show that annotation at this level of detail can be useful for some research tasks. In the spirit of open access, the authors make available a copy of the project’s own annotation guidelines and the actual corpus itself. Using quantitative and statistical evidence, the authors then demonstrate how adequately annotated paratextual data can be queried and analysed with corpus linguistic methods. The case study shows how different printing houses developed and maintained specific styles, often from one generation to another, but also how the Culpeper brand came to mean that certain initially idiosyncratic features were established to such an extent that nearly all printers made use of them.
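         As a rough illustration of how measurements of this kind might be queried, the sketch below groups type-size measurements by printer after rounding them to the nearest fifth of a millimetre. It does not reproduce the Gatekeepers annotation scheme or its data: the record layout, the field name author_name_mm, the printer labels and all values are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration only; these are not project data.
TITLE_PAGES = [
    {"printer": "Printer A", "year": 1652, "author_name_mm": 7.13},
    {"printer": "Printer A", "year": 1659, "author_name_mm": 6.91},
    {"printer": "Printer B", "year": 1661, "author_name_mm": 4.48},
]


def to_fifths(mm):
    """Express a length in whole fifths of a millimetre (0.2 mm units)."""
    return round(mm * 5)


def mean_size_by_printer(records, field):
    """Mean of one measured field per printer, in millimetres.

    Each measurement is first snapped to the 0.2 mm grid, so the result
    reflects the precision assumed for the recorded values.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["printer"]].append(to_fifths(record[field]))
    return {printer: mean(values) / 5 for printer, values in grouped.items()}


if __name__ == "__main__":
    for printer, size in mean_size_by_printer(TITLE_PAGES, "author_name_mm").items():
        print(f"{printer}: {size:.1f} mm")
```

         Storing the measurements as whole fifths of a millimetre keeps the grouping step in integer arithmetic; the conversion back to millimetres happens only when the result is reported.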
          
          References
          Bannet, Eve Tavor. 2005. Empire of Letters: Letter Manuals and Transatlantic Correspondence,  1680–1820. Cambridge: Cambridge  University Press. 
          
          Barton, David & Nigel Hall, eds. 1999. Letter-Writing as Social  Practice (Studies  in Written Language and Literacy 9). Amsterdam &  Philadelphia: Benjamins. 
          
          Beal, Joan C., Karen P. Corrigan & Hermann L. Moisl, eds. 2007. Creating and Digitizing  Language Corpora, Vol. 2: Diachronic  Databases. Basingstoke: Palgrave Macmillan. 
          
          Bland, Mark. 2010. A  Guide to Early Printed Books and Manuscripts. Malaysia: Wiley-Blackwell. 
          
          Daybell, James. 2001. Early Modern Women’s Letter-Writing in England, 1450–1700. Basingstoke: Palgrave Macmillan. 
          
          Daybell, James. 2012. The Material Letter in Early Modern England: Manuscript Letters and the  Culture and Practices of Letter-Writing, 1512–1635. Basingstoke: Palgrave Macmillan. 
          
          Daybell, James & Andrew Gordon, eds. Forthcoming. Cultures of Correspondence in Early  Modern Britain. 
          
          Daybell, James & Peter Hinds, eds. 2010. Material Readings of Early Modern Culture, 1580–1700. Basingstoke: Palgrave Macmillan.
          
          Diemer, Stefan. 2012. “Orthographic annotation of Middle English Corpora”. Outposts of Historical Corpus Linguistics: From the Helsinki Corpus to  a Proliferation of Resources (Studies in Variation, Contacts and Change in English 10), ed. by Jukka Tyrkkö, Matti Kilpiö, Terttu Nevalainen & Matti Rissanen. Helsinki: Research Unit for Variation, Contacts, and Change in English. http://www.helsinki.fi/varieng/series/volumes/10/diemer/ 
          
          Eisenstein, Elizabeth. 1979. The Printing Press as an Agent of Change. 2 volumes. Cambridge: Cambridge University Press.
          
          Fitzmaurice, Susan. 2002. The Familiar  Letter in Early Modern English. Amsterdam & Philadelphia: Benjamins. 
          
          Grafton, Anthony. 2011. The  Culture of Correction in Renaissance Europe. London: The British Library. 
          
          Huber, Magnus. 2007. “The Old Bailey Proceedings, 1674–1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English”. Annotating variation and change (Studies in Variation, Contacts and Change in English 1), ed. by Anneli Meurman-Solin & Arja Nurmi. Helsinki: Research Unit for Variation, Contacts, and Change in English. http://www.helsinki.fi/varieng/series/volumes/01/huber/
          
          Lass, Roger. 2004. “Ut custodiant litteras: Editions,  corpora and witnesshood”. Methods and  Data in English Historical Dialectology (Linguistic Insights: Studies in Language and Communication 16), ed. by Marina Dossena & Roger Lass, 21–48.  Bern: Peter Lang. 
          
          McKerrow,  Ronald B. 1967 [1927]. An Introduction to  Bibliography for Literary Students. Oxford: Clarendon Press. 
          
          Meurman-Solin,  Anneli. 2011. “Utterance-initial connective elements in early Scottish  epistolary prose”. Connectives in Synchrony and Diachrony in European Languages (Studies in  Variation, Contacts and Change in English 8), ed. by Anneli Meurman-Solin & Ursula Lenker. Helsinki: Research Unit for Variation, Contacts, and Change in English. http://www.helsinki.fi/varieng/series/volumes/08/meurman-solin/ 
          
          Meurman-Solin, Anneli. 2012. “The connectives and, for, but, and only as clause and discourse type indicators in 16th- and 17th-century epistolary prose”. Information Structure and Syntactic  Change in the History of English (Oxford Studies in the History of English 2), ed. by Anneli Meurman-Solin, María José López-Couso & Bettelou Los. New  York: Oxford University Press. 
          
          Mitchell, Linda C. & Carol Poster, eds. 2007. Letter-Writing Manuals and Instruction from  Antiquity to the Present: Historical and Bibliographic Studies. Columbia,  SC: University of South Carolina Press. 
          
          Moxon, Joseph. 1683. Mechanick Exercises, or, The Doctrine of handy works. Applied to the  Art of Printing. London: Printed for Joseph Moxon.  
          
          Nevala, Minna. 2004. Address in Early English Correspondence: Its Forms and Socio-Pragmatic  Functions (Mémoires de la Société Néophilologique de Helsinki LXIV).  Helsinki: Société Néophilologique. 
          
          Nevalainen, Terttu. 2001. “Continental conventions in early English correspondence”. Towards a History of English as a History of Genres, ed. by Hans-Jürgen Diller & Manfred Görlach, 203–224. Heidelberg: Universitätsverlag C. Winter.
          
          Nevalainen,  Terttu & Helena Raumolin-Brunberg, eds. 1996. Sociolinguistics and Language  History: Studies Based on The Corpus of Early English Correspondence (Language and Computers 15). Amsterdam & Atlanta: Rodopi. 
          
          Nevalainen, Terttu & Helena Raumolin-Brunberg. 2003. Historical Sociolinguistics: Language Change in Tudor and Stuart England (Longman Linguistics Library). London: Longman.
          
          Nurmi, Arja, Minna Nevala & Minna  Palander-Collin, eds. 2009. The Language of Daily Life in England (1400–1800) (Pragmatics and Beyond New Series 183). Amsterdam:  Benjamins. 
          
          Proot, Goran. 2012. “Towards a typographical atlas of the handpress book produced in the Southern Low Countries in the Early Modern period: Aims, methodology and results”. Conference paper presented on June 21,  2012 at SHARP 2012, Washington D.C.  
          
          Sairio, Anni. 2009. Language  and Letters of the Bluestocking Network: Sociolinguistic Issues in 18th-century  English (Mémoires de la Société Néophilologique de Helsinki LXXV).  Helsinki: Société Néophilologique. 
          
          Schneider, Gary. 2005. Culture of Epistolarity: Vernacular Letters and Letter Writing in Early  Modern England, 1500–1700. Newark, DE:  University of Delaware Press. 
          
          Shevlin, Eleanor F. 1999. “‘To reconcile book and title, and make ’em kin to one another’: The evolution of the title’s  contractual functions”. Book History 2(1): 42–77. 
          
          Suhr, Carla. 2011. Publishing  for the Masses: Early Modern English Witchcraft Pamphlets (Mémoires de la  Société Néophilologique de Helsinki LXXXIII). Helsinki: Société  Néophilologique. 
          
          Tyrkkö, Jukka. Forthcoming. “Printing houses as communities of practice: Orthography in early modern medical books”. Communities of Practice in the History of English, ed. by Joanna Kopaczyk & Andreas Jucker. Amsterdam: Benjamins.
          
          