Parameters and coding

The mark-up system employed here is SGML, applied according to the TEI (Text Encoding Initiative) guidelines (cf. Burnard & Sperberg-McQueen s.a.). Lou Burnard created a dedicated document type definition (DTD) to suit the particular requirements of the Lampeter Corpus.

Headers

One corpus header

The corpus header provides the necessary bibliographical information about the corpus, a brief account of the encoding principles and tags used, and the taxonomies applied.

  • <filedesc> contains bibliographical information such as title, editor/publisher, source material, place, date, extent, availability etc.
  • <encodingdesc> contains the declaration of general editorial usage, the tag usage declaration (all the tags employed in the corpus), and the classificatory systems used in the corpus, namely (i) the domain classification together with its threefold sub-domain classification, and (ii) the decade structure.
  • <profiledesc> contains a list of all the languages used in the Lampeter Corpus.
  • <revisiondesc> contains a work report.
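
The four parts listed above lend themselves to a simple completeness check. The following is a minimal sketch in Python, not part of the corpus tooling: the function name missing_parts and the skeleton string are hypothetical, and the element content is deliberately elided.

```python
import re

# The four corpus header parts named in the list above.
CORPUS_HEADER_PARTS = ("filedesc", "encodingdesc", "profiledesc", "revisiondesc")

def missing_parts(header, required=CORPUS_HEADER_PARTS):
    """List the required elements that have no start tag in the header string."""
    # Only start tags match: in an end tag the "<" is followed by "/".
    present = {name.lower() for name in re.findall(r"<([A-Za-z][\w.-]*)", header)}
    return [name for name in required if name not in present]

# Hypothetical skeleton with the revision description left out on purpose.
skeleton = (
    "<filedesc>...</filedesc>"
    "<encodingdesc>...</encodingdesc>"
    "<profiledesc>...</profiledesc>"
)
gaps = missing_parts(skeleton)
print(gaps)
```

For this skeleton the check reports revisiondesc as the only missing part.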

120 text headers

  • <filedesc> File description, providing bibliographical information about the text
  • <profiledesc> Profile description, supplying background data about the author and text-type information:
    • Author information: sex, age, name, time and place of birth, places of abode, educational history, professional and occupational career, socio-economic status and miscellaneous biographical details (as far as available)
    • Text: reference to the text category classifications, further characterisation of the text, the text's own description of its text type or genre, and the text structure (<keywords scheme="lamStruc">), i.e. how many parts the text consists of.

Layout markup

Tags take the form < > at the beginning of an element, and </ > at the end, e.g. <text>...</text>. Within the angle brackets, a start tag can also carry a number of attributes, which specify additional characteristics of the element. For example, the rend attribute (e.g. <text rend="roman">), which records typeface information, can occur with almost any tag.
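
To illustrate how start tags and their attributes can be read off such mark-up, here is a minimal sketch in Python. It is an illustration only, not the corpus software: the names START_TAG, ATTRIBUTE and start_tags are hypothetical, the sample string is invented, and the pattern assumes that attribute values are always enclosed in double quotes.

```python
import re

# A start tag such as <text rend="roman">; end tags (</text>) do not match.
START_TAG = re.compile(r'<([A-Za-z][\w.-]*)((?:\s+[\w.-]+\s*=\s*"[^"]*")*)\s*>')
# A single attribute of the form name="value".
ATTRIBUTE = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')

def start_tags(sgml):
    """Return (element name, attribute dict) for each start tag in the string."""
    found = []
    for match in START_TAG.finditer(sgml):
        name, raw_attributes = match.groups()
        found.append((name, dict(ATTRIBUTE.findall(raw_attributes))))
    return found

# Invented sample line using tags mentioned in this description.
sample = '<text rend="roman"><q>quoted words</q></text>'
tags = start_tags(sample)
print(tags)
```

A regular expression is enough for a quick look at individual lines; validating a whole file against the DTD would call for a proper SGML parser.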

Special mark-up is used for the following aspects:

  • Text structure: e.g. textual divisions and subparts (preface, body), headlines, paragraphs
  • Titlepage features: e.g. title, byline, imprimatur
  • Quotations: distinction between <q> and <quote>, special mark-up for quoted letters
  • Tables and lists, e.g. row, cell, item, label
  • Notes & marginalia: marked for type, place, function and anchoring
  • Speech, drama and poems: e.g. markup for speakers, stage directions, line breaks, stanzas
  • Typographical and layout features: e.g. italics, boldface, gothic, capitals
  • Editorial interventions and indications: for additions to the text by people other than the compilers
  • Entity references (form: &…;): for non-ASCII symbols
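
Entity references of this kind can be expanded to the characters they stand for by a simple lookup. The sketch below, in Python, assumes a small hypothetical table (ENTITIES); the entity set actually declared for the corpus may differ, and the sample sentence is invented.

```python
import re

# Hypothetical mapping; the entity names declared for the corpus may differ.
ENTITIES = {"aelig": "æ", "eacute": "é", "pound": "£", "amp": "&"}

# An entity reference: ampersand, name, semicolon.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand_entities(text, table=ENTITIES):
    """Replace known &name; references; leave unknown ones as they are."""
    def substitute(match):
        return table.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)

expanded = expand_entities("C&aelig;sar paid 5 &pound; &unknown;")
print(expanded)
```

References missing from the table are left untouched, so an incomplete table cannot silently delete text.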

A full explanation of the coding principles and procedures (with examples) is found in the manual.

References

Burnard, Lou & Stephen Sperberg-McQueen. s.a. Guidelines for Electronic Text Encoding. Chicago/Oxford: ACH/ACL/ALLC, Text Encoding Initiative (http://etext.virginia.edu/TEI.html)