Annotation

Digital format

The texts in CoWITE are encoded as UTF-8 plain text files, each accompanied by a flat metadata header that records detailed information on authorship, date, genre, topic, and source. All texts are part-of-speech annotated using a customised tagset adapted to the specific features of historical English and the instructive genres represented in the corpus.

Each text is available in a diplomatic version that preserves the original spelling and punctuation. Normalised versions, which regularise spelling for easier processing and comparison, are in preparation and not yet available for the entire dataset. This digital format allows for both quantitative and qualitative analysis, supporting a wide range of research applications in historical linguistics, corpus stylistics, discourse analysis, and the history of women’s writing.

Metadata

Each file in the corpus is accompanied by rich metadata annotation, designed to facilitate detailed linguistic, bibliographic, and discursive analysis. The annotation follows a consistent scheme across all texts and is encoded in the header.

The following elements are systematically recorded:

  • Author name (including initials, pseudonyms, or attributions such as A Lady)
  • Gender (all authors in the corpus are women)
  • Full title and a standardised short title
  • Date of composition or publication, along with the corresponding century
  • Genre and subgenre (e.g., recipe, cookery, medical, domestic economy)
  • Variant of English (British or American)
  • Name of the transcriber
  • Place of publication and publishing house (when known)
  • Thematic tags reflecting the content focus (e.g., culinary, medical, health, nutrition)

These metadata annotations allow for filtered searches and corpus-based comparisons by time period, genre, language variety, and topic. They also ensure transparency and traceability of the source materials, supporting both linguistic and interdisciplinary research.
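As an illustration of this kind of filtered search, the following Python sketch reads a header and tests it against selection criteria. The exact header syntax is not specified here, so the sketch assumes a hypothetical layout of "key: value" lines closed by a blank line. The field names (author, century, genre, variant, tags), the sample text, and the helper functions parse_header, matches, and has_tag are all invented for the example and are not part of the corpus distribution.

```python
# Minimal sketch: the header layout and field names below are assumptions,
# not the documented CoWITE header format.

# Hypothetical file content: a flat "key: value" header, a blank line, the text.
SAMPLE = """\
author: A Lady
gender: female
short_title: Sample Cookery Book
date: 1750
century: 18
genre: recipe
variant: British
tags: culinary, health

Take a pint of cream ...
"""


def parse_header(raw: str) -> tuple[dict[str, str], str]:
    """Split a file's content into (metadata, body) at the first blank line."""
    header, _, body = raw.partition("\n\n")
    meta: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no "key: value" pair
            meta[key.strip().lower()] = value.strip()
    return meta, body


def matches(meta: dict[str, str], **criteria: object) -> bool:
    """True if every criterion equals the header value, ignoring case."""
    return all(
        meta.get(key, "").lower() == str(wanted).lower()
        for key, wanted in criteria.items()
    )


def has_tag(meta: dict[str, str], tag: str) -> bool:
    """True if the comma-separated 'tags' field contains the given tag."""
    tags = [t.strip().lower() for t in meta.get("tags", "").split(",")]
    return tag.lower() in tags


if __name__ == "__main__":
    meta, body = parse_header(SAMPLE)
    if matches(meta, century=18, variant="British") and has_tag(meta, "culinary"):
        print(meta["short_title"])
```

In practice the same functions could be applied to each file read with open(path, encoding="utf-8"), collecting the texts whose headers satisfy the chosen period, genre, variety, or topic. The field names would need to be adjusted to those actually used in the corpus headers.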