Transatlantic Component of the CASE Project (TaCoCASE)

TaCoCASE (Transatlantic Component of the Corpus of Academic Spoken English) is a corpus compiled as part of the CASE project and released in September 2023. It consists of 15 computer-mediated conversations (CMC) between native and non-native English-speaking students from the UK, Germany and the United States. The conversations total 10.5 hours, or 140,003 tokens. TaCoCASE can be used in combination with ViMELF, another sub-corpus of the CASE project, because it adds a native speaker component to the data. Thanks to its multimodal set-up as a spoken CMC corpus, TaCoCASE is a rich data source that supports research in many different fields.

Project leaders: Caroline Collet (Saarland University), Prof. Dr. Stefan Diemer (Trier University of Applied Sciences)
Time of compilation: 2016–2023
Size: 140,003 words
Language: English
Number of texts/samples: 15
Period: 2016–2023
Released: 2023
Project home page: https://www.umwelt-campus.de/en/campus/organisation/fachbereichuwur/sprache-kommunikation/indi/en/applied-research/case-project/access-tacocase

Reference line and copyright

TaCoCASE. 2023. Transatlantic Component of the CASE project. Birkenfeld: Trier University of Applied Sciences. Version 1.0. Collet, Caroline. [http://umwelt-campus.de/case/tacocase]

Compilers

Caroline Collet

Availability

The corpus is freely available for non-commercial research. It can be accessed and searched through WebCorp LSE, hosted by our partner institution Birmingham City University: https://www.webcorp.org.uk/wcx/lse/

If you want to obtain access to the full transcripts or the video files, see the instructions on the project home page.

Technical information

Two versions of the corpus are available:

  • CASE transcription (as docx and txt): the basic version, produced by manual transcription. The CASE transcription conventions capture features of spoken language beyond the words themselves, such as prosodic, paralinguistic and non-verbal features.
  • XML version (xml): an encoding of the annotated CASE transcription that preserves the original information in machine-readable form (Gee 2018).

Associated projects

Corpus of Academic Spoken English (CASE)
Corpus of Video-Mediated English as a Lingua Franca Conversations (ViMELF)