Basic structure of SCoCESLE

SCoCESLE (the Small Corpus of Colombian English as a Second Language Essays) is a learner corpus of argumentative essays written by Colombian ESL learners whose L1 is Colombian Spanish. It was compiled between October 2022 and January 2023.

SCoCESLE contains 272 files (i.e., argumentative essays written by Colombian ESL students) totalling 81,994 tokens, 6,057 types, and 5,161 lemmas. The token count was obtained with #LancsBox 6.0 (Brezina et al., 2020). The essays have an average length of about 270 words.
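The headline counts above can be turned into a couple of summary ratios. The following is a minimal sketch, not part of the corpus tooling: the variable names and the `ratio` helper are illustrative assumptions, and only the four counts come from the description. Note that the second figure is in tokens per file, which is a different measure from the per-essay word average stated above.

```python
# Headline counts as reported for SCoCESLE.
TOKENS = 81_994
TYPES = 6_057
LEMMAS = 5_161
FILES = 272


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a float (hypothetical helper)."""
    return numerator / denominator


# Share of distinct word forms among all running tokens.
type_token_ratio = ratio(TYPES, TOKENS)

# Mean number of tokens per file (tokens, not words).
mean_tokens_per_file = ratio(TOKENS, FILES)

print(f"Type/token ratio:     {type_token_ratio:.4f}")
print(f"Mean tokens per file: {mean_tokens_per_file:.1f}")
```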

The essays in SCoCESLE cover the following topics: gender-related issues, education, information technology, environmental problems, personality traits, poverty, genetic engineering, globalisation, pet ownership, compulsory vaccination, transportation, compulsory military conscription, immigration, job satisfaction, the economy, foreign language learning, and employees' working conditions.

The essays in SCoCESLE were written by adult learners (18+): male (n=157), female (n=114), and other (i.e., gender fluid, n=1).

SCoCESLE was specifically designed to represent a small sample of a specialised discourse, namely argumentative essays written by Colombian ESL students, that could not be found in larger corpora.

SCoCESLE is also split into two learner sub-corpora, corresponding to lower and higher proficiency argumentative essays.

  • SCoCESLE Lower Proficiency: 36,384 tokens / 133 texts / argumentative essays written by A2-B1 Colombian ESL learners (average length per text: 270 words) / L1: Spanish
  • SCoCESLE Higher Proficiency: 45,610 tokens / 139 texts / argumentative essays written by B2-C1 Colombian ESL learners (average length per text: 270 words) / L1: Spanish
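
As a quick consistency check, the two sub-corpora should add up to the totals reported for the whole corpus. The sketch below encodes the figures from the list above and verifies the sums. The `SubCorpus` class and its field names are assumptions made for illustration only, not part of any SCoCESLE distribution. The per-text means it prints are in tokens, not words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCorpus:
    """Hypothetical record for one SCoCESLE sub-corpus."""
    name: str
    cefr_range: str
    tokens: int
    texts: int


# Figures as reported in the sub-corpus list.
SUBCORPORA = [
    SubCorpus("Lower Proficiency", "A2-B1", 36_384, 133),
    SubCorpus("Higher Proficiency", "B2-C1", 45_610, 139),
]

# Totals as reported for the whole corpus.
TOTAL_TOKENS = 81_994
TOTAL_TEXTS = 272

total_tokens = sum(s.tokens for s in SUBCORPORA)
total_texts = sum(s.texts for s in SUBCORPORA)

# The sub-corpora should partition the corpus exactly.
assert total_tokens == TOTAL_TOKENS, "sub-corpus tokens do not sum to total"
assert total_texts == TOTAL_TEXTS, "sub-corpus texts do not sum to total"

for s in SUBCORPORA:
    share = s.tokens / TOTAL_TOKENS
    print(f"{s.name} ({s.cefr_range}): {share:.1%} of tokens, "
          f"{s.tokens / s.texts:.1f} tokens per text")
```

Both assertions pass with the reported figures, so the lower and higher proficiency sets together account for every text and token in the corpus.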