Small Corpus of Colombian English as a Second Language Essays

The Small Corpus of Colombian English as a Second Language Essays (SCoCESLE) is a small learner corpus of 272 argumentative essays written by Colombian English as a Second Language (ESL) learners. It contains a total of 81,994 tokens, 6,057 types and 5,161 lemmas, and the essays average about 270 words in length. Essay topics include gender-related issues, education, information technology, environmental problems, personality traits, poverty, genetic engineering, globalisation, pet ownership, compulsory vaccination, transportation, compulsory military conscription, immigration, job satisfaction, the economy, foreign language learning, and employees’ working conditions. The texts were written by adult learners (aged 18 and over): male (n=157), female (n=114) and gender-fluid (n=1). The learners’ first language is Colombian Spanish. The corpus is unannotated and is divided into a lower-proficiency sub-corpus (n=133) and a higher-proficiency sub-corpus (n=139).
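Token and type counts depend on the tokenizer used, and the corpus manual is the authority on how the SCoCESLE figures were obtained. As a rough sketch only, assuming a simple regular-expression tokenizer that lowercases words and splits off punctuation (not the compiler's documented method), the two measures and the type-token ratio can be computed as follows. Counts produced this way will not necessarily match the published 81,994 tokens and 6,057 types, and a lemma count would additionally require a lemmatizer.

```python
import re
from collections import Counter

# Assumption: a simple regex tokenizer (words, optionally joined by one
# apostrophe, or single punctuation marks). SCoCESLE's published counts
# may have been produced with a different tool.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split a text into lowercased tokens (words and punctuation marks)."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def corpus_stats(texts: list[str]) -> dict[str, float]:
    """Return token count, type count, type-token ratio and mean tokens per text."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    n_tokens = sum(counts.values())
    n_types = len(counts)  # distinct lowercased token forms
    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        "mean_tokens_per_text": n_tokens / len(texts) if texts else 0.0,
    }


if __name__ == "__main__":
    sample = ["Pets are good. Pets are loyal!", "Education is important."]
    print(corpus_stats(sample))
```

The type-token ratio falls as texts get longer, so comparisons between the lower- and higher-proficiency sub-corpora are fairer on samples of similar size.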

Project leader: Mr. Ender Velasco, University of Portsmouth
Time of compilation: 2022–2023
Size: 81,994 tokens
Language: English (L2), written by learners whose first language is Colombian Spanish
Number of texts/samples: 272
Period: 2022–2023
Released: 2023
Project home page: https://doi.org/10.17632/wfcbfy29wm.1

Reference line and copyright

Velasco, E. (2023). Small Corpus of Colombian English as a Second Language Essays (SCoCESLE), Mendeley Data, V1, doi:10.17632/wfcbfy29wm.1

CC BY 4.0 licence description:

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset as long as you give appropriate credit, provide a link to the CC BY licence, and indicate if changes were made; however, you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.

Manual

https://doi.org/10.17632/wfcbfy29wm.1

Availability

Open access. Freely available for download at https://doi.org/10.17632/wfcbfy29wm.1

Technical information

The dataset includes the following files:

  1. The corpus manual (.pdf)
  2. The corpus metadata (.xls)
  3. The full corpus of 272 unannotated plain-text essays (.txt)
  4. The sub-corpus of 133 lower proficiency texts (.txt)
  5. The sub-corpus of 139 higher proficiency texts (.txt)
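The corpus manual documents the actual file and folder names. Purely as an illustration, assuming the two sub-corpora are unpacked into hypothetical folders named `lower` and `higher`, each holding one essay per .txt file in UTF-8, the texts could be loaded per proficiency level like this:

```python
from pathlib import Path

# Hypothetical folder names; the corpus manual documents the real layout.
SUBCORPORA = ("lower", "higher")


def load_subcorpora(root) -> dict[str, list[str]]:
    """Read every .txt file in each sub-corpus folder under root.

    Returns a mapping from sub-corpus name to a list of essay texts, sorted by
    file name so that the order is reproducible across runs.
    """
    root = Path(root)
    corpus: dict[str, list[str]] = {}
    for name in SUBCORPORA:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"sub-corpus folder not found: {folder}")
        files = sorted(folder.glob("*.txt"))
        # Assumption: files are UTF-8; "utf-8-sig" also drops a leading BOM.
        corpus[name] = [f.read_text(encoding="utf-8-sig") for f in files]
    return corpus
```

If the hypothetical layout holds, `len(load_subcorpora("SCoCESLE")["lower"])` should give 133 and the `higher` entry 139, which is a quick check that the download is complete.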

SCoCESLE is an unannotated corpus.