Basic structure

MICUSP consists of around 830 papers (roughly 2.6 million words) of different types (e.g. essays, reports, response papers) from 16 disciplines (see Table 1) within four academic divisions (Humanities and Arts, Social Sciences, Biological and Health Sciences, and Physical Sciences). All papers included in MICUSP were written by final-year undergraduate and graduate students who received an A grade for their paper.

Table 1. Disciplines represented in MICUSP.

Discipline                           Word count
Biology                                 174,112
Civil & Environmental Engineering        98,213
Economics                                77,668
Education                               149,099
English                                 265,396
History & Classical Studies             180,523
Industrial & Operations Engineering     124,527
Linguistics                             154,160
Mechanical Engineering                  122,176
Natural Resources & Environment         173,071
Nursing                                 157,785
Philosophy                              127,290
Physics                                  44,866
Political Science                       208,223
Psychology                              322,808
Sociology                               214,087
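
As a quick check on the figures in Table 1, the per-discipline word counts can be summed and compared with the "roughly 2.6 million words" stated above. The following is a minimal sketch; the dictionary simply restates the table, and the variable names are chosen here for illustration.

```python
# Word counts per discipline, copied from Table 1.
WORD_COUNTS = {
    "Biology": 174_112,
    "Civil & Environmental Engineering": 98_213,
    "Economics": 77_668,
    "Education": 149_099,
    "English": 265_396,
    "History & Classical Studies": 180_523,
    "Industrial & Operations Engineering": 124_527,
    "Linguistics": 154_160,
    "Mechanical Engineering": 122_176,
    "Natural Resources & Environment": 173_071,
    "Nursing": 157_785,
    "Philosophy": 127_290,
    "Physics": 44_866,
    "Political Science": 208_223,
    "Psychology": 322_808,
    "Sociology": 214_087,
}

# Total size of the corpus according to the table.
total_words = sum(WORD_COUNTS.values())

# Disciplines with the most and the fewest words.
largest = max(WORD_COUNTS, key=WORD_COUNTS.get)
smallest = min(WORD_COUNTS, key=WORD_COUNTS.get)

print(f"{len(WORD_COUNTS)} disciplines, {total_words:,} words in total")
print(f"largest: {largest}; smallest: {smallest}")
```

The total comes to 2,594,004 words across 16 disciplines, which agrees with the rounded figure of 2.6 million.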


Figure 1. Disciplines represented in MICUSP.

Each paper in MICUSP has been marked up in XML and maintains the structural divisions (sections, headings, paragraphs) of the original paper. A file header added to each MICUSP file records, among other things, the discipline and the student's level, native-speaker status, and gender. This makes it possible to carry out customized searches in subsections of the corpus, e.g. only in Biology papers written by native-speaker final-year undergraduate students.
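
To illustrate how header metadata can support such a customized search, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names (`header`, `discipline`, `level`, `nativeSpeaker`, `gender`), the file names, and the sample values are all assumptions made for this example; they are not the actual MICUSP markup, which is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical MICUSP-like files. The tag names and values below are
# assumptions for illustration only, not the corpus's real schema.
SAMPLE_FILES = {
    "paper_a.xml": """<paper>
  <header>
    <discipline>Biology</discipline>
    <level>final-year undergraduate</level>
    <nativeSpeaker>yes</nativeSpeaker>
    <gender>female</gender>
  </header>
  <body><section><head>Introduction</head><p>Sample text.</p></section></body>
</paper>""",
    "paper_b.xml": """<paper>
  <header>
    <discipline>Biology</discipline>
    <level>graduate</level>
    <nativeSpeaker>no</nativeSpeaker>
    <gender>male</gender>
  </header>
  <body><section><head>Methods</head><p>Sample text.</p></section></body>
</paper>""",
    "paper_c.xml": """<paper>
  <header>
    <discipline>Physics</discipline>
    <level>final-year undergraduate</level>
    <nativeSpeaker>yes</nativeSpeaker>
    <gender>male</gender>
  </header>
  <body><section><head>Results</head><p>Sample text.</p></section></body>
</paper>""",
}


def read_header(xml_text):
    """Return the header fields of one file as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    return {child.tag: (child.text or "").strip() for child in header}


def select_papers(files, **criteria):
    """Return the names of files whose header matches every criterion."""
    selected = []
    for name, xml_text in files.items():
        header = read_header(xml_text)
        if all(header.get(field) == value for field, value in criteria.items()):
            selected.append(name)
    return selected


# Biology papers by native-speaker final-year undergraduates only.
matches = select_papers(
    SAMPLE_FILES,
    discipline="Biology",
    level="final-year undergraduate",
    nativeSpeaker="yes",
)
print(matches)
```

Because the filter reads only the header, the subcorpus can be selected first and the text search then run on the bodies of the matching files.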

Paper classification

The goal of the MICUSP paper classification was to enable corpus users to browse for and search in papers of a particular type, e.g. research papers or reports.

The paper classification system was developed through a series of interlocking steps by a group of scholars and graduate students in the fields of corpus linguistics, genre analysis, EAP pedagogy, and language testing. The development of a classification system was essentially data-driven:

  • random sets of papers pulled from MICUSP were classified independently by a group of linguists, EAP teachers, and graduate students in English and Education;
  • an initial set of categories and definitions was redefined several times based on further MICUSP evidence.

The result of this procedure was a list of seven paper categories, given here in alphabetical order (see Definitions for MICUSP paper classification):

  • argumentative essay
  • creative writing
  • critique/evaluation
  • proposal
  • report
  • research paper
  • response paper


Figure 2. Distribution of texts across paper types.