Basic structure
MICUSP consists of around 830 papers (roughly 2.6 million words) of different types (e.g. essays, reports, response papers) drawn from 16 disciplines (see Table 1) within four academic divisions (Humanities and Arts, Social Sciences, Biological and Health Sciences, and Physical Sciences). All papers included in MICUSP were written by final-year undergraduate and graduate students and received an A grade.
Table 1. Disciplines represented in MICUSP.
Discipline | Word count
Biology | 174,112
Civil & Environmental Engineering | 98,213
Economics | 77,668
Education | 149,099
English | 265,396
History & Classical Studies | 180,523
Industrial & Operations Engineering | 124,527
Linguistics | 154,160
Mechanical Engineering | 122,176
Natural Resources & Environment | 173,071
Nursing | 157,785
Philosophy | 127,290
Physics | 44,866
Political Science | 208,223
Psychology | 322,808
Sociology | 214,087
Figure 1. Disciplines represented in MICUSP.
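As a quick cross-check, the word counts in Table 1 can be tallied against the overall corpus size quoted above. The sketch below copies the figures from the table; the variable names are illustrative only and are not part of any MICUSP tooling.

```python
# Word counts per discipline, copied from Table 1.
WORD_COUNTS = {
    "Biology": 174_112,
    "Civil & Environmental Engineering": 98_213,
    "Economics": 77_668,
    "Education": 149_099,
    "English": 265_396,
    "History & Classical Studies": 180_523,
    "Industrial & Operations Engineering": 124_527,
    "Linguistics": 154_160,
    "Mechanical Engineering": 122_176,
    "Natural Resources & Environment": 173_071,
    "Nursing": 157_785,
    "Philosophy": 127_290,
    "Physics": 44_866,
    "Political Science": 208_223,
    "Psychology": 322_808,
    "Sociology": 214_087,
}

total = sum(WORD_COUNTS.values())

# Each discipline's share of the corpus, as a fraction of the total.
shares = {name: count / total for name, count in WORD_COUNTS.items()}

largest = max(WORD_COUNTS, key=WORD_COUNTS.get)
smallest = min(WORD_COUNTS, key=WORD_COUNTS.get)

print(f"{len(WORD_COUNTS)} disciplines, {total:,} words in total")
print(f"largest: {largest} ({shares[largest]:.1%})")
print(f"smallest: {smallest} ({shares[smallest]:.1%})")
```

The 16 rows sum to 2,594,004 words, which agrees with the figure of roughly 2.6 million words given for the corpus as a whole.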
Each of the papers in MICUSP has been marked up in XML and maintains the structural divisions (sections, headings, paragraphs) of the original paper. A file header added to each MICUSP file includes, among other things, information about the discipline and the student's level, native-speaker status, and gender. This makes it possible to carry out customized searches in subsections of the corpus, e.g. only in Biology papers written by native-speaker final-year undergraduate students.
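The kind of header-based selection described here might be sketched as follows. This is a minimal illustration only: the element name `header`, the attribute names (`discipline`, `level`, `nativeSpeaker`, `gender`), the attribute values, and the paper identifiers are all assumptions made for the example and do not reflect the actual MICUSP markup scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample files. The tag and attribute names are assumed for
# illustration and are NOT the real MICUSP header format.
SAMPLE_FILES = {
    "paper_001": """<paper>
  <header discipline="Biology" level="final-year undergraduate"
          nativeSpeaker="yes" gender="F"/>
  <body><section><head>Introduction</head><p>Sample text.</p></section></body>
</paper>""",
    "paper_002": """<paper>
  <header discipline="Biology" level="graduate"
          nativeSpeaker="no" gender="M"/>
  <body><section><head>Methods</head><p>Sample text.</p></section></body>
</paper>""",
    "paper_003": """<paper>
  <header discipline="Physics" level="final-year undergraduate"
          nativeSpeaker="yes" gender="M"/>
  <body><section><head>Results</head><p>Sample text.</p></section></body>
</paper>""",
}


def read_header(xml_text):
    """Return the (assumed) header attributes of one paper as a dict."""
    root = ET.fromstring(xml_text)
    header = root.find("header")
    return dict(header.attrib) if header is not None else {}


def select_papers(files, **criteria):
    """Return the IDs of papers whose header matches every criterion."""
    matches = []
    for paper_id, xml_text in files.items():
        header = read_header(xml_text)
        if all(header.get(key) == value for key, value in criteria.items()):
            matches.append(paper_id)
    return sorted(matches)


# Biology papers by native-speaker final-year undergraduates.
subset = select_papers(
    SAMPLE_FILES,
    discipline="Biology",
    level="final-year undergraduate",
    nativeSpeaker="yes",
)
print(subset)
```

Because the metadata sits in a header separate from the body, a subcorpus can be selected by reading only the header of each file, and the body text can then be searched within the selected files.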
Paper classification
The goal of the MICUSP paper classification was to enable corpus users to browse for and search in papers of a particular type, e.g. research papers or reports.
The paper classification system was developed through a series of interlocking steps by a group of scholars and graduate students in the fields of corpus linguistics, genre analysis, EAP pedagogy, and language testing. The development of a classification system was essentially data-driven:
- random sets of papers pulled from MICUSP were classified independently by a group of linguists, EAP teachers, and graduate students in English and Education;
- an initial set of categories and definitions was revised several times based on further MICUSP evidence.
The result of this procedure was a list of seven paper categories, given here in alphabetical order (see Definitions for MICUSP paper classification):
- argumentative essay
- creative writing
- critique/evaluation
- proposal
- report
- research paper
- response paper
Figure 2. Distribution of texts across paper types.