Background & history

In 1997, the English Language Institute (ELI) at the University of Michigan started the MICASE project. Dr. Rita Simpson was the original project manager, working with Professor John Swales (faculty advisor) and Dr. Sarah Briggs (testing advisor). The project was driven by two questions:

What are the characteristics of contemporary academic speech—its grammar, its vocabulary, its functions and purposes, its fluencies and dysfluencies?
Are these characteristics different for different academic disciplines and for different classes of speakers?

Because MICASE aimed to record a wide range of academic speech, our sampling goals spanned fifteen different types of speech events and, within those types, four major academic divisions (Humanities and Arts, Social Sciences, Biological and Health Sciences, and Physical Sciences). We adopted stratified random sampling. Each recording is classified by a transcript number made up of four parts: a code for the speech event type, a pre-assigned number indicating the academic discipline, two letters representing the majority of participants in the event (e.g. junior undergraduate, senior faculty, staff), and a final three-digit sequence that tracks the chronological order in which the tape was recorded. For example, transcript number LEL115SU015 is a recording of a large lecture (LEL) in anthropology (115), at the senior undergraduate level (SU), and is the 15th speech event recorded for MICASE.
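For readers who work with the transcripts programmatically, the transcript numbering scheme described above can be sketched in Python as follows. This is a minimal illustration, not part of the MICASE tools: the field widths (three letters, three digits, two letters, three digits) are assumed from the single example LEL115SU015, the names parse_transcript_id and TranscriptId are hypothetical, and the lookup tables contain only the codes named in the text above.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from the one example ID "LEL115SU015":
# 3 letters (event type), 3 digits (discipline),
# 2 letters (majority participant level), 3 digits (recording sequence).
_ID_PATTERN = re.compile(r"([A-Z]{3})(\d{3})([A-Z]{2})(\d{3})")

# Partial lookup tables: only the codes mentioned in the text are listed.
EVENT_TYPES = {"LEL": "large lecture"}
DISCIPLINES = {"115": "anthropology"}
PARTICIPANT_LEVELS = {"SU": "senior undergraduate"}


@dataclass(frozen=True)
class TranscriptId:
    """Hypothetical container for the four parts of a transcript number."""
    event_type: str
    discipline: str
    participant_level: str
    sequence: int


def parse_transcript_id(transcript_id: str) -> TranscriptId:
    """Split a transcript number into its four parts, or raise ValueError."""
    match = _ID_PATTERN.fullmatch(transcript_id)
    if match is None:
        raise ValueError(f"unrecognized transcript ID: {transcript_id!r}")
    event_type, discipline, level, sequence = match.groups()
    return TranscriptId(event_type, discipline, level, int(sequence))


parsed = parse_transcript_id("LEL115SU015")
# Unknown codes fall back to "unknown" because the tables are incomplete.
print(EVENT_TYPES.get(parsed.event_type, "unknown"), parsed.sequence)
# prints: large lecture 15
```

Keeping the discipline as a string rather than an integer preserves any leading zeros, while the final sequence is converted to an integer because it only records chronological order.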

All recordings were made with a digital audio tape recorder with two external stereo microphones, and at selected events, a video recorder. Two researchers attended most speech events in order to identify speakers and facilitate transcription by taking field notes about nonverbal contextual information; however, in small groups (e.g. advising sessions, office hours, study groups) where an observer’s presence would have been intrusive, the research assistants left the room after the equipment was set up. All speech was recorded with written consent from the major speakers and verbal consent from other participants. Demographic information (sex, age group, university position, and native language) was collected from each speaker on a form distributed at the end of each event. The speaker information is included in the header of each transcript and is also entered into a separate database. All DAT recordings were captured and stored as MP3 format sound files for use with our computer transcription program, SoundScriber, and have also been re-digitized as WAV format files and transferred to data CD for archival purposes.

In June 2001, the first phase of the project was completed, with over 190 hours of academic speech recorded. In April 2002, transcription and proofing of all transcripts were completed (approximately 1.8 million words).

Then, in May 2002, the original search interface was launched, with a redesigned version released in June 2007. The interface has grown in popularity each year since its release, approaching 140,000 hits in 2006. In 2009, we are excited to release a number of new features and support tools, including new MICASE online demos and new resources for EAP/ESL teachers!

The project is currently managed by Dr. Ute Römer (Michigan Corpus Linguistics, Unit Director), with support from Dr. Matthew Brook O’Donnell (Post-doctoral Research Fellow). The MICASE project has been possible only with the help of a long list of talented faculty, staff, and research assistants over the years.

Michigan Corpus of Academic Spoken English