Buckeye Corpus

The Buckeye Corpus of conversational speech contains high-quality recordings of 40 speakers from Columbus, OH, conversing freely with an interviewer. The speech has been orthographically transcribed and phonetically labeled. The audio and text files, together with time-aligned phonetic labels, are stored in a format for use with speech analysis software (Xwaves and Wavesurfer). Software for searching the transcription files is currently being written. The corpus is free for noncommercial use.

Project leader: Prof. Mark Pitt, The Ohio State University
Time of compilation: 1998–2001
Size: 300,000 words
Language: English
Number of texts/samples: 40 speakers
Period: 1998–2000
Released: 2001
Funding: National Institute on Deafness and Other Communication Disorders; Office of Research, The Ohio State University
Project home page: https://buckeyecorpus.osu.edu

Reference line and copyright

Pitt, M.A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E. and Fosler-Lussier, E. (2007) Buckeye Corpus of Conversational Speech (2nd release) [www.buckeyecorpus.osu.edu] Columbus, OH: Department of Psychology, Ohio State University (Distributor).

© The Ohio State University Research Foundation

Open access. Freely available for download at https://buckeyecorpus.osu.edu

Technical information

Audio – WAV files
Time-aligned word and phone tags – ASCII text
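
Since the audio is distributed as WAV files, a quick sanity check after download is to read each file's header. The following is a minimal sketch using Python's standard `wave` module; the function name `wav_summary` is an illustrative choice, not part of the corpus distribution, and the sketch assumes the files are uncompressed PCM WAV, which this entry does not state explicitly.

```python
import wave


def wav_summary(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file.

    Assumption: the file is uncompressed PCM, the only encoding the
    standard-library `wave` module can read.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()   # 1 = mono, 2 = stereo
        rate = wav.getframerate()       # samples per second
        frames = wav.getnframes()       # total number of sample frames
    return channels, rate, frames / rate
```

Calling `wav_summary` on a downloaded recording (the path is whatever name the file has locally) returns the channel count, sampling rate, and length in seconds, which can then be compared against the time stamps in the corresponding word and phone tag files.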