Basic structure of the Buckeye Corpus

  • 2–5 audio files per speaker
  • 40 speakers
  • 19 hours of phonetically tagged speech
  • hand-corrected phonetic tags, using a superset of ARPABET


Conversational speech, collected with deception: talkers were told that the recording session was part of a focus group on local issues.

Sociolinguistic coverage

Age- and gender-stratified sample. Many other sociolinguistic variables were not controlled, since respondents were recruited through ads in local papers.