Background

ICE-GB is the British component of ICE. This page gives information on both the ICE project and specifically on ICE-GB.

The International Corpus of English (ICE)

The International Corpus of English (ICE) project was initiated in 1988 by the late Sidney Greenbaum, the then Director of the Survey of English Usage, University College London. In a brief notice in World Englishes, Greenbaum pointed out that grammatical studies had been greatly facilitated by the availability of two computerized corpora of printed English, the Brown Corpus of American English, and the LOB (Lancaster/Oslo-Bergen) Corpus of British English. Greenbaum continued:

We should now be thinking of extending the scope for computerized comparative studies in three ways: (1) to sample standard varieties from other countries where English is the first language, for example Canada and Australia; (2) to sample national varieties from countries where English is an official additional language, for example India and Nigeria; and (3) to include spoken and manuscript English as well as printed English. (Greenbaum 1988)

In response, linguists from around the world came forward to discuss Greenbaum's proposal, and ultimately to put it into effect (Greenbaum 1991). The project soon became known as the International Corpus of English (ICE), and was coordinated by Greenbaum until 1996. From 1996 to 2001, ICE was coordinated by Charles Meyer, University of Massachusetts-Boston. It is now coordinated by Gerald Nelson in Hong Kong. The ICE project involves research teams in each of the countries or regions shown below.

Australia
Cameroon
Canada
East Africa (Kenya, Malawi, Tanzania)
Fiji
Great Britain
Hong Kong
India

Ireland
Jamaica
Kenya
Malta
Malaysia
New Zealand
Nigeria
Pakistan

Philippines
Sierra Leone
Singapore
South Africa
Sri Lanka
Trinidad and Tobago
USA

Each ICE team is compiling – or has already compiled – a one million-word corpus of their own national or regional variety of English. Crucially, each team follows a common corpus design and a common annotation scheme, in order to ensure maximum comparability between the components (Nelson 1996). The long-term aim of ICE is to produce up to twenty one million-word corpora, each syntactically analysed according to a common parsing scheme, and supplied with the retrieval software, ICECUP.

Each ICE corpus samples the English of adults (age 18 or over) who have been educated through the medium of English to at least the end of secondary schooling. Furthermore, each component corpus is grammatically analysed using a common grammatical annotation scheme.

ICE-GB

ICE-GB is the British component of ICE. It was compiled and grammatically analysed at the Survey of English Usage, between 1990 and 1998. Version 1 was released on CD-ROM in 1998, with ICECUP 3.0. Version 2, documented here, was released in 2006 with ICECUP 3.1 and audio recordings.

International Corpus of English – Great Britain

Background

The International Corpus of English (ICE)

ICE-GB