CONTRAST-IT is a medium-sized multilingual corpus built from a comparable collection of articles published in online daily newspapers. It totals ca. 1.5 million words. The articles are written in five languages: Italian (from Italy: IT), French (from France: FR), Spanish (from Spain: SP), English (from the UK: E), and German (from Germany: G). The language subcorpora are comparable in size, at approximately 300,000 words each.

Project leader: Anna-Maria De Cesare (University of Basel)
Size: 1.5 million words
Languages: Italian, French, Spanish, English, German
Period: 2011–2015
Released: 2018
Funding: Swiss National Science Foundation
Project home page:
Corpus access:

Reference line and copyright

The corpora and documentation are licensed under the Creative Commons Attribution + Noncommercial license:

  • Licensees may copy, distribute, display, and perform the work and make derivative works based on it only for noncommercial purposes.
  • Licensees may copy, distribute, display, and publish the work and make derivative works based on it only if they give the author or licensor the credits as follows:
    Anna-Maria De Cesare (2011–2018). CONTRAST-IT. University of Basel,

Copyright © by the contributing authors.


Open access. Available online at

Technical information

Part-of-speech tagged.