Basic structure of CONTRAST-IT

CONTRAST-IT is a medium-sized multilingual comparable corpus built from articles published in online daily newspapers. The underlying text collection totals ca. 1.5 million words, spread across five languages: Italian (from Italy: IT), French (from France: FR), Spanish (from Spain: SP), English (from the UK: E), and German (from Germany: G). The language subcorpora are comparable in size, each containing approximately 300,000 words.

The websites from which the data were collected include some of the most visited national online newspapers. The articles come from different thematic sections (politics, economy, sports, etc.).

For more details, please see: https://contrast-it.philhist.unibas.ch/en/corpora/contrast-it-corpus/

Genre

Journalistic texts from online daily newspapers.