Studies in Variation, Contacts and Change in English
Volume 12

Aspects of corpus linguistics: compilation, annotation, analysis

Edited by Signe Oksefjell Ebeling, Jarle Ebeling, Hilde Hasselgård
University of Oslo

Publication date: 2012

Abstracts

Egan, Thomas
Using translation corpora to explore synonymy and polysemy
http://www.helsinki.fi/varieng/journal/volumes/12/egan/

This paper investigates whether the Norwegian translation equivalents of the two English verbs begin and start and of the highly polysemous preposition at can aid us in ascertaining the extent to which the former pair may be said to be synonymous and in tracing the polysemous semantic network of the preposition. The corpus data consist of tokens of the three forms in the English-language original texts of the English-Norwegian Parallel Corpus. The study shows that begin and start are to all intents and purposes synonymous in some, but not all, syntactic frames. It also transpires that, with one significant exception (the Perception sense, instantiated by look at), the various senses of at cluster into two main semantic sub-networks.

Labrador, Belén
“The amazing thing about this love story”: A corpus-based study of thing as a function word in English and lo-nominalizations in Spanish
http://www.helsinki.fi/varieng/journal/volumes/12/labrador/

This article reports on a study that explores the relationship between the generic English noun thing and the Spanish neuter article lo as a nominalizer of adjectives on the basis of a perceived similarity in their functions. The fact that they are similar in meaning but different in form causes problems for Spanish learners of English as a foreign language (EFL) and sometimes leads to a lack of idiomaticity in target texts produced by Spanish translation students. For example, the overuse of the word “cosa” in Spanish as a literal translation of English “thing” is unnatural, unacceptable even, albeit grammatical. Similarly, although de-adjectival nominalization with a definite article is possible in both languages, there are restrictions on its use in English. English articles and adjectives are invariable in form whereas the morphological nature of Spanish articles and adjectives allows for number and gender distinction, which enables neuter lo in abstract lo-nominalizations. Two monolingual corpora (Collins Wordbanks Online for English and CREA for Spanish) and a parallel corpus, P-ACTRES (composed of original English texts and their corresponding translations into Spanish), have been used for the purpose of this study. The results reveal that thing as a function word and lo as a nominalizer are highly productive grammatical resources and that, although co-occurrences with a wide range of different adjectives have been found, both tend to concentrate on a series of adjectives, with some variation across the two languages.

Maia, Belinda & Diana Santos
“Who’s afraid of … what?” – in English and Portuguese
http://www.helsinki.fi/varieng/journal/volumes/12/maia_santos/

Fear is generally accepted as a primary emotion in studies on the relationship between cognition and emotion. In this paper we shall use corpora to explore the ways in which English and Portuguese are used to express this emotion and its relation to cognitive processes.

The paper starts by positioning this study of emotion as an area that can be observed from a viewpoint between the extremes of linguistic universals and relativism. This is followed by a short description of the aspects studied in both languages, and our attempt to identify meta-patterns of the fear lexicon and its usage in context.

Building upon Maia (1994), which contrasted the emotion lexicon in context using comparable corpora of fiction in the two languages, we investigate the subject as follows:

  1. A brief reference to the insights gathered in 1994;
  2. An examination of the patterns for English in the British National Corpus (BNC) and other publicly available corpora, using the major items of the fear lexicon and part-of-speech clues;
  3. An examination of the patterns for Portuguese in the Linguateca AC/DC corpora, using the same clues, but with the added benefit of automatic annotation of a much wider lexicon of fear.

The results from the corpora are not easily comparable because the corpora and the tools available to analyse them are different. We accept that this somewhat limits the contrastive analysis of the two languages, but we feel that it is still possible to demonstrate some interesting points and indicate where the differences can point to further research.

Reichardt, Renate
Local grammar and translation equivalents – Preliminary findings for consider and its German translations
http://www.helsinki.fi/varieng/journal/volumes/12/reichardt/

The premise of this paper is that the local grammar of words, specifically their syntactic valency complements, guides the choice of translation equivalents. The hypothesis is that the preferred translation equivalent of a word is the one which is closest to the valency pattern of the original word. Valency theory is concerned with the local grammar of words, i.e. with the property of a word to combine with a certain number of elements in forming larger units (Emons 1974). A corpus linguistic approach was chosen for the exploration of the relationship between the valency patterns of a word and its corresponding meaning in another language. For exemplification, English and German were investigated, but the approach is equally suitable for exploring the grammar-lexis interplay between a wide range of languages. The case study examines the polysemous verb consider and its German equivalents. Frequency analysis showed that the syntactic valency patterns of consider each favour certain translation equivalents. Valency patterns are thus a useful indicator of likely translations into another language, a finding which can be useful in the second language classroom, in translation training or in the compilation of dictionaries and grammars. On the other hand, the contrastive analysis has also shown that there is great overall flexibility in the choice of translation equivalents, which illustrates that what is often considered a straightforward rule-based construction process is much more flexible and unpredictable.
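
The kind of frequency analysis described in this abstract can be sketched as a cross-tabulation of valency patterns against translation equivalents. The following is a minimal illustration only, not the author's procedure: the pattern labels, the German verbs paired with them, the counts and the function names are all invented for the example and do not reproduce the paper's data or findings.

```python
from collections import Counter, defaultdict

# Hypothetical (valency pattern, German equivalent) pairs, as they might be
# read off aligned sentence pairs. Invented for illustration only; these are
# NOT the paper's data or results.
observations = [
    ("consider + NP", "betrachten"),
    ("consider + NP", "betrachten"),
    ("consider + NP", "erwägen"),
    ("consider + NP + as NP", "betrachten als"),
    ("consider + NP + as NP", "betrachten als"),
    ("consider + -ing", "erwägen"),
    ("consider + -ing", "erwägen"),
    ("consider + -ing", "überlegen"),
]


def equivalents_by_pattern(pairs):
    """Count how often each equivalent occurs with each valency pattern."""
    table = defaultdict(Counter)
    for pattern, equivalent in pairs:
        table[pattern][equivalent] += 1
    return table


def preferred_equivalents(table):
    """Return the most frequent equivalent per pattern with its relative share."""
    result = {}
    for pattern, counts in table.items():
        equivalent, freq = counts.most_common(1)[0]
        result[pattern] = (equivalent, freq / sum(counts.values()))
    return result


table = equivalents_by_pattern(observations)
preferred = preferred_equivalents(table)
for pattern, (equivalent, share) in preferred.items():
    print(f"{pattern}: {equivalent} ({share:.0%})")
```

A share well below 100% for a given pattern would correspond to the flexibility in the choice of equivalents that the abstract also reports.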

Thunes, Martha
An analysis of translational complexity in English-Norwegian parallel texts
http://www.helsinki.fi/varieng/journal/volumes/12/thunes/

This article presents an empirical study where translational complexity is related to a notion of computability. Samples of English-Norwegian parallel texts have been analysed in order to estimate to what extent the given translations could have been produced automatically, assuming a rule-based approach to machine translation. The study compares two text types, fiction and law text, in order to see how these differ with respect to the question of automatisation. The results of the investigation indicate that automatic translation tools may be helpful in the case of the law texts, and the study concurs with the view that the usefulness of such tools is limited with respect to fiction. Although the chosen empirical method was originally designed to be of relevance to rule-based translation, it is of interest also to contrastive language studies, and to translation research. The analysed data capture some aspects of the relationship between the two language systems English and Norwegian, as well as certain features of translated text as compared to original texts.

Kehoe, Andrew & Matt Gee
Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus
http://www.helsinki.fi/varieng/journal/volumes/12/kehoe_gee/

This paper presents work based on the new Birmingham Blog Corpus: a 600-million-word collection of blog posts and reader comments, available through the WebCorp Linguist’s Search Engine interface. We begin by describing the steps involved in building the corpus, including a discussion of the sources chosen for blog data, the ‘seeding’ techniques used, and the design decisions taken. We then go on to focus on textual ‘aboutness’ (Phillips 1985). Whereas in previous work we examined social tagging sites as an aboutness indicator (Kehoe & Gee 2011), in this paper we analyse the reader comments found at the bottom of posts in our blog corpus. Our aim is to determine whether free-text comments offer different insights into the reader perspective on aboutness than those offered by social tags, and whether comments present further linguistic challenges. Online comments are often associated with blogs but are found increasingly on web documents of all kinds, and we also examine the growing importance of reader comments on online news articles.

Lehmann, Hans Martin & Gerold Schneider
BNC Dependency Bank 1.0
http://www.helsinki.fi/varieng/journal/volumes/12/lehmann_schneider/

In this paper we present the first release version of our dependency bank for the British National Corpus. We describe the process of annotating the corpus with syntactic information, discuss the resulting dependency annotation and outline a database storage model for the annotation. We then present a web-based interface to the syntactically annotated data and provide an overview of its functionality. The use of fully automatically parsed data without massive manual intervention is far from unproblematic, given the limited accuracy of state-of-the-art parsers. We discuss the problems inherent in automatic annotation and present strategies for coping with them. The purpose of this project is to give general linguists access to the wealth of syntactic and distributional information present in a large corpus like the British National Corpus.
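
To give a rough idea of what a database storage model for dependency annotation can look like, the sketch below stores one dependency relation per row and retrieves the objects of a given verb. It is an assumption-laden illustration, not the model used in the BNC Dependency Bank: the table name, the column names, the relation labels and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical one-row-per-dependency schema.
# This is NOT the schema of the BNC Dependency Bank.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dependency (
           sentence_id INTEGER,
           relation    TEXT,
           head        TEXT,
           dependent   TEXT
       )"""
)

# Invented sample rows: (sentence id, relation label, head lemma, dependent lemma).
rows = [
    (1, "subj", "read", "student"),
    (1, "obj", "read", "book"),
    (2, "subj", "write", "author"),
    (2, "obj", "write", "book"),
    (3, "obj", "read", "letter"),
]
conn.executemany("INSERT INTO dependency VALUES (?, ?, ?, ?)", rows)


def objects_of(connection, verb):
    """Return (object lemma, frequency) pairs for a verb, most frequent first."""
    cursor = connection.execute(
        "SELECT dependent, COUNT(*) AS n FROM dependency "
        "WHERE relation = 'obj' AND head = ? "
        "GROUP BY dependent ORDER BY n DESC, dependent",
        (verb,),
    )
    return cursor.fetchall()


print(objects_of(conn, "read"))
```

Because the rows would come from an automatic parser, counts retrieved in this way inherit the parser's errors, which is the limitation the abstract draws attention to.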

Santos, Diana, Stella E. O. Tagnin & Elisa Duarte Teixeira
CorTrad and Portuguese-English translation studies: investigating colours
http://www.helsinki.fi/varieng/journal/volumes/12/santos_tagnin_teixeira/

CorTrad is a bidirectional and multiversion English-Portuguese parallel corpus. By multiversion we mean that it consists of more than one version of the translated texts. It is also semantically tagged for colours and clothing. This paper will report on some exploratory studies on translations of colours using CorTrad. Colours are of special interest for translation because they refer to visual properties, and have a strong metaphorical import. Moreover, not all colour metaphors transfer across English and Portuguese, which makes their study relevant also for translation teaching. Some of the questions that guided our investigation are: a) How are colours used in English and Portuguese?; b) How often do colours have a terminological, metaphorical, or figurative content?; c) How are colours translated?; d) Are there different translation equivalents for the “same” colour?; and e) How important is genre for colour translation? Our findings so far reveal that: i) technical texts use descriptive colours very sparsely, although colours are pervasive in technical terminology; and ii) colours used in technical terminology vary across languages, increasing the chances of translation pitfalls.

Sotillo, Susana
Illocutionary acts and functional orientation of SMS texting in SMS social networks
http://www.helsinki.fi/varieng/journal/volumes/12/sotillo/

Investigations of short message service (SMS) texting practices have shown that SMS language constitutes a particular variety of naturally occurring language characterized by structural simplifications and recoverable semantic implications. Despite omissions of noun phrases, function words, auxiliaries, and other features of formal writing, recent studies have shown that the implicit communicative intention is successfully recovered and interpreted by the message recipient. Though various studies have described the transactional, orthographic, and linguistic aspects of text messaging (e.g., Fairon, Klein & Paumier 2006; Hård af Segerstad 2005; Tagg 2009), Thurlow & Poff (2011: 12) claim that perhaps the most important feature of texting is its “sociable function” or relational orientation. Building on previous research that uses Speech Act Theory (SAT) for data analysis (e.g., Nastri, Peña, & Hancock 2006), and Thurlow and Brown’s (2003) communicative intent-functional orientation framework, this study investigates illocutionary acts and their intended function in a subsample drawn from a corpus of 5,809 sent and received text messages. Two research questions are addressed: 1. What types of illocutionary acts are found in the texting practices of individuals who form part of six SMS social networks? 2. What is the communicative intent or functional orientation of illocutionary acts evident in these SMS texting exchanges? The results show that assertives and expressives, followed by directives and commissives, account for the majority of illocutionary acts in the SMS texting data analyzed, and that the functional orientation of SMS texting exchanges can be classified along a continuum of personal-relational and transactional-informational.