English and Russian event annotation: A pilot study
Anna Feldman, Department of Linguistics, Montclair State University
Katya Arshavskaya, Department of Linguistics, Montclair State University
1. Introduction
This study assesses the cross-linguistic validity of the temporal categorization of verbs and clauses. In this paper we apply Vendler's (1967) classification scheme of verbs (developed for English) and Moens & Steedman's (1988) and Siegel's (1999) classification schemes of clauses to Russian, and compare the results with English. None of these schemes was explicitly claimed to be universal or to represent all possible ways in which verbs and clauses can be classified with respect to time. Clearly, a verb whose typical use fits one category well can also have uses that fall outside it. Nevertheless, we decided to use these classification schemes to annotate Russian verbs and clauses in order to get a sense of whether these temporal descriptions are adequate for this language. The long-term goal of our project is to identify linguistic indicators that can be fed into a machine learning algorithm to classify Russian verbs and clauses by aspect. Our preliminary investigations suggest that the event type of a Russian verb or clause can be predicted from co-occurrence frequencies between the verb and various linguistic phenomena (e.g. (im)perfectivity, transitivity, reflexivity).
2. Aspectual classes
2.1 Verbs
One of the most influential temporal categorizations of verbs is Vendler's (1967) classification of time schemata. Vendler (1967) distinguishes four time schemata: states, activities, accomplishments, and achievements, and for each category he introduced specific tests, valid for English. Activities or processes (e.g. running) and accomplishments (e.g. drawing a circle) are singled out because they can be used in the progressive aspect. This puts them in opposition to states and achievements. However, activities and accomplishments differ in their implications. If someone is running and he stops, we can still say that he ran, and this is true for all the increments of time in question. In contrast, if someone was in the process of drawing a circle and he stops, we cannot say that he drew a circle, because the result was never actually accomplished (hence the term accomplishment). States and achievements cannot take the progressive aspect, and they differ from each other as to whether they can be predicated of only a single moment (achievements, e.g. reaching the top of a hill) or whether their predication can be true for a longer stretch of time (states). Achievements answer the question At what time?, whereas states answer the question For how long? (Vendler 1967). Further, when achievements are used in the present tense, they usually express the historical present, as illustrated by Vendler's example 'Now he finds the treasure' (1967: 103), which cannot mean that he is finding the treasure at this very minute. These tests show that there are valid linguistic reasons to classify English verbs into the categories proposed by Vendler. We used the same system to classify verbs in Russian, with the aim of conducting a contrastive study that sheds light on how English and Russian verbs convey different types of events.
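The diagnostics above can be read as a small decision procedure. The sketch below is a hypothetical illustration rather than part of the study: the function and parameter names are invented, and the three boolean judgments (the progressive is acceptable; stopping midway still licenses the simple past; the predicate answers For how long?) are assumed to be supplied by an English-speaking annotator.

```python
from enum import Enum


class VendlerClass(Enum):
    STATE = "state"
    ACTIVITY = "activity"
    ACCOMPLISHMENT = "accomplishment"
    ACHIEVEMENT = "achievement"


def vendler_class(takes_progressive: bool,
                  stopping_midway_entails_done: bool = False,
                  answers_for_how_long: bool = False) -> VendlerClass:
    """Apply the English diagnostics in order (hypothetical helper).

    takes_progressive: "X is V-ing" is acceptable.
    stopping_midway_entails_done: consulted only for progressive-compatible
        verbs; stopping partway still licenses the simple past ("he ran").
    answers_for_how_long: consulted only for non-progressive verbs; the
        predicate answers "For how long?" rather than "At what time?".
    """
    if takes_progressive:
        # Activities hold of every increment; accomplishments need their result.
        if stopping_midway_entails_done:
            return VendlerClass.ACTIVITY
        return VendlerClass.ACCOMPLISHMENT
    # States hold over a stretch of time; achievements occur at a moment.
    if answers_for_how_long:
        return VendlerClass.STATE
    return VendlerClass.ACHIEVEMENT


print(vendler_class(True, stopping_midway_entails_done=True).value)  # running -> activity
print(vendler_class(True).value)                                     # drawing a circle -> accomplishment
print(vendler_class(False, answers_for_how_long=True).value)         # knowing -> state
print(vendler_class(False).value)                                    # reaching the top -> achievement
```

Because the tests are English-specific, such a procedure cannot simply be transferred to Russian, which is precisely the question this study investigates.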
2.2 Clauses
Moens & Steedman (1988) proposed an ontology of the events described by clauses, rather than a classification based on temporal primitives described by individual verbs. The main idea is that natural-language categories such as aspect, futurates, adverbials, and when-clauses change the temporal/aspectual category of propositions. In our work, we classified clauses into the categories suggested by Moens & Steedman (1988): culmination, state, culminated process, point, and process. English examples of each event type are given below.
(1)
a. Harry reached the top. [culmination]
b. Harry hiccuped. [point]
c. Harry climbed for several hours. [process]
d. Harry climbed to the top. [culminated process]
e. Harry knows English. [state]
Informally, culmination is an event which the speaker views as punctual or instantaneous, and as accompanied by a transition to a new state of the world. Point describes an event that is viewed as an indivisible whole. Process describes an event as extended in time but not characterized by any particular conclusion or culmination. Culminated process typically describes a state of affairs that extends in time but does have a particular culmination associated with it, at which a change of state takes place. We applied this classification to our Russian clauses to assess how well Russian and English correspond in their aspectual classes.
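Moens & Steedman (1988) organize these event types along two dimensions, whether the event is atomic or extended in time and whether it is accompanied by a transition to a consequent state, with states falling outside both. A minimal sketch of that two-by-two scheme as a lookup table, assuming an annotator supplies the two judgments; `event_type` and its parameters are hypothetical names, and the comments pair each cell with the English examples in (1).

```python
from typing import Optional

# Event types keyed by two judgments:
# (extended in time?, accompanied by a transition to a consequent state?)
EVENT_TYPES = {
    (False, True): "culmination",         # Harry reached the top.
    (False, False): "point",              # Harry hiccuped.
    (True, True): "culminated process",   # Harry climbed to the top.
    (True, False): "process",             # Harry climbed for several hours.
}


def event_type(is_state: bool,
               extended: Optional[bool] = None,
               consequent: Optional[bool] = None) -> str:
    """Map annotator judgments to an event type (hypothetical helper)."""
    if is_state:
        return "state"  # Harry knows English.
    if extended is None or consequent is None:
        raise ValueError("events need both 'extended' and 'consequent' judgments")
    return EVENT_TYPES[(extended, consequent)]


print(event_type(False, extended=False, consequent=True))  # culmination
print(event_type(True))                                      # state
```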
Certain features of a clause, such as tense, the presence of temporal adverbs, various complements and certain prepositional phrases contribute to the aspectual class of the clause (e.g. Vendler (1967), Dowty (1979), Resnik (1996), Klavans & Resnik (1997), among others). Several features were identified as useful for automatic classification of clauses into different event types (see Siegel (1999) for more discussion). For example, if a clause in English occurs with a temporal adverb, such as then, the clause is likely to be classified as an event. If a clause occurs with a duration in-PP (e.g. in an hour), the clause is likely to be a culminated event. Clearly, individual linguistic indicators like the ones listed above do not have strong predictive power. Siegel (1999) shows that employing multiple linguistic indicators in combination rather than using them individually leads to better classification performance.
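Roughly in the spirit of Siegel (1999), an indicator value for a verb can be computed over a corpus as the share of that verb's clauses in which a given marker occurs. The sketch below assumes clauses have already been parsed into (verb lemma, set of markers) pairs; the marker names, the toy clauses, and the function name are invented for illustration and are not taken from Siegel (1999) or from the corpus studied here.

```python
from collections import Counter, defaultdict


def indicator_values(clauses):
    """For each verb, the fraction of its clauses that contain each marker.

    clauses: iterable of (verb_lemma, set_of_markers) pairs.
    Markers never seen with a verb are absent; read them as 0.0.
    """
    verb_totals = Counter()
    with_marker = defaultdict(Counter)
    for verb, markers in clauses:
        verb_totals[verb] += 1
        for marker in set(markers):  # count each marker once per clause
            with_marker[verb][marker] += 1
    return {
        verb: {m: n / verb_totals[verb] for m, n in with_marker[verb].items()}
        for verb in verb_totals
    }


# Hypothetical toy input: invented clauses, not corpus data.
toy_clauses = [
    ("reach", {"temporal_adverb"}),
    ("reach", {"temporal_adverb", "in_PP"}),
    ("reach", set()),
    ("know", set()),
    ("know", {"for_PP"}),
]
values = indicator_values(toy_clauses)
print(round(values["reach"]["temporal_adverb"], 2))  # 0.67
print(round(values["reach"]["in_PP"], 2))            # 0.33
print(values["know"].get("temporal_adverb", 0.0))    # 0.0
```

Each verb thus receives a vector of indicator values, which is the kind of input that can later be combined by a learning algorithm rather than used one indicator at a time.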
2.3 Applications of aspectual classification
Aspectual classification of verbs and clauses is necessary for assessing temporal relationships. It is, therefore, an essential component of many natural language applications that require the ability to reason about time. The ability to distinguish states from events, for example, is a fundamental component of applications that perform natural language interpretation, natural language generation, summarization, information retrieval, and machine translation. Knowing which lexical aspectual class a verb belongs to is required for interpreting event sequences in discourse, interfacing to temporal databases, processing temporal modifiers, describing allowable alternations and their semantic effects, and selecting tense and lexical items in natural language generation. Lexical aspect also facilitates lexical selection in Machine Translation (MT) and the interpretation of events in foreign language tutoring applications.
Aspect is important for tense selection in machine translation between certain pairs of languages. Russian, for example, has explicit markers of perfectivity (see Sections 2.4 and 2.5). Thus, an automatic system translating from Russian into English must first detect the aspectual category of an input Russian phrase in order to determine the output English form. The Russian phrase Он сделал уроки (perfective сделал) would be translated differently from Он делал уроки (imperfective делал): He has done the homework vs. He was doing the homework.
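The translation example can be sketched as a toy rule that selects the English verb group from the aspect of a Russian past-tense form. The lexicon entries, the function name, and the assumption of a third-person singular subject are all hypothetical; a real system would also need context, since a perfective past can equally be rendered with the English simple past (He did the homework).

```python
# Hypothetical English verb forms: lemma -> (past participle, -ing form)
ENGLISH_FORMS = {"do": ("done", "doing")}

# Hypothetical mini-lexicon: Russian past-tense form -> (aspect, English lemma)
RUSSIAN_PAST = {
    "сделал": ("perfective", "do"),
    "делал": ("imperfective", "do"),
}


def english_verb_group(russian_form: str) -> str:
    """Pick an English verb group from the aspect of a Russian past form.

    Follows the pattern of the example only: perfective -> present perfect,
    imperfective -> past progressive (third-person singular subject assumed).
    """
    aspect, lemma = RUSSIAN_PAST[russian_form]
    participle, ing_form = ENGLISH_FORMS[lemma]
    if aspect == "perfective":
        return f"has {participle}"
    return f"was {ing_form}"


for form in ("сделал", "делал"):
    print(form, "->", "He", english_verb_group(form), "the homework")
```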
For processing medical texts, for example, it is important to understand the difference between the following statements (Siegel 1999): The small bowel became completely free when dissection was continued vs. The small bowel became completely free when dissection was performed. In the first case, the become culmination takes place at the onset of the continue process. In the second case, the become culmination takes place at the completion of the perform culminated process.
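The timing contrast in this example can be stated as a rule keyed on the event type of the when-clause. The assignment of continue to process and perform to culminated process is taken from the example; the dictionary and function names are hypothetical, and only the two cases discussed here are covered.

```python
# Event types of the when-clause verbs in the medical example.
WHEN_CLAUSE_TYPE = {"continue": "process", "perform": "culminated process"}


def culmination_anchor(when_clause_type: str) -> str:
    """Where a main-clause culmination falls relative to the when-clause event.

    Covers only the two cases in the example (hypothetical helper).
    """
    if when_clause_type == "process":
        return "onset"        # ... became free when dissection was continued
    if when_clause_type == "culminated process":
        return "completion"   # ... became free when dissection was performed
    raise ValueError(f"no rule for when-clause type {when_clause_type!r}")


for verb in ("continue", "perform"):
    print(verb, "->", culmination_anchor(WHEN_CLAUSE_TYPE[verb]))
```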
There are many other applications of temporal reasoning, i.e., the ability to reason about time: natural language understanding (e.g. story understanding), planning (e.g. robot planning, therapy planning), causal reasoning (e.g. diagnosis), psychology (e.g. developmental behavioral psychology), etc.
There exist annotated corpora in which verbs are marked for aspect, mostly on the basis of morphological information (e.g. Multext-East). However, corpora annotated with more fine-grained event information are scarce, especially for less commonly taught languages. One of the fundamental questions, both in theoretical linguistics and in corpus linguistics, is what kind of aspectual information is crucial for linguistic analysis and for Natural Language Processing (NLP) applications. Another question that naturally arises in this context is whether a proposed event classification is equally applicable to any language, or whether the linguistic properties of a language affect the inventory of events in the tagset.
2.4 Aspect in Russian
The discussion of aspect in Russian is based on Stoll (2001). There is no generally accepted definition of Russian aspect. The only point that linguists agree on is that Russian has an imperfective and a perfective aspect. In Slavic aspectology, Russian aspect is usually considered to be a binary category, i.e., every Russian verb form is either perfective or imperfective. This applies to the majority of Russian verbs, but there is a small number of biaspectual verbs. A biaspectual verb can take either a perfective or an imperfective value, depending on the context. Examples of biaspectual verbs are использовать 'utilize', казнить 'execute', and жениться 'get married'. This subgroup receives very little attention in the literature.
There are some interesting facts about Russian verbs that are worth mentioning in this section. First, only imperfective verbs can combine with the auxiliary быть 'be' in the analytic future construction, e.g.
(2) Когда-нибудь я буду писать как Набоков.
    Sometime I will write like Nabokov
    'I will write like Nabokov one day.'
Second, the synthetic future tense is restricted to perfective verbs, e.g.
(3) Я позвоню тебе.
    I call.FUTURE you
    'I'll call you.'
Third, there is the syntactic restriction that only imperfective verbs combine with phase verbs. In Russian, verbs like начинать / начать 'begin', продолжать / продолжить 'continue' or кончать / кончить 'stop/finish' can only take an imperfective infinitive as a complement.
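The three restrictions above amount to an aspect-compatibility check that could serve, for example, as a consistency filter during annotation. A minimal sketch with hypothetical construction names; treating biaspectual verbs as compatible with every construction is a simplifying assumption based on their context-dependent aspectual value.

```python
from enum import Enum


class Aspect(Enum):
    PERFECTIVE = "perfective"
    IMPERFECTIVE = "imperfective"
    BIASPECTUAL = "biaspectual"


# Aspect admitted by each construction (hypothetical construction names).
ALLOWED = {
    "analytic_future": {Aspect.IMPERFECTIVE},        # буду писать
    "synthetic_future": {Aspect.PERFECTIVE},         # позвоню
    "phase_verb_complement": {Aspect.IMPERFECTIVE},  # начать + infinitive
}


def is_compatible(construction: str, aspect: Aspect) -> bool:
    # Simplifying assumption: a biaspectual verb can take whichever value
    # the construction needs, so it is never flagged.
    if aspect is Aspect.BIASPECTUAL:
        return True
    return aspect in ALLOWED[construction]


print(is_compatible("analytic_future", Aspect.IMPERFECTIVE))      # True
print(is_compatible("phase_verb_complement", Aspect.PERFECTIVE))  # False
```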
2.5 The morphology of Russian aspect
Opinions differ as to whether Russian aspect should be treated as an inflectional or a derivational category. Whereas most linguists more or less confidently categorize Russian aspect as a derivational category (Karcevski (1927), Růžička (1952), Dahl (1985), Bermel (1997)), only very few claim that aspect is an inflectional category (e.g. Isačenko (1968)).
Prefixes have a double function in Russian. On the one hand, they play a crucial role in the derivation of new verbs, i.e., the prefixes add a meaning to the simplex verb they are attached to:
(4) строить 'build' → устроить 'arrange'
On the other hand, prefixation of a simplex verb results in a perfective verb, i.e. the prefix changes the aspect of the verb. In a sense, then, prefixation is both inflectional and derivational.
However, there is another process of deriving a new aspectual form and this is by imperfectivization of a derived prefixed verb. This process is inflectional:
(5) устроить → устраивать 'arrange'
In Slavic linguistics this process is called secondary imperfectivization.
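The derivation chain in (4) and (5) can be modelled as two operations that track aspect: prefixation yields a new perfective lexeme, and secondary imperfectivization keeps the lexical meaning but makes the verb imperfective again. In this hypothetical sketch the class and function names are illustrative, and the derived forms are supplied by hand, because the stem alternation in устроить → устраивать is not a simple string operation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verb:
    form: str
    gloss: str
    aspect: str                 # "perfective" or "imperfective"
    base: Optional[str] = None  # form this verb was derived from, if any


def prefixate(simplex: Verb, form: str, gloss: str) -> Verb:
    """Prefixation: a new lexeme (new gloss) that is perfective."""
    return Verb(form, gloss, "perfective", base=simplex.form)


def secondary_imperfective(prefixed: Verb, form: str) -> Verb:
    """Secondary imperfectivization: same meaning, imperfective aspect."""
    if prefixed.aspect != "perfective":
        raise ValueError("secondary imperfectivization applies to perfectives")
    return Verb(form, prefixed.gloss, "imperfective", base=prefixed.form)


# Forms are supplied by hand: the stem alternation is not modelled.
stroit = Verb("строить", "build", "imperfective")
ustroit = prefixate(stroit, "устроить", "arrange")
ustraivat = secondary_imperfective(ustroit, "устраивать")
for v in (stroit, ustroit, ustraivat):
    print(v.form, v.gloss, v.aspect)
```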
There are basically three ways in which aspect can be expressed:
(6)
1. There is no dedicated marker for aspect.
2. There is a dedicated marker for aspect.
3. There is a combined, hybrid marker, which has an additional function beyond marking aspect.
Russian has all three possible marking types. The first option, where aspect is not coded by a specific marker, is very widespread. All simplex verbs, e.g. думать 'think', and all suppletive pairs, e.g. брать / взять 'take', belong to this category. The absence of a marker, however, gives no clear clue as to whether the verb is perfective or imperfective, even though most verbs in this group are imperfective.
Second, aspect can be marked with a dedicated aspectual marker. There is only one suffix that marks aspect and nothing else, namely the suffix -ыв- and its allomorphs, which are used to form secondary imperfectives, e.g. in опис-ыв-ать 'describe'. Stem alternations, as in кончать / кончить 'end/finish', are a further example of this type.
The third option, where aspect is marked by a "portmanteau" morph, is also very widespread. Most prefixes fall into this category, e.g. за-плакать 'start to cry, burst into tears'. The stem плакать itself just means 'cry', whereas the prefix adds the meaning of inception and at the same time changes the aspect from imperfective to perfective.
3. Corpus
For this pilot study we selected the Russian translation of a small excerpt of Lao She's Cat Country. The data used in this study is available online (see Sources). We chose this text because in our ongoing research we are conducting a systematic contrastive study of Russian and Chinese aspect (Feldman & Lu 2007), in which we compare Chinese and Russian linguistic indicators in a parallel (unaligned) Russian-Chinese corpus. The Russian data contains 2,778 words (1,316 types). There are 216 clauses and 355 verbs in the data.
Table 1. The number of verbs per class attested in the Russian corpus.
Verb class | N | %
State | 89 | 24
Activity | 125 | 34
Achievement | 128 | 35
Accomplishment | 27 | 7
Total | 369 | 100
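The percentages in Table 1 are relative frequencies over the column total. A short check that recomputes them from the counts, rounded to whole percent; the helper name is hypothetical.

```python
def shares(counts: dict) -> tuple:
    """Column total and each cell's share of it, in whole percent."""
    total = sum(counts.values())
    return total, {k: round(100 * n / total) for k, n in counts.items()}


TABLE_1 = {"State": 89, "Activity": 125, "Achievement": 128, "Accomplishment": 27}
total, percent = shares(TABLE_1)
print(total)    # 369
print(percent)  # {'State': 24, 'Activity': 34, 'Achievement': 35, 'Accomplishment': 7}
```

The same helper applies unchanged to the N column of Table 3.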
4. Annotation scheme: Verbs
We annotated all verbs in the corpus with the following tags:
- state (S)
- activity (ACT)
- achievement (ACH)
- accomplishment (ACC)
Verbs were considered in isolation; no context was taken into consideration. The annotator was instructed to tag only the "inherent" aspect of the verb. Based on the linguistic properties of Russian described above and in the literature (e.g. Karcevski (1927), Růžička (1952), Dahl (1985), among others), we explored several features and their effect on verb classification: perfectivity, reflexivity, transitivity, and neuter agreement. Table 2 summarizes the results. Perfective verbs seem to correspond well with the accomplishment class (e.g. похоронить 'bury', посеять 'sow'). At the same time, imperfective verbs can correspond to either the state class (e.g. помнить 'remember', чувствовать 'feel') or the activity class (e.g. кричать 'scream', пинать 'kick'), so the imperfective feature alone is not enough to differentiate these two classes. The results obtained in this study show that imperfective transitive verbs (e.g. чувствовать 'feel', слышать 'hear') are more likely to describe states than activities. The reflexivity feature appears more frequently with activity verbs than with states. We draw no conclusions about whether the neuter agreement feature plays a role in the aspectual classification of verbs. We acknowledge that our corpus is small and that these findings may not be precise, but this is the first pilot study that attempts to identify linguistic indicators for the future automatic classification of verbs into these types. We plan to explore more linguistic features in future work and use them for automatic classification of verbs, which will give us an opportunity to evaluate the reliability of the linguistic indicators.
Table 2. Features associated with different verb classes in Russian.
Verb class | +Perfective | -Perfective | Reflexive | Neuter | +Transitive | Total
Accomplishment | 4 | 3 | - | - | 7 | 14
Achievement | 128 | - | 29 | 2 | 20 | 179
State | - | 89 | 12 | 8 | 31 | 140
Activity | 44 | 81 | 26 | 3 | 14 | 168
Total | 176 | 173 | 67 | 13 | 72 | 501
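Table 2 can be turned into conditional estimates, P(class | feature), by dividing each cell by its column total. Because the table reports each feature separately rather than feature combinations, this sketch conditions on one feature at a time and so cannot directly estimate the imperfective-transitive combination discussed above. The names are hypothetical, and cells marked "-" are entered as 0.

```python
FEATURES = ("+Perfective", "-Perfective", "Reflexive", "Neuter", "+Transitive")

# Counts from Table 2 ("-" cells entered as 0), in FEATURES order.
TABLE_2 = {
    "Accomplishment": (4, 3, 0, 0, 7),
    "Achievement": (128, 0, 29, 2, 20),
    "State": (0, 89, 12, 8, 31),
    "Activity": (44, 81, 26, 3, 14),
}


def class_given_feature(feature: str) -> dict:
    """Estimate P(verb class | feature) from one column of Table 2."""
    i = FEATURES.index(feature)
    column_total = sum(row[i] for row in TABLE_2.values())
    return {cls: row[i] / column_total for cls, row in TABLE_2.items()}


for feature in ("-Perfective", "+Transitive", "Reflexive"):
    probs = class_given_feature(feature)
    print(feature, {cls: round(p, 2) for cls, p in probs.items()})
```

With these counts, -Perfective splits almost evenly between State (89/173, about 0.51) and Activity (81/173, about 0.47), while +Transitive favours State (31/72, about 0.43) over Activity (14/72, about 0.19), and Reflexive is more frequent with Activity (26/67) than with State (12/67).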
Table 3. The number of clauses per event type attested in the Russian corpus.
Event type | N | %
Culmination | 58 | 28
Point | 3 | 1
State | 98 | 48
Culminated process | 12 | 6
Process | 35 | 17
Total | 206 | 100
Our next step was to look at the clause classification types. For automatic clause classification, the most important task is to select the right linguistic indicators, so we decided to use our development corpus to search for features that seem to affect the type of a clause.
5. Annotation scheme: Clauses
For Russian, the annotation scheme that we followed in this work is similar to Moens & Steedman's (1988) and Siegel's (1999) classification. The annotator was free to use all or a subset of the five labels: culmination, culminated process, point, process, and state. Only 3 clauses were annotated as point; the annotator found it difficult to distinguish point from other clause types (culmination, in particular). In addition, the Russian corpus contained few clauses that were classified as culminated process.
The following sentences taken from the corpus illustrate each clause type. The class assigned to each verb is given in parentheses after it (ACH = achievement, ACT = activity, S = state).
(7) a. Я достиг(ACH) цели.
       I reach.PAST goal.GEN
       'I reached the goal.' [culmination]
    b. Её жесткие крылья задрожали(ACH).
       Her stiff wings tremble.PAST
       'Her stiff wings began to tremble.' [point]
    c. Это знают(S) волшебники.
       This know.PL magicians
       'Magicians know this.' [state]
    d. Лапы мучителей сжались(ACH) ещё крепче.
       Paws.NOM torturers.GEN clench.PAST even tighter
       'The paws of the torturers clenched even tighter.' [culminated process]
    e. Мы летели(ACT) к Марсу.
       We fly.PAST to Mars
       'We were flying to Mars.' [process]
Table 4. Sentences with 0-1 verb per clause in Russian (S = state, ACC = accomplishment, ACH = achievement, ACT = activity).
Event type | ACC | ACH | ACT | S | No verb | Total | %
Culmination | 1 | 18 | 2 | - | 10 | 31 | 24
Point | - | 2 | 1 | - | - | 3 | 2
Process | - | 1 | 16 | 4 | 4 | 25 | 20
Culminated process | 2 | 5 | 3 | - | - | 10 | 8
State | 1 | 4 | 14 | 33 | 6 | 58 | 46
Total | 4 | 30 | 36 | 37 | 20 | 127 | 100
Table 5. Sentences with 2 verbs per clause in Russian (S = state, ACC = accomplishment, ACH = achievement, ACT = activity). [1]
Event type | S+S | S+ACT | S+ACC | S+ACH | ACT+ACT | ACT+ACH | ACH+ACH | ACT+ACC | ACH+ACC | Total
Culmination | - | - | - | 5 | 1 | 1 | 1 | 1 | 1 | 10
Process | - | 3 | - | - | 3 | - | 1 | - | - | 7
Culminated process | - | - | - | - | - | 1 | - | - | - | 1
State | 4 | 8 | 1 | 6 | 2 | 6 | 2 | - | - | 29
Total | 4 | 11 | 1 | 11 | 6 | 8 | 4 | 1 | 1 | 47
Researchers (e.g. Vendler (1967), Dowty (1979), Siegel (1999), among others) have noticed that certain features of a clause, such as the presence of adjuncts and tense, are constrained by and contribute to the aspectual class of the clause. Our first investigation, however, deals with the interdependence between the class of the verb and the type of the clause. Tables 4 and 5 summarize the data. We paid attention to the number and the type of verbs that appeared in each clause. In clauses with one verb, activity verbs appear most often in process clauses, achievement verbs in culmination clauses, and state verbs in state clauses. Note that Russian allows verbless clauses; these seem to be frequently classified as culmination or state.
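For single-verb and verbless clauses, Table 4 can be read column by column as an estimate of P(event type | verb class). The sketch below, with hypothetical names and "-" cells entered as 0, returns the most frequent event type for each verb class together with its relative frequency.

```python
TABLE4_EVENT_TYPES = ("Culmination", "Point", "Process", "Culminated process", "State")

# Columns of Table 4 ("-" cells entered as 0), in TABLE4_EVENT_TYPES order.
TABLE_4 = {
    "ACC": (1, 0, 0, 2, 1),
    "ACH": (18, 2, 1, 5, 4),
    "ACT": (2, 1, 16, 3, 14),
    "S": (0, 0, 4, 0, 33),
    "No verb": (10, 0, 4, 0, 6),
}


def most_likely_event_type(verb_class: str) -> tuple:
    """Most frequent event type for a verb class and its relative frequency."""
    counts = TABLE_4[verb_class]
    best = max(range(len(counts)), key=lambda i: counts[i])
    return TABLE4_EVENT_TYPES[best], counts[best] / sum(counts)


for verb_class in TABLE_4:
    event, p = most_likely_event_type(verb_class)
    print(f"{verb_class}: {event} ({p:.2f})")
```

With these counts, achievement verbs map to culmination (18/30), activity verbs to process (16/36, with state close behind at 14/36), and state verbs to state (33/37); the accomplishment column rests on only 4 clauses.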
The analysis of clauses with multiple verbs is more complicated and less reliable, since there were fewer such sentences. Based on our data, we can draw two conclusions: 1) if one of the verbs in a clause is a state verb, the probability that the entire clause denotes a state is high; 2) if one of the verbs in a clause is an achievement verb, the probability that the entire clause denotes a culminated event (culmination) is high, too.
6. Discussion
All of the work reported here is ongoing. Annotation continues, and inter-annotator agreement studies are planned. As the experiments proceed, we hope to identify precise linguistic features that help to classify verbs and clauses by aspect.
Our preliminary investigation shows that the aspectual classification used for English can be applied to Russian as well. Given the small size of the corpus, we do not draw any conclusions about the most underpopulated class of events, the class of points. Our study suggests that a verb's or a clause's aspectual category can be predicted by co-occurrence frequencies of linguistic features. For English, these features were identified and tested elsewhere (e.g. Siegel 1999). Some of the features for English that have been found useful for automatic classification include temporal (e.g. then), continuous (e.g. indefinitely) and manner (e.g. diligently) adverbs, the lack of a subject, in- and for-PPs, and tense. This paper has explored a number of linguistic features in Russian that seem to be associated with the aspectual class of the verb and constrain the aspectual class of the clause. These features included (im)perfectivity, transitivity, reflexivity, and neuter agreement. Clearly, individual linguistic indicators are predictively incomplete. They should not be used in isolation, but should be combined systematically by a machine learning algorithm. In our ongoing research we are exploring additional linguistic features, such as gender, the absence or presence of negation, tense, and certain prepositional and adverbial phrases. We are experimenting with various supervised and unsupervised learning methods to combine these multiple linguistic indicators for the aspectual classification of verbs and clauses in Russian and English.
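As one illustration of combining several binary indicators, a Bernoulli naive Bayes classifier with add-one smoothing is sketched below. This is a deliberately simple, swapped-in technique, not the learning method used or planned in this study; the training examples are invented toy data rather than annotations from the corpus, and the feature names merely mirror the Russian features discussed above.

```python
import math
from collections import Counter, defaultdict


def train_bernoulli_nb(examples, features):
    """Bernoulli naive Bayes with add-one smoothing.

    examples: list of (set_of_present_features, label) pairs.
    Returns label -> (log prior, {feature: P(feature present | label)}).
    """
    label_counts = Counter(label for _, label in examples)
    present = defaultdict(Counter)
    for feats, label in examples:
        for f in feats:
            present[label][f] += 1
    n = len(examples)
    return {
        label: (math.log(c / n),
                {f: (present[label][f] + 1) / (c + 2) for f in features})
        for label, c in label_counts.items()
    }


def predict(model, feats):
    """Label with the highest log posterior; absent features count too."""
    def log_posterior(label):
        log_prior, p_present = model[label]
        return log_prior + sum(
            math.log(p if f in feats else 1.0 - p) for f, p in p_present.items()
        )
    return max(model, key=log_posterior)


FEATURES = ("imperfective", "transitive", "reflexive", "neuter")
# Invented toy examples, not annotations from the corpus.
toy_train = [
    ({"imperfective", "transitive"}, "state"),
    ({"imperfective", "transitive"}, "state"),
    ({"imperfective"}, "state"),
    ({"imperfective", "reflexive"}, "activity"),
    ({"imperfective", "reflexive"}, "activity"),
    ({"imperfective"}, "activity"),
]
model = train_bernoulli_nb(toy_train, FEATURES)
print(predict(model, {"imperfective", "transitive"}))  # state
print(predict(model, {"imperfective", "reflexive"}))   # activity
```

Here no single feature decides the label: imperfectivity is equally likely under both classes, and the prediction comes from how transitivity and reflexivity jointly shift the scores.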
Notes
[1] There were 18 sentences with three verbs per clause in the Russian data. Of these, seven represented culmination: four cases where all three verbs were in the achievement class, and the other three cases in achievement-activity-activity, achievement-state-activity, and achievement-state-state combinations. Three instances represented process: one achievement-activity-activity, one achievement-achievement-activity, and one accomplishment-activity-state. One instance represented culminated process (achievement-achievement-activity), and seven cases represented state: two achievement-activity-activity, three achievement-achievement-state, and two instances where all three verbs were activity verbs.
There were also five sentences with four verbs per clause in the Russian data. Of these, three represented culmination: one case with four achievement verbs, one with three achievement verbs and an activity verb, and one case of three activity verbs and one state verb. Two instances represented state: one a combination of two state verbs and two activity verbs, and one instance of three state verbs and one achievement verb.
Finally, there were two sentences with five verbs per clause. One instance represented culmination and showed a combination of four achievement verbs and one activity verb; the other instance represented state and had two state, two achievement and one activity verb.
Sources
Lao She. 1933. Cat Country. Russian translation used in this study available at http://chss.montclair.edu/~feldmana/publications/laoshe-ru.txt.
Multext-East. Multilingual Text Tools and Corpora for Central and Eastern European Languages. http://nl.ijs.si/ME/
Véronis, Jean. 1999. Multext-Corpora. An Annotated Corpus for Five European Languages. CD-ROM. Distributed by ELRA/ELDA. Project homepage at http://aune.lpl.univ-aix.fr/projects/multext/.
References
Bermel, Neil. 1997. Context and the Lexicon in the Development of Russian Aspect. Berkeley: University of California Press.
Dahl, Östen. 1985. Tense and Aspect Systems. Oxford: Basil Blackwell.
Dowty, David R. 1979. Word Meaning and Montague Grammar. Dordrecht: Reidel.
Feldman, Anna & Xiaofei Lu. 2007. "Russian and Chinese event annotation: A contrastive study". Proceedings from The Corpus Linguistics Conference Series. http://www.birmingham.ac.uk/research/activity/corpus/publications/conference-archives/2007-birmingham.aspx
Isačenko, Aleksandr V. 1968. Die russische Sprache der Gegenwart. Halle (Saale): M. Niemeyer.
Karcevski, Sergei. 1927. Système du verbe russe: essai de linguistique synchronique. Prague: Plamja.
Klavans, Judith L. & Philip Resnik (eds.). 1997. The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Cambridge, MA: MIT Press.
Moens, Marc & Mark Steedman. 1988. "Temporal ontology and temporal reference". Computational Linguistics 14(2): 15-28.
Resnik, Philip. 1996. "Selectional constraints: An information-theoretic model and its computational realization". Cognition 61: 127-159.
Růžička, Rudolf. 1952. "Der russische Verbalaspekt". Der Russischunterricht 5: 161-169.
Siegel, Eric V. 1999. "Corpus-based linguistic indicators for aspectual classification". Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics.
Stoll, Sabine Erika. 2001. The Acquisition of Russian Aspect. Ph.D. thesis, University of California, Berkeley.
Vendler, Zeno. 1967. Linguistics in Philosophy. Ithaca, NY: Cornell University Press.