CORPUS LINGUISTICS — 2006

 
K. Pala

In this paper we focus on the valency frames of Czech verbs and on a two-level notation for the semantic roles associated with the individual verb arguments. This work is currently under way in the NLP Laboratory at FI MU and involves building the VerbaLex database of Czech verbs (Hlavackova, Horak 2005), whose valency frames are linked to the Czech WordNet. First, we briefly survey the inventories of semantic roles (deep cases) used in various projects. We then discuss some of their properties, in particular their low compatibility with the lexical data actually attested in corpora; as an example, we compare some of the semantic roles used in the ValLex (Zabokrtsky 2005) and VerbaLex databases. The comparison shows that the inventories in use are too general (e.g., for verbs like videt (see), slyset (hear), spat (sleep), smat se (smile, laugh), plakat (cry)). The solution proposed here is a two-level inventory of semantic roles designed for the verb frames in VerbaLex. It draws on selected items from the EuroWordNet Top Ontology and from the set of Base Concepts introduced in the EuroWordNet project (Vossen, Bloksma et al. 1998) and subsequently also used in the EU project Balkanet (Pala, Smrz 2004). In this way we introduce what we call Complex Valency Frames (CVFs). When we apply Word Sketches (Kilgarriff, Rychly, Smrz, Tugwell 2004), i.e. corpus data, to examine some frequent Czech verbs, it turns out that even with CVFs we cannot adequately describe the semantics of some of their arguments using the semantic role labels mentioned. The Word Sketch Engine (WSE) helps to discriminate the meanings of individual verbs more reliably through their contexts, since one of its outputs is an exhaustive table of all relevant collocations for a given word (verb), together with their frequency parameters.
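The two-level notation can be pictured as a pair: a general first-level role label (a deep case) plus a second-level concept taken from an ontology or WordNet. The Python sketch below is a minimal illustration under stated assumptions: the classes `Role` and `ComplexValencyFrame`, the `AG(person:1)` print format, and the concepts assigned to videt and slyset are all hypothetical and are not taken from the VerbaLex data or its actual notation. It shows how two frames that look identical at the first level (AG, PAT) can still be told apart at the second level.

```python
from dataclasses import dataclass


# Hypothetical encoding of one semantic role in a two-level notation.
@dataclass(frozen=True)
class Role:
    label: str    # first level: a general deep-case label, e.g. "AG" or "PAT"
    concept: str  # second level: a concept in "literal:sense" form, e.g. "person:1"

    def __str__(self) -> str:
        return f"{self.label}({self.concept})"


# Hypothetical container for a complex valency frame (CVF) of one verb.
@dataclass(frozen=True)
class ComplexValencyFrame:
    verb: str
    roles: tuple[Role, ...]

    def first_level(self) -> tuple[str, ...]:
        # Drop the second level, keeping only the general role labels.
        return tuple(role.label for role in self.roles)

    def __str__(self) -> str:
        return f"{self.verb}: " + " ".join(str(role) for role in self.roles)


# Illustrative frames only; the concepts are assumptions, not VerbaLex entries.
see = ComplexValencyFrame("videt", (Role("AG", "person:1"), Role("PAT", "object:1")))
hear = ComplexValencyFrame("slyset", (Role("AG", "person:1"), Role("PAT", "sound:1")))

print(see)   # videt: AG(person:1) PAT(object:1)
print(hear)  # slyset: AG(person:1) PAT(sound:1)
print(see.first_level() == hear.first_level())  # True: identical at the first level
print(see.roles == hear.roles)                  # False: the second level differs
```

Because both classes are frozen dataclasses, frames are hashable values that can be compared or grouped directly.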
Moreover, WSE also yields semantic clusters for individual words, which makes it possible to take domain information into account. CVFs, and the semantic roles in particular, can also serve as a basis for verifying whether semantic classes of verbs, such as those proposed by Levin (Levin 1993), are consistent. Classifying CVFs according to their roles yields clusters in which verbs with the same roles in their frames should belong to the same semantic class. We offer some examples supporting this assumption. We also touch on the question of whether complex valency frames can reasonably work for verbs in languages other than Czech, e.g. for Bulgarian (Koeva 2004; see also Tufis et al. 2006), or tentatively for English as well. It appears that the valency frames developed primarily for Czech display some universal features. It is becoming clear that more detailed data resources are indispensable for realistic NLP applications, and WSE is one of the new tools that make it possible to build such resources from corpus data.
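The consistency check described above can be sketched as grouping verbs by the set of roles in their frames and then flagging any group whose members carry different class labels. In this minimal Python sketch the role assignments in `FRAMES`, the class labels in `VERB_CLASS` (loosely named after Levin-style classes), and the helper names are hypothetical toy data chosen for illustration. With only first-level roles, spat (sleep) lands in the same cluster as plakat (cry) and smat se (smile, laugh), and the check flags that cluster, which is the kind of over-generality the two-level roles are meant to address.

```python
from collections import defaultdict

# Hypothetical toy data: each verb mapped to the first-level roles in its frame.
FRAMES = {
    "videt": ("AG", "PAT"),
    "slyset": ("AG", "PAT"),
    "spat": ("AG",),
    "plakat": ("AG",),
    "smat se": ("AG",),
}

# Hypothetical reference classification, loosely modelled on Levin-style class names.
VERB_CLASS = {
    "videt": "see",
    "slyset": "see",
    "spat": "snooze",
    "plakat": "nonverbal_expression",
    "smat se": "nonverbal_expression",
}


def cluster_by_roles(frames):
    """Group verbs whose frames contain exactly the same set of roles."""
    clusters = defaultdict(list)
    for verb, roles in frames.items():
        clusters[frozenset(roles)].append(verb)
    return {key: sorted(verbs) for key, verbs in clusters.items()}


def inconsistent_clusters(clusters, verb_class):
    """Return the clusters whose verbs do not all share one class label."""
    return {
        key: verbs
        for key, verbs in clusters.items()
        if len({verb_class[verb] for verb in verbs}) > 1
    }


clusters = cluster_by_roles(FRAMES)
for roles, verbs in clusters.items():
    print(sorted(roles), verbs)
# ['AG', 'PAT'] ['slyset', 'videt']
# ['AG'] ['plakat', 'smat se', 'spat']

print(list(inconsistent_clusters(clusters, VERB_CLASS).values()))
# [['plakat', 'smat se', 'spat']]
```

Using a `frozenset` as the key treats a frame as an unordered set of roles; if argument order or the second-level concepts mattered, a tuple of full two-level roles would be a natural replacement key.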

References

  1. Levin B. English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press, 1993.
  2. Vossen P., Bloksma L. et al. The EuroWordNet base concepts and top ontology. Technical Report Deliverable D017, D034, D036, WP5 EuroWordNet, LE2-4003. University of Amsterdam, 1998.
  3. Palmer M., Rosenzweig J., Dang H. T., Kipper K. Investigating regular sense extensions based on intersective Levin classes // Proceedings of COLING-ACL '98. Montreal, Canada, August 11-17, 1998.
  4. Stranakova-Lopatkova M., Zabokrtsky Z. Valency dictionary of Czech verbs: Complex tectogrammatical annotation // C. Paz Suarez Araujo, M. Gonzalez-Rodriguez (eds.). LREC 2002. Vol. III. ELRA. 2002.
  5. Kilgarriff A., Rychly P., Smrz P., Tugwell D. Word Sketches // Proceedings of the Euralex Conference. 2004.
  6. Hanks P., Pustejovsky J. Sense in Context: Constructing a Dictionary of Selection Contexts. Draft. 2004.
  7. Pala K., Smrz P. Building Czech WordNet // Romanian Journal of Information Science and Technology. Vol. 1-2. P. 89-97. Bucharest, 2004.
  8. Hlavackova D., Horak A. VerbaLex — New Comprehensive Lexicon of Verb Valencies for Czech // Proceedings of Slovko 2005 Conference. Bratislava, Slovakia, 2005.
  9. Zabokrtsky Z. Verb Valency. PhD Dissertation. MFF UK, Prague, 2005.
  10. Koeva S. Bulgarian WordNet. Final Report. Balkanet. CD-ROM. 2004.
  11. Tufis D. et al. Romanian WordNet: New Developments and Applications // Proceedings of the 3rd GWC. Korea, Jeju Island, January 2006.