CORPUS LINGUISTICS — 2006

R. Garabic. The Russian-Slovak parallel corpus project aims to create an automatically annotated and aligned corpus consisting mostly of fiction. Slovak texts are morphologically annotated and disambiguated using the system applied in the Slovak National Corpus; Russian texts are annotated with the Dialing translation system. The texts are aligned at the sentence level. The corpus is intended for linguistic research, teaching, translation, and cross-linguistic studies, and it has various applications in natural language processing, primarily machine translation. The final size of the corpus depends on the availability of digital versions of texts, but the intended size is 1 million words for each language.