CORPUS LINGUISTICS — 2006

I. Chiari

A considerable number of errors and repairs occur at the basic level of transcription of spoken language corpora, when the mere sequence of spoken words is heard and transcribed (e.g. Chiari 2006). Some of these errors are corrected in later stages of annotation (especially when phonetic and phonological labelling is required), but others remain undetected in the revision process. The present work illustrates the main results of an experiment on errors and repairs in the transcription of spoken Italian, with significant relevance for evaluating the validity, reliability and correctness of transcriptions of speech of several different types, intended for the annotation of spoken corpora. In particular, we dealt with errors and repair strategies that appear in the first drafts of the transcription process and that are not easily detectable with automatic post-editing procedures. The experiment focuses on the purely orthographic transcription of the first draft of spontaneous speech, carried out by individuals without specific training; it deliberately excludes further linguistic tagging, such as grammatical or paralinguistic annotation, which requires specific skills to be learned and developed.