Publication details

Citation

Jáchym Kolář and Yang Liu: Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information. Proc. of Speech Prosody, Chicago, IL, USA, 2010.

Abstract

This paper deals with automatic sentence boundary detection in spoken Czech using both textual and prosodic information. This task is important for making automatic speech recognition (ASR) output more readable and easier to process for downstream language processing modules. We compare and combine three statistical models: hidden Markov model, maximum entropy, and adaptive boosting. We evaluate these methods on two Czech corpora, broadcast news and broadcast conversations, using both manual and ASR transcripts. Our results show that superior performance is achieved when all three models are combined via posterior probability interpolation, and that the three methods differ substantially depending on the knowledge sources used and on the genre. Feature analysis also reveals significant differences in prosodic feature usage patterns between the two genres.
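To illustrate what combining models via posterior probability interpolation can look like, here is a minimal sketch. It assumes a simple linear interpolation of per-word boundary posteriors with fixed weights; the abstract does not state the exact interpolation form, the weights, or the decision threshold, so the function names, the weights, and the 0.5 threshold below are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def interpolate_posteriors(
    posteriors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Linearly interpolate boundary posteriors from several models.

    posteriors[m][i] is model m's estimate of P(boundary after word i).
    weights[m] is the (assumed) interpolation weight of model m.
    """
    if len(posteriors) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(posteriors[0])
    if any(len(p) != n for p in posteriors):
        raise ValueError("all models must score the same word positions")
    # Weighted sum of the models' posteriors at each word position.
    return [
        sum(w * p[i] for w, p in zip(weights, posteriors))
        for i in range(n)
    ]


def detect_boundaries(combined: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a sentence boundary wherever the combined posterior reaches the threshold."""
    return [p >= threshold for p in combined]


# Hypothetical posteriors for three word positions from three models.
hmm = [0.9, 0.2, 0.6]
maxent = [0.8, 0.1, 0.4]
boosting = [0.7, 0.3, 0.2]

combined = interpolate_posteriors([hmm, maxent, boosting], [0.5, 0.25, 0.25])
boundaries = detect_boundaries(combined)
```

In practice the interpolation weights and the threshold would typically be tuned on held-out data rather than fixed by hand as they are here.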

Publication details

Title: Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information
Authors: Jáchym Kolář; Yang Liu
Title (Czech): Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information
Publication language: English
Publication date: 11 May 2010
Publication year: 2010
Publication type: Journal article
Book title: Proc. of Speech Prosody
Place of publication: Chicago, IL, USA
Date: 11 May 2010 - 14 May 2010

Keywords

sentence segmentation, prosody, HMM, maximum entropy, boosting

BibTeX

@INPROCEEDINGS{JachymKolar_2010_Comparingand,
 author = {J\'{a}chym Kol\'{a}\v{r} and Yang Liu},
 title = {Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information},
 year = {2010},
 address = {Chicago, IL, USA},
 booktitle = {Proc. of Speech Prosody},
 url = {http://www.kky.zcu.cz/en/publications/JachymKolar_2010_Comparingand},
}