Publication detail

Citation

Jáchym Kolář and Yang Liu: Automatic Sentence Boundary Detection in Conversational Speech: A Cross-Lingual Evaluation on English and Czech. Proc. ICASSP 2010, Dallas, TX, USA, 2010.

Abstract

Automatic sentence segmentation of speech is important for enriching speech recognition output and aiding downstream language processing. This paper focuses on automatic sentence segmentation of speech in two languages, English and Czech. For this task, we compare and combine three statistical models: an HMM, a maximum entropy model, and the boosting-based model BoosTexter. All of these approaches rely on both textual and prosodic information. We evaluate these methods on a corpus of multiparty meetings in English and on a corpus of broadcast conversations in Czech, using both manual and speech recognition transcripts. The experiments show that the best results are achieved when all three models are combined via posterior probability interpolation. We observe differences in model performance between English and Czech, as well as differences in feature usage in the prosodic models of the two languages. Overall, the analysis is important for porting sentence segmentation approaches from one language to another.
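The combination step named in the abstract, posterior probability interpolation, can be illustrated with a minimal sketch. It assumes a linear interpolation of per-boundary posteriors; the function names, the weights, the example probabilities, and the 0.5 decision threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def interpolate_posteriors(
    model_posteriors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Linearly interpolate boundary posteriors from several models.

    model_posteriors[m][i] is model m's posterior that a sentence
    boundary follows word i. Weights are normalised to sum to 1.
    (Linear interpolation is an assumption of this sketch.)
    """
    if len(model_posteriors) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(model_posteriors[0])
    if any(len(p) != n for p in model_posteriors):
        raise ValueError("all models must score the same word positions")
    norm = [w / total for w in weights]
    return [
        sum(w * p[i] for w, p in zip(norm, model_posteriors))
        for i in range(n)
    ]


def detect_boundaries(combined: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a boundary wherever the combined posterior reaches the threshold."""
    return [p >= threshold for p in combined]


# Hypothetical posteriors for three word positions from three models.
hmm = [0.9, 0.2, 0.6]
maxent = [0.8, 0.1, 0.4]
boostexter = [0.7, 0.3, 0.2]

combined = interpolate_posteriors([hmm, maxent, boostexter], [1.0, 1.0, 1.0])
boundaries = detect_boundaries(combined)
```

In practice the interpolation weights would typically be tuned on held-out data rather than set equal as they are here.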

Publication detail

Title: Automatic Sentence Boundary Detection in Conversational Speech: A Cross-Lingual Evaluation on English and Czech
Authors: Jáchym Kolář; Yang Liu
Publication language: English
Date of publication: 14 March 2010
Year of publication: 2010
Publication type: Paper in proceedings
Book title: Proc. ICASSP 2010
Place of publication: Dallas, TX, USA
Date: 14 March 2010 - 19 March 2010

Keywords

spoken language understanding, sentence boundary detection, prosody, machine learning

BibTeX

@INPROCEEDINGS{JachymKolar_2010_AutomaticSentence,
 author = {J\'{a}chym Kol\'{a}\v{r} and Yang Liu},
 title = {Automatic Sentence Boundary Detection in Conversational Speech: A Cross-Lingual Evaluation on English and Czech},
 year = {2010},
 address = {Dallas, TX, USA},
 booktitle = {Proc. ICASSP 2010},
 url = {http://www.kky.zcu.cz/en/publications/JachymKolar_2010_AutomaticSentence},
}