Detail of publication
Citation
Kolář, J.; Švec, J.; Psutka, J.: Automatic punctuation annotation in Czech broadcast news speech. SPECOM´2004, p. 319-325, SPIIRAS, Saint-Petersburg, 2004.
Abstract
This paper reports our initial experiments with automatic punctuation annotation from speech, focusing on Czech broadcast news. We employed two statistical models: a prosodic model and a language model. The prosodic model expresses relationships between prosodic quantities (such as pitch, speaking rate, or loudness) and punctuation marks. We tested two implementations of this model: a decision tree and a multi-layer perceptron. Hidden-event N-gram models were employed for language modeling. Instead of using an ordinary word-based model, we replaced infrequent word forms with their morphological tags and trained a mixed model. Scores from both models can be combined. The model combining the language model with the decision tree yielded superior results. Testing on true words, we achieved a classification accuracy of 95.2% and an F-measure of 78.2%.
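The mixed word/tag vocabulary mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `to_mixed_tokens`, the frequency threshold `min_count`, and the coarse toy tags are all assumptions made for illustration, since the abstract does not state the actual threshold or tag set.

```python
from collections import Counter


def to_mixed_tokens(tagged_sentences, min_count=2):
    """Replace word forms seen fewer than `min_count` times by their tag.

    `tagged_sentences` is a list of sentences, each a list of (word, tag)
    pairs. The threshold and the tag inventory are illustrative assumptions.
    """
    # Count how often each word form occurs in the whole corpus.
    counts = Counter(word for sent in tagged_sentences for word, _tag in sent)
    # Keep frequent word forms; back off to the tag for infrequent ones.
    return [
        [word if counts[word] >= min_count else tag for word, tag in sent]
        for sent in tagged_sentences
    ]


# Toy corpus with hypothetical coarse tags (not a real Czech tag set).
corpus = [
    [("vláda", "NOUN"), ("jednala", "VERB"), ("dnes", "ADV")],
    [("vláda", "NOUN"), ("rozhodla", "VERB"), ("dnes", "ADV")],
]
mixed = to_mixed_tokens(corpus, min_count=2)
```

In this toy corpus the two verbs each occur once and are replaced by their tag, while the repeated word forms are kept. An N-gram model trained on such a token stream sees fewer distinct rare items, which is the usual motivation for this kind of back-off in a highly inflected language.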
Detail of publication
Title: | Automatic punctuation annotation in Czech broadcast news speech |
---|---|
Author: | Kolář, J. ; Švec, J. ; Psutka, J. |
Language: | English |
Date of publication: | 20 Sep 2004 |
Year: | 2004 |
Type of publication: | Papers in proceedings of reviewed conferences |
Title of journal or book: | SPECOM´2004 |
Page: | 319 - 325 |
ISBN: | 5-7452-0110-X |
Publisher: | SPIIRAS |
Address: | Saint-Petersburg |
Date: | 20 Sep 2004 - 22 Sep 2004 |
Keywords
automatic punctuation, prosody, hidden-event n-gram model, sentence boundary, broadcast news, tag-based models
BibTeX
@INPROCEEDINGS{KolarJ_2004_Automaticpunctuation,
  author = {Kol\'{a}\v{r}, J. and \v{S}vec, J. and Psutka, J.},
  title = {Automatic punctuation annotation in Czech broadcast news speech},
  year = {2004},
  publisher = {SPIIRAS},
  journal = {SPECOM´2004},
  address = {Saint-Petersburg},
  pages = {319-325},
  ISBN = {5-7452-0110-X},
  url = {http://www.kky.zcu.cz/en/publications/KolarJ_2004_Automaticpunctuation},
}