Detail of publication

Citation

Skorkovská, L., Ircing, P., Pražák, A., and Lehečka, J.: Automatic Topic Identification for Large Scale Language Modeling Data Filtering. Text, Speech and Dialogue, Lecture Notes in Computer Science, vol. 6836, pp. 64-71, Springer, Heidelberg, 2011.

Abstract

The paper presents a topic identification module that is embedded in a complex system for acquiring and storing large volumes of text data from the Web. The module processes each acquired data item and assigns it keywords from a defined topic hierarchy, which was developed for this purpose and is also described in the paper. The quality of the topic identification is evaluated in two ways: directly, using classic precision-recall measures, and indirectly, by measuring the ASR performance of topic-specific language models built from the automatically filtered data.

Detail of publication

Title: Automatic Topic Identification for Large Scale Language Modeling Data Filtering
Author: Skorkovská, L. ; Ircing, P. ; Pražák, A. ; Lehečka, J.
Language: English
Date of publication: 1 Sep 2011
Year: 2011
Type of publication: Papers in journals
Title of journal or book: Text, Speech and Dialogue
Series: Lecture Notes in Computer Science
Issue number: 6836
Page: 64 - 71
ISBN: 978-3-642-23537-5
ISSN: 0302-9743
Publisher: Springer
Address: Heidelberg
Date: 1 Sep 2011 - 5 Sep 2011

Keywords

topic identification, language modeling, automatic speech recognition

BibTeX

@ARTICLE{SkorkovskaL_2011_AutomaticTopic,
 author = {Skorkovsk\'{a}, L. and Ircing, P. and Pra\v{z}\'{a}k, A. and Jan Lehe\v{c}ka},
 title = {Automatic Topic Identification for Large Scale Language Modeling Data Filtering},
 year = {2011},
 publisher = {Springer},
 journal = {Text, Speech and Dialogue},
 address = {Heidelberg},
 volume = {6836},
 pages = {64-71},
 series = {Lecture Notes in Computer Science},
 ISBN = {978-3-642-23537-5},
 ISSN = {0302-9743},
 url = {http://www.kky.zcu.cz/en/publications/SkorkovskaL_2011_AutomaticTopic},
}