Detail of publication
Citation
Skorkovska, L.: JMZW: Topic Identification in Czech Newspaper Articles. SVK 2011 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů, pp. 95-96, Západočeská univerzita v Plzni, Plzeň, 2011.
Abstract
The topic identification module is part of a complex system for acquiring and storing large volumes of text data from the Web, called JMZW (Jazykové modelování z webu, "Language Modelling from the Web"). The module processes each acquired text item, mostly newspaper articles, and automatically assigns to it keywords from a predefined topic hierarchy. The main purpose of the JMZW system is to acquire and process data for training extensive language models used in Automatic Speech Recognition (ASR) systems. Since it has been shown that a smaller topic-specific language model can outperform a much bigger general one, it is important to filter the gathered data by topic.
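The abstract does not say how keywords are matched against the topic hierarchy. The sketch below is purely illustrative and is not the method used in JMZW: it assumes a simple keyword-count rule in which a topic is assigned when at least `min_hits` of its indicative keywords occur in the article, and the label is then propagated to all ancestor topics. The class `TopicNode`, the functions `assign_topics` and `filter_by_topic`, the `min_hits` threshold and the example keywords are all hypothetical. The final step shows how such labels could be used to select a topic-specific corpus for language model training.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TopicNode:
    """One node of a hypothetical topic hierarchy (not the real JMZW hierarchy)."""
    name: str
    parent: Optional[str]                     # None for a top-level topic
    keywords: Set[str] = field(default_factory=set)


def assign_topics(text: str, hierarchy: List[TopicNode], min_hits: int = 2) -> Set[str]:
    """Assign every topic whose keywords occur at least `min_hits` times,
    together with all of its ancestors in the hierarchy."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so Czech letters are kept
    by_name: Dict[str, TopicNode] = {node.name: node for node in hierarchy}
    assigned: Set[str] = set()
    for node in hierarchy:
        hits = sum(1 for tok in tokens if tok in node.keywords)
        if hits >= min_hits:
            current: Optional[TopicNode] = node
            while current is not None:  # propagate the label up to the root
                assigned.add(current.name)
                current = by_name.get(current.parent) if current.parent is not None else None
    return assigned


def filter_by_topic(articles: List[str], hierarchy: List[TopicNode],
                    topic: str, min_hits: int = 2) -> List[str]:
    """Keep only articles labelled with `topic`, e.g. to build a topic-specific LM corpus."""
    return [a for a in articles if topic in assign_topics(a, hierarchy, min_hits)]


if __name__ == "__main__":
    # Hypothetical example hierarchy and articles, for illustration only.
    hierarchy = [
        TopicNode("sport", None, {"zápas", "sezóna"}),
        TopicNode("hokej", "sport", {"hokej", "puk", "brankář"}),
        TopicNode("politika", None, {"vláda", "parlament", "ministr", "zákon"}),
    ]
    articles = [
        "Brankář chytil puk a hokej skončil remízou.",
        "Vláda předložila parlamentu nový zákon.",
    ]
    print(sorted(assign_topics(articles[0], hierarchy)))  # ['hokej', 'sport']
    print(filter_by_topic(articles, hierarchy, "sport"))
```

Exact matching on surface forms is a deliberate simplification: Czech is highly inflected, so in the second article the form "parlamentu" does not match the keyword "parlament". A practical system would more likely lemmatize the text first or use a statistical classifier instead of raw keyword counts.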
Detail of publication
Title: | JMZW: Topic Identification in Czech Newspaper Articles |
---|---|
Author: | Skorkovska, L. |
Language: | English |
Date of publication: | 26 May 2011 |
Year: | 2011 |
Type of publication: | Papers in proceedings of reviewed conferences |
Title of journal or book: | SVK 2011 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů (SVK 2011 - Master's and Doctoral Study Programmes, Proceedings of Extended Abstracts) |
Pages: | 95-96 |
ISBN: | 978-80-261-0000-3 |
Publisher: | Západočeská univerzita v Plzni |
Address: | Plzeň |
Date: | 26 May 2011 |
Keywords
topic identification, newspaper, language models
BibTeX
@INPROCEEDINGS{SkorkovskaL_2011_JMZWTopic,
  author = {Skorkovska, L.},
  title = {{JMZW}: Topic Identification in {C}zech Newspaper Articles},
  year = {2011},
  publisher = {Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni},
  booktitle = {SVK 2011 - magistersk\'{e} a doktorsk\'{e} studijn\'{i} programy, sborn\'{i}k roz\v{s}\'{i}\v{r}en\'{y}ch abstrakt\r{u}},
  address = {Plze\v{n}},
  pages = {95--96},
  ISBN = {978-80-261-0000-3},
  url = {http://www.kky.zcu.cz/en/publications/SkorkovskaL_2011_JMZWTopic},
}