Detail of publication
Citation
Skorkovská, L.: JMZW: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering. In: SVK 2012 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů, pp. 91-93. Západočeská univerzita v Plzni, Plzeň, 2012.
Abstract
The topic identification module, part of a complex system for acquiring and storing large volumes of text data, processes each acquired data item and assigns it topics from a predefined topic hierarchy. The hierarchy is extensive: it contains about 450 topics and topic categories. The main purpose of the module is to filter this large volume of data by topic for later use as language modeling training data. Because the system processes large amounts of data, a summarization method was implemented, and the paper studies how using only an article's summary affects topic identification accuracy. The module implements topic identification with a language-modeling-based approach similar to a Naive Bayes classifier and assigns three topics to each article. Topics are chosen from a hierarchical system, the "topic tree".
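The abstract describes the module only at a high level. The Python sketch below illustrates the general idea under stated assumptions: each topic is modeled as a unigram language model with add-one smoothing and scored like a multinomial Naive Bayes classifier, the three highest-scoring topics are returned, and the article can first be reduced to an extractive summary. The frequency-based summarizer, the flat (non-hierarchical) topic set, the toy training data, and the names `summarize`, `TopicIdentifier` and `top_topics` are hypothetical illustrations, not the paper's actual implementation or its 450-topic tree.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a production system would likely need
    # language-specific tokenization or lemmatization (assumption).
    return re.findall(r"\w+", text.lower())


def summarize(text: str, n_sentences: int = 2) -> str:
    """Hypothetical frequency-based extractive summarizer (not the paper's method)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        # Average article-level frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the best sentences but restore their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


class TopicIdentifier:
    """One unigram language model per topic, scored like multinomial Naive Bayes.

    Topics are treated as a flat set; the paper's hierarchical topic tree
    is not modeled here (simplifying assumption).
    """

    def __init__(self, training: dict[str, list[str]]):
        self.counts = {
            topic: Counter(w for doc in docs for w in tokenize(doc))
            for topic, docs in training.items()
        }
        self.totals = {topic: sum(c.values()) for topic, c in self.counts.items()}
        self.vocab = set().union(*self.counts.values())
        n_docs = sum(len(docs) for docs in training.values())
        # Topic prior P(t) estimated from the number of training documents.
        self.log_prior = {t: math.log(len(docs) / n_docs) for t, docs in training.items()}

    def log_likelihood(self, topic: str, words: list[str]) -> float:
        # Add-one (Laplace) smoothing; the extra +1 in the denominator reserves
        # probability mass for words never seen in training.
        denom = self.totals[topic] + len(self.vocab) + 1
        counts = self.counts[topic]
        return sum(math.log((counts[w] + 1) / denom) for w in words)

    def top_topics(self, text: str, k: int = 3) -> list[str]:
        # Score log P(t) + sum_w log P(w | t) and return the k best topics.
        words = tokenize(text)
        scores = {t: self.log_prior[t] + self.log_likelihood(t, words) for t in self.counts}
        return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    model = TopicIdentifier({
        "sport": ["the team won the football match", "goal scored in the final minute"],
        "politics": ["the parliament passed the new law", "the minister resigned after the vote"],
        "economy": ["the bank raised interest rates", "inflation and the stock market fell"],
        "weather": ["heavy rain and strong wind expected", "sunny weather with high temperature"],
    })
    article = "The team scored a late goal. Fans celebrated the match. Rain fell all day."
    # Classify using only a two-sentence summary instead of the full article.
    print(model.top_topics(summarize(article, n_sentences=2)))
```

Scoring in log space avoids numerical underflow when many small word probabilities are multiplied. Running `top_topics` on the full text and on `summarize(text)` for a labeled test set would mirror, in miniature, the full-article versus summary comparison the abstract describes.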
Detail of publication
Title: | JMZW: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering |
---|---|
Author: | Skorkovská, L. |
Language: | English |
Date of publication: | 31 May 2012 |
Year: | 2012 |
Type of publication: | Papers in proceedings of reviewed conferences |
Title of journal or book: | SVK 2012 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů |
Pages: | 91-93 |
ISBN: | 978-80-261-0127-7 |
Publisher: | Západočeská univerzita v Plzni |
Address: | Plzeň |
Date: | 31 May 2012 |
Keywords
topic identification, summarization
BibTeX
@INPROCEEDINGS{SkorkovskaL_2012_JMZWApplicationof,
  author    = {Skorkovsk{\'a}, L.},
  title     = {{JMZW}: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering},
  booktitle = {SVK 2012 -- magistersk{\'e} a doktorsk{\'e} studijn{\'\i} programy, sborn{\'\i}k roz{\v{s}}{\'\i}{\v{r}}en{\'y}ch abstrakt{\r{u}}},
  year      = {2012},
  month     = may,
  pages     = {91--93},
  publisher = {Z{\'a}pado{\v{c}}esk{\'a} univerzita v Plzni},
  address   = {Plze{\v{n}}},
  ISBN      = {978-80-261-0127-7},
  url       = {http://www.kky.zcu.cz/en/publications/SkorkovskaL_2012_JMZWApplicationof},
}