Detail of publication

Citation

Skorkovská, L.: JMZW: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering. SVK 2012 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů, p. 91-93, Západočeská univerzita v Plzni, Plzeň, 2012.

Abstract

The topic identification module is part of a complex system for acquiring and storing large volumes of text data. It processes each acquired data item and assigns it topics from a defined topic hierarchy. The hierarchy is quite extensive: it contains about 450 topics and topic categories. Because the system processes large amounts of data, a summarization method was implemented, and the effect of using only the summary of an article on topic identification accuracy is studied. The main purpose of the module is to filter the huge amount of data by topic for future use as language modeling training data. The module implements topic identification with a language modeling based approach similar to the Naive Bayes classifier and assigns 3 topics to each article. Topics are chosen from a hierarchical system, a "topic tree".
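
To illustrate the kind of classifier the abstract describes, the following is a minimal sketch rather than the module's actual implementation. It assumes unigram topic language models with add-one smoothing and a uniform topic prior; the function names (`train_topic_models`, `top_topics`), the four toy topics, and the training texts are all hypothetical. Each topic is scored by the log-likelihood of the article's words under that topic's model, and the 3 best-scoring topics are returned.

```python
import math
from collections import Counter


def train_topic_models(labeled_docs):
    """Collect per-topic unigram counts from (topic, text) pairs."""
    counts = {}
    for topic, text in labeled_docs:
        counts.setdefault(topic, Counter()).update(text.lower().split())
    return counts


def top_topics(text, counts, n=3):
    """Return the n topics with the highest log-likelihood for the text.

    Assumption: add-one (Laplace) smoothing and a uniform prior over topics.
    """
    vocab_size = len({word for c in counts.values() for word in c})
    words = text.lower().split()
    scores = {}
    for topic, c in counts.items():
        total = sum(c.values())
        scores[topic] = sum(
            math.log((c[word] + 1) / (total + vocab_size)) for word in words
        )
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy training data (hypothetical; the real hierarchy has about 450 topics).
training = [
    ("sport", "football match goal team player league"),
    ("politics", "government election parliament minister vote"),
    ("economy", "bank market inflation price economy trade"),
    ("weather", "rain snow temperature forecast wind storm"),
]
models = train_topic_models(training)
result = top_topics("the team scored a goal in the football match", models)
print(result)
```

Under the same assumptions, the effect of summarization could be examined by passing a summary of the article to `top_topics` instead of the full text and comparing the returned topics.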

Detail of publication

Title: JMZW: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering
Author: Skorkovská, L.
Language: English
Date of publication: 31 May 2012
Year: 2012
Type of publication: Papers in proceedings of reviewed conferences
Title of journal or book: SVK 2012 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů
Page: 91 - 93
ISBN: 978-80-261-0127-7
Publisher: Západočeská univerzita v Plzni
Address: Plzeň
Date: 31 May 2012 - 31 May 2012

Keywords

topic identification, summarization

BibTeX

@INPROCEEDINGS{SkorkovskaL_2012_JMZWApplicationof,
 author = {Skorkovsk\'{a}, L.},
 title = {JMZW: Application of Summarization Methods in Topic Identification Module for Large Scale Language Modeling Data Filtering},
 year = {2012},
 publisher = {Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni},
 booktitle = {SVK 2012 - magistersk\'{e} a doktorsk\'{e} studijn\'{i} programy, sborn\'{i}k roz\v{s}\'{i}\v{r}en\'{y}ch abstrakt\r{u}},
 address = {Plze\v{n}},
 pages = {91-93},
 ISBN = {978-80-261-0127-7},
 url = {http://www.kky.zcu.cz/en/publications/SkorkovskaL_2012_JMZWApplicationof},
}