Publication detail

Citation

Lucie Skorkovská: The Use of the Unconstrained Cohort Normalization Technique for Multi-label Classification Score Normalization. SVK 2014 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů, p. 99-100, Západočeská univerzita v Plzni, 2014.

Abstract

The goal of text classification is to categorize a set of documents into a predefined set of topic classes or categories. In the field of text classification, usually only multiclass classification is considered, where, unlike in binary classification, there are more than two possible classes. The simplest text classification task is to assign one topic to each document, but for identifying the topics of newspaper articles it is essential to use multi-label classification, whose goal is to find the set of labels belonging to each data item. We use a generative classifier, which outputs a distribution of probabilities (or likelihood scores), to tackle this task. The problem with this approach is that a threshold for positive classification must be set, and this threshold can vary from document to document depending on its content (words used, document length, ...). The described method for finding a threshold that defines the boundary between the "correct" and the "incorrect" topics of a newspaper article is based on the Unconstrained Cohort Normalization (UCN) technique used in the speaker identification task.
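
As a rough illustration of the idea in the abstract, the following Python sketch normalizes each topic's log-likelihood score against a cohort of its best-scoring rival topics and keeps the topics whose normalized score clears a fixed threshold. The abstract does not give the formula used in the paper, so this assumes a common UCN formulation from speaker verification (the score minus the mean log-likelihood of the K best competing models). The function names, `cohort_size`, the threshold of 0, and the example scores are all hypothetical.

```python
def ucn_normalize(log_scores, cohort_size=3):
    """Normalize each topic's log-likelihood against its top-scoring rivals.

    log_scores: dict mapping topic name -> log-likelihood of the document
    under that topic's model (assumed to come from a generative classifier).
    """
    if len(log_scores) < 2:
        raise ValueError("UCN needs at least two topics to form a cohort")
    normalized = {}
    for topic, score in log_scores.items():
        # Cohort (assumed definition): the best-scoring competing topics.
        rivals = sorted(
            (s for t, s in log_scores.items() if t != topic), reverse=True
        )
        cohort = rivals[:cohort_size]
        normalized[topic] = score - sum(cohort) / len(cohort)
    return normalized


def select_topics(log_scores, cohort_size=3, threshold=0.0):
    """Return the topics whose normalized score reaches the threshold."""
    normalized = ucn_normalize(log_scores, cohort_size)
    return sorted(t for t, s in normalized.items() if s >= threshold)


# Made-up log-likelihoods for one article (illustration only).
scores = {"politics": -100.0, "economy": -102.0, "sport": -130.0, "culture": -135.0}
print(select_topics(scores, cohort_size=2))
```

Because the normalized score is relative to the document's own rival scores, the same threshold can be applied to documents of different lengths, which is the motivation the abstract gives for normalizing before thresholding.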

Publication details

Title: The Use of the Unconstrained Cohort Normalization Technique for Multi-label Classification Score Normalization
Author: Lucie Skorkovská
Title (Czech): Použití metody UCN pro normalizaci skóre v úloze klasifikace do více tříd
Publication language: English
Year of publication: 2014
Publication type: Paper in proceedings
Journal / book title: SVK 2014 - magisterské a doktorské studijní programy, sborník rozšířených abstraktů
Pages: 99-100
ISBN: 978-80-261-0365-3
Publisher: Západočeská univerzita v Plzni
Date: 22.5.2014 - 22.5.2014

Keywords

topic identification, multi-label classification, Naive Bayes, score normalization

BibTeX

@INPROCEEDINGS{LucieSkorkovska_2014_TheUseofthe,
 author = {Lucie Skorkovsk\'{a}},
 title = {The Use of the Unconstrained Cohort Normalization Technique for Multi-label Classification Score Normalization},
 year = {2014},
 publisher = {Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni},
 booktitle = {SVK 2014 - magistersk\'{e} a doktorsk\'{e} studijn\'{i} programy, sborn\'{i}k roz\v{s}\'{i}\v{r}en\'{y}ch abstrakt\r{u}},
 pages = {99--100},
 ISBN = {978-80-261-0365-3},
 url = {http://www.kky.zcu.cz/en/publications/LucieSkorkovska_2014_TheUseofthe},
}