Detail of publication

Citation

Kanis, J. and Müller, L.: Automatic lemmatizer construction with focus on OOV words lemmatization. Text, speech and dialogue, Lecture notes in artificial intelligence, no. 3658, p. 132-139, Springer, Berlin, 2005.

Abstract

This paper deals with the automatic construction of a lemmatizer from a Full Form - Lemma (FFL) training dictionary and with the lemmatization of new words that do not appear in the FFL dictionary, i.e. out-of-vocabulary (OOV) words. Three methods for lemmatizing three kinds of OOV words (missing full forms, unknown words, and compound words) are introduced. These methods were tested on Czech test data. The best result (recall: 99.3 % and precision: 75.1 %) was achieved by a combination of these methods. A lexicon-free lemmatizer based on the method for lemmatizing unknown words (the lemmatization patterns method) is also introduced.
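
To make the idea of deriving lemmatization rules from an FFL dictionary more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the lemmatization patterns are built, so the suffix-rewrite rules, the function names `learn_patterns` and `lemmatize_oov`, the longest-suffix preference, and the toy English word pairs are all assumptions made for illustration only.

```python
from collections import Counter
from os.path import commonprefix


def learn_patterns(ffl_pairs):
    """Count (form_suffix, lemma_suffix) rewrite rules seen in the training pairs."""
    patterns = Counter()
    for form, lemma in ffl_pairs:
        # Everything after the shared prefix is treated as the changing suffix.
        stem_len = len(commonprefix([form, lemma]))
        patterns[(form[stem_len:], lemma[stem_len:])] += 1
    return patterns


def lemmatize_oov(word, patterns):
    """Return candidate lemmas from the rules with the longest matching suffix."""
    matches = [
        (form_suffix, lemma_suffix)
        for form_suffix, lemma_suffix in patterns
        if form_suffix and word.endswith(form_suffix)
    ]
    if not matches:
        # No rule applies: fall back to the word itself (an assumption).
        return {word}
    longest = max(len(form_suffix) for form_suffix, _ in matches)
    return {
        word[: len(word) - len(form_suffix)] + lemma_suffix
        for form_suffix, lemma_suffix in matches
        if len(form_suffix) == longest
    }


# Hypothetical toy FFL dictionary (English, for readability only).
TOY_FFL = [
    ("walked", "walk"),
    ("talked", "talk"),
    ("cities", "city"),
    ("parties", "party"),
]

if __name__ == "__main__":
    rules = learn_patterns(TOY_FFL)
    for oov_word in ("jumped", "ponies", "cat"):
        print(oov_word, lemmatize_oov(oov_word, rules))
```

The sketch returns a set of candidate lemmas instead of a single answer, because several rules with the same suffix length may apply to one word. This is a design choice of the sketch, not something stated in the abstract.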

Detail of publication

Title: Automatic lemmatizer construction with focus on OOV words lemmatization
Author: Kanis, J. ; Müller, L.
Language: English
Date of publication: 12 Sep 2005
Year: 2005
Type of publication: Papers in proceedings of reviewed conferences
Title of journal or book: Text, speech and dialogue
Edition: Lecture notes in artificial intelligence, no. 3658
Page: 132 - 139
ISBN: 3-540-28789-2
Publisher: Springer
Address: Berlin
Date: 12 Sep 2005 - 16 Sep 2005

Keywords

lemmatization, OOV words

BibTeX

@INPROCEEDINGS{KanisJ_2005_Automaticlemmatizer_1,
 author = {Kanis, J. and M\"{u}ller, L.},
 title = {Automatic lemmatizer construction with focus on OOV words lemmatization},
 year = {2005},
 publisher = {Springer},
 booktitle = {Text, speech and dialogue},
 address = {Berlin},
 pages = {132-139},
 ISBN = {3-540-28789-2},
 url = {http://www.kky.zcu.cz/en/publications/KanisJ_2005_Automaticlemmatizer_1},
}