Publication detail
Citation
Josef V. Psutka, Jan Vaněk, Josef Psutka: Speaker-clustered Acoustic Models Evaluated on GPU for on-line Subtitling of Parliament Meetings. Text, Speech, and Dialogue, Lecture Notes in Computer Science, vol. 6836, p. 284-290, Springer, 2011.
Abstract
This paper describes the building of speaker-clustered acoustic models as part of the real-time LVCSR system that Czech TV has used for more than a year for automatic subtitling of parliament meetings broadcast on the channel ČT24. Speaker-clustered acoustic models are more acoustically homogeneous and therefore give better recognition performance than a single gender-independent model or even gender-dependent models. Frequent speaker changes and a direct connection of the LVCSR system to the audio channel require automatic switching/fusion of models as quickly as possible. An important part of the solution is real-time likelihood evaluation of all clustered acoustic models, taking advantage of a fast GPU (Graphics Processing Unit). The proposed method achieved a relative WER reduction of over 2.34% compared with the baseline gender-independent model, with more than 2M Gaussian mixtures evaluated in real time.
Publication details
Title: | Speaker-clustered Acoustic Models Evaluated on GPU for on-line Subtitling of Parliament Meetings |
---|---|
Authors: | Josef V. Psutka ; Jan Vaněk ; Josef Psutka |
Title (Czech): | Speaker-clustered Acoustic Models Evaluated on GPU for on-line Subtitling of Parliament Meetings |
Publication language: | English |
Publication date: | 6 September 2011 |
Publication year: | 2011 |
Publication type: | Paper in proceedings |
Book title: | Text, Speech, and Dialogue |
Series: | Lecture Notes in Computer Science |
Volume: | 6836 |
Pages: | 284 - 290 |
DOI: | 10.1007/978-3-642-23538-2_36 |
ISBN: | 978-3-642-23537-5 |
Publisher: | Springer |
BibTeX
@INPROCEEDINGS{JosefVPsutka_2011_Speaker-clustered,
  author    = {Josef V. Psutka and Jan Van\v{e}k and Josef Psutka},
  title     = {Speaker-clustered Acoustic Models Evaluated on GPU for on-line Subtitling of Parliament Meetings},
  year      = {2011},
  publisher = {Springer},
  volume    = {6836},
  pages     = {284-290},
  booktitle = {Text, Speech, and Dialogue},
  series    = {Lecture Notes in Computer Science},
  ISBN      = {978-3-642-23537-5},
  doi       = {10.1007/978-3-642-23538-2_36},
  url       = {http://www.kky.zcu.cz/en/publications/JosefVPsutka_2011_Speaker-clustered}
}