Detail of publication

Citation

Campr Pavel and Pražák Aleš and Psutka Josef V. and Psutka Josef: Online Speaker Adaptation of an Acoustic Model using Face Recognition. Text, Speech and Dialogue, Proceedings of the 16th International Conference TSD 2013, Lecture Notes in Artificial Intelligence, vol. 8082, p. 378-385, Springer Berlin Heidelberg, 2013.

Abstract

We have proposed and evaluated a novel approach to online speaker adaptation of an acoustic model based on face recognition. Instead of the traditionally used audio-based speaker identification, we investigated the video modality for the task of speaker detection. A simulated online transcription created by a large-vocabulary continuous speech recognition (LVCSR) system for online subtitling is evaluated using speaker-independent acoustic models, gender-dependent models, and models of particular speakers. In the experiment, the speaker-dependent acoustic models were trained offline and switched online based on the decision of a face recognizer, which reduced the word error rate (WER) by 12% relative to the speaker-independent baseline system.
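
The switching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `FaceDecision` type, the function names, and the fallback order (speaker-dependent, then gender-dependent, then speaker-independent) are assumptions made for illustration only. The abstract states only that these three model types were evaluated and that speaker-dependent models are switched according to the face recognizer's decision. The second helper shows what a relative WER reduction means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaceDecision:
    """Hypothetical output of a face recognizer for the current video segment."""
    speaker_id: Optional[str]  # None when no known face is recognized
    gender: Optional[str]      # e.g. "female" or "male"; None when unknown


def select_acoustic_model(decision, speaker_models, gender_models, independent_model):
    """Return the most specific pre-trained model available for this decision.

    The fallback order is an assumption of this sketch, not taken from the paper.
    """
    if decision.speaker_id in speaker_models:
        return speaker_models[decision.speaker_id]  # speaker-dependent model
    if decision.gender in gender_models:
        return gender_models[decision.gender]       # gender-dependent model
    return independent_model                        # speaker-independent baseline


def relative_wer_reduction(baseline_wer, adapted_wer):
    """Relative reduction: the WER drop expressed as a fraction of the baseline WER."""
    return (baseline_wer - adapted_wer) / baseline_wer
```

As a purely illustrative calculation (the figures are not from the paper), a drop from a baseline WER of 25.0% to 22.0% is a 3-point absolute reduction and a 12% relative reduction, since `relative_wer_reduction(25.0, 22.0)` evaluates to approximately 0.12.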

Detail of publication

Title: Online Speaker Adaptation of an Acoustic Model using Face Recognition
Author: Campr Pavel ; Pražák Aleš ; Psutka Josef V. ; Psutka Josef
Language: English
Year: 2013
Type of publication: Papers in journals
Book title: Text, Speech and Dialogue, Proceedings of the 16th International Conference TSD 2013
Series: Lecture Notes in Artificial Intelligence
Volume: 8082
Page: 378 - 385
DOI: 10.1007/978-3-642-40585-3_48
ISBN: 978-3-642-40584-6
ISSN: 0302-9743
Publisher: Springer Berlin Heidelberg
/ 2013-09-13 13:07:19 /

Keywords

acoustic model, speaker adaptation, face recognition, multimodal processing, automatic speech recognition

BibTeX

@INCOLLECTION{CamprPavel_2013_OnlineSpeaker,
 author = {Campr, Pavel and Pra\v{z}\'{a}k, Ale\v{s} and Psutka, Josef V. and Psutka, Josef},
 title = {Online Speaker Adaptation of an Acoustic Model using Face Recognition},
 year = {2013},
 publisher = {Springer Berlin Heidelberg},
 volume = {8082},
 pages = {378--385},
 booktitle = {Text, Speech and Dialogue, Proceedings of the 16th International Conference TSD 2013},
 series = {Lecture Notes in Artificial Intelligence},
 ISBN = {978-3-642-40584-6},
 ISSN = {0302-9743},
 doi = {10.1007/978-3-642-40585-3_48},
 url = {http://www.kky.zcu.cz/en/publications/CamprPavel_2013_OnlineSpeaker},
}