Detail of publication

Citation

Psutka, J.; Radová, V.; Müller, L.; Ircing, P.; Matoušek, J.: Voice of America (VOA) Broadcast News Czech Transcript Corpus. Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe, 2001.

Abstract

In 2000, the Linguistic Data Consortium (LDC) collected approximately 30 hours of broadcast audio from the Voice of America news service in Czech. The 62 data files in this corpus are transcripts of daily broadcasts of 30-minute news programs. The transcriptions were created by native Czech speakers Pavel Ircing, Jindřich Matoušek, Luděk Müller, and Vlasta Radová, working at the Department of Cybernetics, University of West Bohemia (UWB) in Pilsen under the direction of Josef Psutka. They used transcription software provided by the LDC (the "transcriber" package), developed by Edouard Geoffrois and Claude Barras at DGA, France, with assistance from Zhibiao Wu at the LDC. The package is currently available from the LDC web site: www.ldc.upenn.edu. The version of transcriber used for this project produced a text file format that is no longer supported by the current version of the software; the format also does not resemble any previous transcription format published by the LDC.

Detail of publication

Title: Voice of America (VOA) Broadcast News Czech Transcript Corpus
Author: Psutka, J. ; Radová, V. ; Müller, L. ; Ircing, P. ; Matoušek, J.
Language: English
Date of publication: 1 Jan 2001
Year: 2001
Type of publication: Prototype, software
Publisher: Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe

Keywords

speech corpus, large vocabulary continuous speech recognition, acoustic modeling

BibTeX

@MISC{PsutkaJ_2001_VoiceofAmerica,
 author = {Psutka, J. and Radov\'{a}, V. and M\"{u}ller, L. and Ircing, P. and Matou\v{s}ek, J.},
 title = {Voice of America (VOA) Broadcast News Czech Transcript Corpus},
 year = {2001},
 publisher = {Katedra kybernetiky, Fakulta aplikovan\'{y}ch v\v{e}d, Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni (pr\'{a}va k \v{s}\'{i}\v{r}en\'{i} p\v{r}ed\'{a}na Linguistic Data Consortium, University of Pe},
 url = {http://www.kky.zcu.cz/en/publications/PsutkaJ_2001_VoiceofAmerica},
}