Detail of publication
Citation
Katedra kybernetiky, Fakulta aplikovaných věd, Západočeská univerzita v Plzni (distribution rights transferred to Linguistic Data Consortium, University of Pe, 2001: Voice of America (VOA) Broadcast News Czech Transcript Corpus.
Abstract
In 2000, the Linguistic Data Consortium (LDC) collected approximately 30 hours of broadcast audio from the Voice of America news service in Czech. The 62 data files in this corpus are transcripts of the daily 30-minute news broadcasts. The transcriptions were created by native Czech speakers Pavel Ircing, Jindřich Matoušek, Luděk Müller, and Vlasta Radová, working at the Department of Cybernetics, University of West Bohemia (UWB) in Pilsen under the direction of Josef Psutka. They used transcription software provided by the LDC (the Transcriber package), developed by Edouard Geoffrois and Claude Barras at DGA, France, with assistance from Zhibiao Wu at the LDC. The package is currently available from the LDC web site: www.ldc.upenn.edu. The version of Transcriber used for this project produced a text file format that is no longer supported by the current version of the software; the format also differs from every transcription format previously published by the LDC.
Detail of publication
Title: | Voice of America (VOA) Broadcast News Czech Transcript Corpus |
---|---|
Author: | Psutka, J. ; Radová, V. ; Müller, L. ; Ircing, P. ; Matoušek, J. |
Language: | English |
Date of publication: | 1 Jan 2001 |
Year: | 2001 |
Type of publication: | Prototype, software |
Publisher: | Katedra kybernetiky, Fakulta aplikovaných věd, Západočeská univerzita v Plzni (distribution rights transferred to Linguistic Data Consortium, University of Pe |
Keywords
speech corpus, large vocabulary continuous speech recognition, acoustic modeling
BibTeX
@MISC{PsutkaJ_2001_VoiceofAmerica,
  author = {Psutka, J. and Radov\'{a}, V. and M\"{u}ller, L. and Ircing, P. and Matou\v{s}ek, J.},
  title = {Voice of America (VOA) Broadcast News Czech Transcript Corpus},
  year = {2001},
  publisher = {Katedra kybernetiky, Fakulta aplikovan\'{y}ch v\v{e}d, Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni (pr\'{a}va k \v{s}\'{i}\v{r}en\'{i} p\v{r}ed\'{a}na Linguistic Data Consortium, University of Pe},
  url = {http://www.kky.zcu.cz/en/publications/PsutkaJ_2001_VoiceofAmerica},
}