Detail of publication

Citation

Radová, V. and Psutka, J. and Psutka Josef V. and Müller, L. and Ircing, P. and Matoušek, J. and Byrne, W.: Czech Broadcast News Corpus. Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe, 2004.

Abstract

The corpus contains 286 transcripts, one for each of the 286 audio files (approximately 50 hours of broadcast news). The transcripts contain approximately 196K words, of which 27K are unique. The news does not include weather forecasts, sports news, or traffic announcements. The transcripts were created by native Czech speakers working at the Department of Cybernetics, University of West Bohemia in Pilsen, under the direction of Vlasta Radová. The transcription was done using software provided by the LDC (Transcriber 1.4.1). Parts of the audio recordings that do not contain speech, or where the signal was disrupted, were not transcribed. As a consequence, the corpus contains about 23 hours of transcribed speech. The transcriptions are provided in both the ISO-8859-2 and Windows-1250 character sets.
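Because the transcriptions ship in two character sets, a reader has to pick the matching codec when loading a file. The following is a minimal Python sketch, not part of the corpus distribution: the helper `read_transcript`, the `ENCODINGS` mapping, and the sample string are illustrative assumptions. It shows that the two encodings assign different bytes to some Czech letters, so decoding with the wrong one silently corrupts the text.

```python
from pathlib import Path

# Python codec names for the two character sets named in the abstract.
# The short keys "iso" and "win" are hypothetical labels for this sketch.
ENCODINGS = {"iso": "iso8859_2", "win": "cp1250"}


def read_transcript(path, variant):
    """Read one transcript file, decoding it with the chosen character set."""
    return Path(path).read_text(encoding=ENCODINGS[variant])


# Made-up Czech sample text (not taken from the corpus).
sample = "šíření zpráv"
iso_bytes = sample.encode(ENCODINGS["iso"])
win_bytes = sample.encode(ENCODINGS["win"])

# Letters such as "š" map to different byte values in the two sets,
# so the encoded forms differ even though the text is identical.
same_bytes = iso_bytes == win_bytes

# Decoding with the matching codec restores the text; the wrong codec does not.
roundtrip_ok = iso_bytes.decode(ENCODINGS["iso"]) == sample
wrong_codec = iso_bytes.decode(ENCODINGS["win"])
```

With a real file, a call would look like `read_transcript("some_transcript_file", "win")`, where the path is a placeholder for a file from the Windows-1250 copy of the transcriptions.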

Detail of publication

Title: Czech Broadcast News Corpus
Author: Radová, V. ; Psutka, J. ; Psutka Josef V. ; Müller, L. ; Ircing, P. ; Matoušek, J. ; Byrne, W.
Language: English
Date of publication: 1 Jan 2004
Year: 2004
Type of publication: Prototype, software
Publisher: Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe

Keywords

speech corpus, large vocabulary continuous speech recognition, acoustic modeling

BibTeX

@MISC{RadovaV_2004_CzechBroadcastNews_2,
 author = {Radov\'{a}, V. and Psutka, J. and Psutka Josef V. and M\"{u}ller, L. and Ircing, P. and Matou\v{s}ek, J. and Byrne, W.},
 title = {Czech Broadcast News Corpus},
 year = {2004},
 publisher = {Katedra kybernetiky, fakulta aplikovan\'{y}ch v\v{e}d, Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni (pr\'{a}va k \v{s}\'{i}\v{r}en\'{i} p\v{r}ed\'{a}na Linguistic Data Consortium, University of Pe},
 url = {http://www.kky.zcu.cz/en/publications/RadovaV_2004_CzechBroadcastNews_2},
}