Voice of America (VOA) Broadcast News Czech Transcript Corpus

Psutka, J.; Radová, V.; Müller, L.; Ircing, P.; Matoušek, J.

Citation

Psutka, J.; Radová, V.; Müller, L.; Ircing, P.; Matoušek, J.: Voice of America (VOA) Broadcast News Czech Transcript Corpus. Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe, 2001.

Abstract

In 2000, the Linguistic Data Consortium (LDC) collected approximately 30 hours of broadcast audio from the Voice of America news service in Czech. The 62 data files presented in this corpus are transcripts of the daily broadcasts of 30-minute news programs. The transcriptions were created by native Czech speakers Pavel Ircing, Jindřich Matoušek, Luděk Müller, and Vlasta Radová, working at the Department of Cybernetics, University of West Bohemia (UWB) in Pilsen, under the direction of Josef Psutka. They used transcription software provided by the LDC (the "transcriber" package), developed by Edouard Geoffrois and Claude Barras at DGA, France, with assistance from Zhibiao Wu at the LDC. The package is currently available from the LDC web site: www.ldc.upenn.edu. The version of transcriber used for this project produced a text file format that is no longer supported by the current version of the software; the format also does not resemble any previous transcription format published by the LDC.

Abstract in Czech (English translation)

In 2000, the Linguistic Data Consortium collected approximately 30 hours of recordings of Voice of America news broadcasts in Czech. The 62 data files in this corpus are transcripts of 30-minute news broadcasts. The transcripts were prepared by native Czech speakers Pavel Ircing, Jindřich Matoušek, Luděk Müller, and Vlasta Radová, working at the Department of Cybernetics, University of West Bohemia in Pilsen, under the direction of Josef Psutka. The work used transcription software supplied by the LDC, developed by Edouard Geoffrois and Claude Barras of DGA, France, with assistance from Zhibiao Wu of the LDC. The corpus is currently available from the LDC web site: www.ldc.upenn.edu.

Publication details

Title:	Voice of America (VOA) Broadcast News Czech Transcript Corpus
Authors:	Psutka, J.; Radová, V.; Müller, L.; Ircing, P.; Matoušek, J.
Title (Czech):	Anotovaný korpus rozhlasových zpráv Hlasu Ameriky
Publication language:	English
Publication date:	1.1.2001
Publication year:	2001
Publication type:	Prototype, applied methodology, authorized software
Publisher:	Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia in Pilsen (distribution rights transferred to the Linguistic Data Consortium, University of Pe

Keywords

speech corpus, large vocabulary continuous speech recognition, acoustic modeling

BibTeX

@MISC{PsutkaJ_2001_VoiceofAmerica,
 author = {Psutka, J. and Radov\'{a}, V. and M\"{u}ller, L. and Ircing, P. and Matou\v{s}ek, J.},
 title = {Voice of America (VOA) Broadcast News Czech Transcript Corpus},
 year = {2001},
 publisher = {Katedra kybernetiky, Fakulta aplikovan\'{y}ch v\v{e}d, Z\'{a}pado\v{c}esk\'{a} univerzita v Plzni (pr\'{a}va k \v{s}\'{i}\v{r}en\'{i} p\v{r}ed\'{a}na Linguistic Data Consortium, University of Pe},
 url = {http://www.kky.zcu.cz/en/publications/PsutkaJ_2001_VoiceofAmerica},
}
