Open Speech and Language Resources


Identifier: SLR7

Summary: English speech recognition training corpus from TED talks, created by the Laboratoire d’Informatique de l’Université du Maine (LIUM) (mirrored here)

Category: Speech

License: Creative Commons BY-NC-ND 3.0 (attribution/non-commercial/no-derivatives).

Downloads (use a mirror closer to you):
TEDLIUM_release1.tar.gz [21G]   (The first release)   Mirrors: [US]   [EU]   [CN]
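
Once the archive has been downloaded from one of the mirrors, it can be unpacked with any tar tool. The following is a minimal Python sketch, not an official procedure: the function name extract_corpus and the destination directory are hypothetical, and the only fact taken from this page is that the download is a gzip-compressed tar file. The sketch checks each entry's path before extracting so that nothing is written outside the chosen directory.

```python
import tarfile
from pathlib import Path


def extract_corpus(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper for illustration; not part of any corpus tooling.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Refuse entries whose path would land outside dest_dir.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest, members=members)
    return [member.name for member in members]
```

For example, assuming the archive was saved in the current directory, extract_corpus("TEDLIUM_release1.tar.gz", "data/tedlium") would unpack it under data/tedlium (an arbitrary location chosen here). The archive is listed at 21G, so the extracted files need at least that much free disk space.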

About this resource:

The TED-LIUM corpus (mirrored here) consists of English-language TED talks with transcriptions; the audio is sampled at 16 kHz. It contains about 118 hours of speech.

The original page requests that you cite the following paper if you make use of this corpus:

A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: an automatic speech recognition dedicated corpus",
in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), May 2012.

External URL:   Original source