CML-TTS Dataset
Identifier: SLR146
Summary: CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
Category: Speech
License: CC-BY 4.0
Downloads (use a mirror closer to you):
cml_tts_dataset_dutch_v0.1.tar.bz [86G] (Dutch speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_french_v0.1.tar.bz [31G] (French speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_german_v0.1.tar.bz [190G] (German speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_italian_v0.1.tar.bz [14G] (Italian speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_polish_v0.1.tar.bz [5.5G] (Polish speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_portuguese_v0.1.tar.bz [9.7G] (Portuguese speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_spanish_v0.1.tar.bz [48G] (Spanish speech and transcripts) Mirrors: [US] [EU] [CN]
cml_tts_dataset_segments_v0.1.tar.bz [16M] (Segment information) Mirrors: [US] [EU] [CN]
cml_tts_dataset.md5 [560 bytes] (Checksums of the files above) Mirrors: [US] [EU] [CN]
About this resource:
CML-TTS is a dataset of audiobook recordings from the LibriVox project, which reads public-domain books from Project Gutenberg. It consists of recordings in Dutch, German, French, Italian, Polish, Portuguese, and Spanish, at a sampling rate of 24 kHz.
After downloading you must check the md5sum of each file:
a167148101dee6b6c0089e7bf9084f31  cml_tts_dataset_dutch_v0.1.tar.bz
0f2212fe03e0cc444225a6eb79fa099c  cml_tts_dataset_french_v0.1.tar.bz
332cae87fe03fd43d17d50b2c05bd872  cml_tts_dataset_german_v0.1.tar.bz
cccbc1f885a92594c028ee5ddf622acb  cml_tts_dataset_italian_v0.1.tar.bz
ab6385ed4acc613ee96ba7b75dfd2ba7  cml_tts_dataset_polish_v0.1.tar.bz
743bad054ca861688aa026e505b26aff  cml_tts_dataset_portuguese_v0.1.tar.bz
bb7128ec9f804b60485492a2433e18c7  cml_tts_dataset_spanish_v0.1.tar.bz
f529a908aba26a6d891b4fb17ab3125b  cml_tts_dataset_segments_v0.1.tar.bz

You can cite the data using the following BibTeX entry:
@InProceedings{Cmltts2023,
  title="CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages",
  author="Oliveira, Frederico S. and Casanova, Edresson and Junior, Arnaldo Candido and Soares, Anderson S. and Galv{\~a}o Filho, Arlindo R.",
  editor="Ek{\v{s}}tein, Kamil and P{\'a}rtl, Franti{\v{s}}ek and Konop{\'i}k, Miloslav",
  booktitle="Text, Speech, and Dialogue",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="188--199",
  isbn="978-3-031-40498-6"
}
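
The checksum verification mentioned above can be sketched in Python as follows. This is a minimal illustration, not an official tool: it assumes that cml_tts_dataset.md5 uses the usual md5sum layout (one "digest filename" pair per line), and the helper names md5_of_file, parse_md5_listing, and verify_downloads are hypothetical. Files are read in chunks because the archives can be very large.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse 'digest filename' lines (assumed md5sum layout) into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading '*' on the file name.
            expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_downloads(listing_text, directory="."):
    """Map each listed file to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, digest in parse_md5_listing(listing_text).items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == digest:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

To use it, read the downloaded cml_tts_dataset.md5 into a string and pass it to verify_downloads along with the download directory. Any file reported as "mismatch" should be downloaded again, possibly from another mirror. Languages that were not downloaded show up as "missing".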