Open Speech and Language Resources

Free ST Chinese Mandarin Corpus

Identifier: SLR38

Summary: A free Chinese Mandarin corpus by Surfingtech, containing 102,600 utterances from 855 speakers.

Category: Speech

License: Creative Commons BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International)

Downloads (use a mirror closer to you):
ST-CMDS-20170001_1-OS.tar.gz [8.2G]   (speech audio and transcripts)   Mirrors: [US]   [EU]   [CN]

About this resource:

This corpus was recorded by cellphone in a quiet indoor environment. It has 855 speakers, and each speaker has 120 utterances. All utterances were carefully transcribed and checked by humans. Transcription accuracy is guaranteed. If there is any problem, we agree to correct it for you. The corpus contains:
	audio files;
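
The speaker and utterance figures above (855 speakers × 120 utterances = 102,600) can be used as a quick integrity check after extracting the archive. The following is a minimal sketch, not an official tool: it assumes the audio files carry a `.wav` extension, and the function names and the `corpus_root` path are hypothetical.

```python
from pathlib import Path

# Figures stated in the corpus description.
EXPECTED_SPEAKERS = 855
UTTERANCES_PER_SPEAKER = 120
EXPECTED_UTTERANCES = EXPECTED_SPEAKERS * UTTERANCES_PER_SPEAKER  # 102600


def count_audio_files(corpus_root, suffix=".wav"):
    """Count files with the given suffix anywhere under corpus_root.

    The ".wav" default is an assumption about the archive contents.
    """
    root = Path(corpus_root)
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


def check_corpus(corpus_root, suffix=".wav"):
    """Return (files_found, matches_expected_total)."""
    found = count_audio_files(corpus_root, suffix)
    return found, found == EXPECTED_UTTERANCES
```

Calling `check_corpus("path/to/extracted/corpus")` returns the number of audio files found and whether it matches the stated total. A mismatch may indicate an incomplete download or extraction.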

Please cite the data as “ST-CMDS-20170001_1, Free ST Chinese Mandarin Corpus”.

This data set is a subset of a much larger data set recorded in the same environment. Please visit our website for details.

External URL: