Open Speech and Language Resources

Totonac Resources

Identifier: SLR107

Summary: Totonac Speech with Transcription

Category: Speech

License: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

Downloads (use a mirror closer to you):
Deposit-Totonaco-for-ASR-Community.pdf [162K]   (Totonac corpus details: descriptions and specifics of the recordings)   Mirrors: [US]   [EU]   [CN]
Amith-Lopez_Totonac-recordings-northern-Puebla-and-adjacent-Veracruz_Metadata.xml [218K]   (Speaker details in XML format)   Mirrors: [US]   [EU]   [CN]
Totonac_Corpus.tgz [4.2G]   (Totonac corpus: sound files and transcriptions)   Mirrors: [US]   [EU]   [CN]
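Before writing any data-preparation code, it can help to check what the 4.2G archive and the speaker metadata file actually contain. The following is a minimal Python sketch, not part of the official release. It assumes only that Totonac_Corpus.tgz is a gzip-compressed tar archive (as its extension suggests) and makes no assumption about the archive's folder layout or the XML schema: it tallies archive files by extension and XML elements by tag name. The script name and the functions summarize_archive and summarize_xml_tags are hypothetical.

```python
import sys
import tarfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path: str) -> Counter:
    """Count regular files in a .tgz archive by lowercase file extension.

    Assumption: the file is a gzip-compressed tar. Members are read one by
    one from the archive index; nothing is extracted to disk.
    """
    counts: Counter = Counter()
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                suffix = PurePosixPath(member.name).suffix.lower() or "<none>"
                counts[suffix] += 1
    return counts


def summarize_xml_tags(path: str) -> Counter:
    """Count XML elements by tag name, ignoring any namespace prefix."""
    counts: Counter = Counter()
    # iterparse streams the file, so large metadata files need not fit in memory.
    for _event, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # "{uri}speaker" -> "speaker"
        counts[tag] += 1
        elem.clear()  # free the element's children once they are counted
    return counts


if __name__ == "__main__":
    # Hypothetical usage with the files downloaded from this page:
    #   python inspect_slr107.py Totonac_Corpus.tgz <metadata>.xml
    for path in sys.argv[1:]:
        if path.endswith(".xml"):
            summary = summarize_xml_tags(path)
        else:
            summary = summarize_archive(path)
        print(path)
        for key, n in summary.most_common():
            print(f"  {key}\t{n}")
```

Counting by extension rather than by expected folder names keeps the sketch valid whatever audio format or directory structure the archive turns out to use; the output is a starting point for deciding how to pair sound files with their transcriptions.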

About this resource:

The substantive Totonac material from the northern sierras of Puebla and adjacent areas of Veracruz was compiled starting in 2016 by Jonathan D. Amith, and compilation continues to the present as a joint effort by Amith and Osbel López Francisco, a native-speaker biologist from Zongozotla. Please refer to "Deposit-Totonaco-for-ASR-Community.pdf" for details.

Production of the corpus was generously supported by the National Science Foundation Documenting Endangered Languages program, the Endangered Language Documentation Programme (ELDP) at the School of Oriental and African Studies, the Jacobs Research Fund, and the National Endowment for the Humanities:
Comparative Totonacan Ethnobotany: Documentation of the Nomenclature, Classification, and Use in Three Communities (Tonalixco, Ecatlán, Pisaflores); Jacobs Research Fund (2020)
Community‐based Ethnobotany in the Sierra Nororiental de Puebla: Zongozotla and Tonalixco Totonac; Jacobs Research Fund (2019)
Totonac ethnobotanical knowledge: Documenting traditional ecological knowledge across communities. Endangered Language Documentation Programme, School of Oriental and African Studies, University of London (MDP0352: David Beck, PI; Jonathan D. Amith, co‐PI) (2016-2017)
A Biological Approach to Documenting Traditional Ecological Knowledge in Synchronic and Diachronic Perspectives, National Science Foundation, Documenting Endangered Languages and Anthropology (Award #BCS‐1401178), including two supplements: Award #1646724 and Award #2039336 (2014-2021)
A Biological Approach to Documenting Traditional Ecological Knowledge in Synchronic and Diachronic Perspectives, National Endowment for the Humanities (Award #PD‐50031‐14) (2014-2018)

All material is made available under the Creative Commons license CC BY-SA (Attribution-ShareAlike).

When citing the recordings and transcriptions in general, please use the following (corresponding author: Jonathan D. Amith):

Amith, Jonathan D., and Osbel López Francisco. n.d. Audio corpus of Totonac recordings from northern Puebla and adjacent areas of Veracruz. OpenSLR##.

If you use the speech recognition corpus or the baseline results, please cite the following (corresponding author: Jiatong Shi):

  author={Dan Berrebbi and Jiatong Shi and Brian Yan and Osbel López-Francisco and Jonathan Amith and Shinji Watanabe},
  title={{Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation}},
  booktitle={Proc. Interspeech 2022},