Open Speech and Language Resources


Identifier: SLR59

Summary: Catalan speech corpus generated from Catalan Parliamentary sessions

Category: Speech

License: CC Attribution 4.0 (CC BY 4.0)

Downloads (use a mirror closer to you):
parlament_v1.0_clean.tar.gz [7.7G] (90 hours of "clean" speech and transcripts) Mirrors: [US] [EU] [CN]
parlament_v1.0_other.tar.gz [19G] (230 hours of "other" speech and transcripts) Mirrors: [US] [EU] [CN]
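
Once an archive has been downloaded, Python's standard `tarfile` module can inspect and unpack it. The following is only a sketch under stated assumptions. `list_members` and `safe_extract` are hypothetical helper names. The archive is assumed to be saved locally under its listed name (e.g. `parlament_v1.0_clean.tar.gz`). Because this page does not document the internal layout, filtering on a `.wav` suffix is also an assumption.

```python
import tarfile
from pathlib import Path


def list_members(archive_path, suffix=".wav"):
    """Hypothetical helper: names of regular files in a .tar.gz that end with `suffix`.

    The ".wav" default is an assumption; the archive layout is not documented here.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile() and m.name.endswith(suffix)]


def safe_extract(archive_path, dest):
    """Hypothetical helper: extract a .tar.gz into `dest`, refusing paths that escape it."""
    dest = Path(dest).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        # Reject members such as "../x" or absolute paths before extracting anything.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the "data" extraction filter when this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

For example, `list_members("parlament_v1.0_clean.tar.gz")` lists the audio entries, and `safe_extract("parlament_v1.0_clean.tar.gz", "parlament_clean")` unpacks them. Because the archives are gzip-compressed streams, `getmembers()` has to decompress the whole file, so listing the 19G "other" archive takes a while.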

About this resource:

ParlamentParla is a speech corpus for Catalan, published by the workers' cooperative Col·lectivaT. The audio segments were extracted from recordings of the plenary sessions of the Catalan Parliament (Parlament de Catalunya). The recordings were aligned with their transcripts, and the cleanest 320 hours of segments were extracted. The content belongs to the Catalan Parliament, and the data is released in accordance with its terms of use.

Preparation of this corpus was supported by the Department of Culture of the Catalan autonomous government.

The audio files are 16-bit PCM, mono, little-endian, with a sample rate of 16 kHz. As of release version 1.0, the corpus is split into 90 hours of "clean" and 230 hours of "other" quality segments.
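
After extraction, a quick sanity check can confirm the documented format file by file using Python's standard `wave` module. This is a sketch under assumptions. The page does not name the container, so the sketch assumes the segments are WAV (RIFF) files, whose PCM samples are little-endian by definition. `check_segment` and `raw_pcm_bytes` are hypothetical helpers. The second function spells out the arithmetic behind the format: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second, or 115.2 MB per hour of uncompressed audio. On that basis, the 90 clean hours correspond to about 10.4 GB of raw samples.

```python
import wave

# Format stated on the resource page: 16-bit PCM, mono, 16 kHz.
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bit
EXPECTED_CHANNELS = 1             # mono
EXPECTED_SAMPLE_RATE_HZ = 16_000  # 16 kHz


def check_segment(path):
    """Hypothetical helper: return a segment's duration in seconds.

    Raises ValueError if the WAV header does not match the documented format.
    Assumes a RIFF/WAV container, which the page does not state explicitly.
    """
    # wave.open only opens str paths itself; other objects are treated as file objects.
    with wave.open(str(path), "rb") as wav:
        width = wav.getsampwidth()
        if width != EXPECTED_SAMPLE_WIDTH_BYTES:
            raise ValueError(f"{path}: expected 16-bit samples, got {8 * width}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"{path}: expected mono, got {wav.getnchannels()} channels")
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            raise ValueError(f"{path}: expected 16000 Hz, got {wav.getframerate()} Hz")
        return wav.getnframes() / wav.getframerate()


def raw_pcm_bytes(hours):
    """Uncompressed sample bytes for `hours` of audio in this format (headers excluded)."""
    bytes_per_second = EXPECTED_SAMPLE_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    return int(hours * 3600 * bytes_per_second)
```

Summing `check_segment` over all files in an extracted split should give a total close to the stated 90 or 230 hours.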

For contact information, and for other resources and updates, see the official ParlamentParla corpus webpage.
