Open Speech and Language Resources

Samrómur Children 21.09

Identifier: SLR117

Summary: Samrómur Icelandic Speech from children (ages 4-17 years) approved for release in September 2021

Category: Speech

License: CC BY 4.0

Downloads (use a mirror closer to you): [6.8G] (Icelandic speech and metadata files) Mirrors: [US] [EU] [CN]

About this resource:

This release of data from the Samrómur collection focuses on data collected from children. It contains more than 137,000 validated speech recordings uttered by Icelandic children.

The corpus is the result of a crowd-sourcing effort run by the Language and Voice Lab (LVL) at Reykjavik University, in cooperation with Almannarómur, the Icelandic Center for Language Technology. Recording started in October 2019 and continues to this day (December 2021). The present edition of the corpus was authorized for release in September 2021. The aim is to create an open-source speech corpus to enable research and development for Icelandic language technology. The corpus consists of audio recordings and a metadata file containing the prompts read by the participants.

Participants are aged between 4 and 17 years. The distributed audio files are single-channel, 16-bit linear PCM audio sampled at 16 kHz and stored in FLAC format (*.flac). The corpus is split into train, dev, and test subsets with no speaker overlap. Each subset contains folders that correspond to speaker IDs, and the audio files inside use the following naming convention: {speaker_ID}-{utterance_ID}.flac.
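The layout described above can be walked with a few lines of Python. This is only a minimal sketch, not an official loader: the subset folder names ("train", "dev", "test"), the helper names parse_name and iter_utterances, and the assumption that speaker IDs contain no hyphen are illustrative and should be checked against the unpacked archive.

```python
from pathlib import Path
from typing import Iterator, NamedTuple, Tuple


class Utterance(NamedTuple):
    subset: str
    speaker_id: str
    utterance_id: str
    path: Path


def parse_name(path: Path) -> Tuple[str, str]:
    """Split '{speaker_ID}-{utterance_ID}.flac' into (speaker_ID, utterance_ID)."""
    # Assumption: the speaker ID contains no hyphen, so we split on the first one.
    speaker_id, sep, utterance_id = path.stem.partition("-")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"unexpected file name: {path.name}")
    return speaker_id, utterance_id


def iter_utterances(
    root: Path, subsets: Tuple[str, ...] = ("train", "dev", "test")
) -> Iterator[Utterance]:
    """Yield one record per .flac file found under root/<subset>/<speaker>/."""
    # Assumption: the subset folders are named exactly as in `subsets`.
    for subset in subsets:
        for flac in sorted((Path(root) / subset).glob("*/*.flac")):
            speaker_id, utterance_id = parse_name(flac)
            yield Utterance(subset, speaker_id, utterance_id, flac)
```

Collecting the speaker_id values per subset into sets and intersecting them gives a quick check of the "no speaker overlap" property on a local copy.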

You can cite the data using the following BibTeX entry:
        title={{Samr{\'o}mur Children Icelandic Speech 21.09}},
        author={Carlos Mena and Michal Borsky and David Erik Mollberg and 
        Sm{\'a}ri Freyr Gu{\dh}mundsson and Staffan Hedstr{\"o}m and 
        Ragnar P{\'a}lsson and {\'O}lafur Helgi J{\'o}nsson and 
        Sunneva {\TH}orsteinsd{\'o}ttir and 
        J{\'o}hanna Vigd{\'\i}s Gu{\dh}mundsd{\'o}ttir and 
        Eyd{\'\i}s Huld Magn{\'u}sd{\'o}ttir and 
        Ragnhei{\dh}ur {\TH}{\'o}rhallsd{\'o}ttir and Jon Gudnason},
        publisher={Reykjavik University: Language and Voice Lab}