LibriSpeech language models, vocabulary and G2P models
Identifier: SLR11
Summary: Language modelling resources, for use with the LibriSpeech ASR corpus
Category: Text
License: Public domain
Downloads (use a mirror closer to you):
librispeech-lm-corpus.tgz [1.8G] (14500 public domain books, used as training material for the LibriSpeech LM)
Mirrors:
[US]
[EU]
[CN]
librispeech-lm-norm.txt.gz [1.5G] (Normalized LM training text)
Mirrors:
[US]
[EU]
[CN]
librispeech-vocab.txt [1.7M] (200K-word vocabulary for the LM)
Mirrors:
[US]
[EU]
[CN]
librispeech-lexicon.txt [5.6M] (Pronunciations for all words in the vocabulary, some of them auto-generated by G2P)
Mirrors:
[US]
[EU]
[CN]
3-gram.arpa.gz [759M] (3-gram ARPA LM, not pruned)
Mirrors:
[US]
[EU]
[CN]
3-gram.pruned.1e-7.arpa.gz [34M] (3-gram ARPA LM, pruned with threshold 1e-7)
Mirrors:
[US]
[EU]
[CN]
3-gram.pruned.3e-7.arpa.gz [13M] (3-gram ARPA LM, pruned with threshold 3e-7)
Mirrors:
[US]
[EU]
[CN]
4-gram.arpa.gz [1.3G] (4-gram ARPA LM, usually used for rescoring)
Mirrors:
[US]
[EU]
[CN]
g2p-model-5 [20M] (Fifth-order Sequitur G2P model)
Mirrors:
[US]
[EU]
[CN]
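The pruned and unpruned ARPA files listed above differ mainly in how many n-grams they keep, and a quick way to compare them is to read the counts from the header of each file. The following is a minimal Python sketch, assuming the files follow the standard ARPA layout (a \data\ section with "ngram N=COUNT" lines, followed by \1-grams: and so on) and are gzip-compressed UTF-8 text. The function names arpa_ngram_counts and read_arpa_counts are hypothetical helpers introduced here for illustration; they are not part of any LibriSpeech tooling.

```python
import gzip
import re
from typing import Dict, Iterable

# Matches header lines such as "ngram 3=1234567".
NGRAM_COUNT = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: count} read from the \\data\\ header of an ARPA LM."""
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if not in_header:
            # Skip anything before the header marker.
            if line == "\\data\\":
                in_header = True
            continue
        match = NGRAM_COUNT.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            # The first section marker (e.g. \1-grams:) ends the header,
            # so the n-gram entries themselves are never read.
            break
    return counts


def read_arpa_counts(path: str) -> Dict[int, int]:
    """Read header counts from a gzip-compressed ARPA file (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return arpa_ngram_counts(handle)
```

For example, calling read_arpa_counts("3-gram.pruned.1e-7.arpa.gz") and read_arpa_counts("3-gram.arpa.gz") on downloaded copies would show how many n-grams of each order the pruning removed. Because the reader stops at the first section marker, only the first few lines of each file are decompressed.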
About this resource:
This corpus and these resources were prepared by Vassil Panayotov with the assistance of Daniel Povey and Sanjeev Khudanpur. We hope to finalize this and release the corpus here by the ICASSP deadline (early October 2014).