Language Models

All of the available DanSpeech language models are listed below. If you need to fine-tune or train your own model, more information is available in the DanSpeech training repository.

Recommended usage for all language models (except a custom model):

from danspeech.language_models import DSL3gram
lm = DSL3gram()

Method

All language models are n-gram models with modified Kneser-Ney smoothing, built from large text corpora.
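To make the smoothing concrete, here is a toy bigram version of interpolated modified Kneser-Ney. It is a sketch for illustration only, not how KenLM or DanSpeech implement it: the three discounts `d1`, `d2`, `d3` are fixed parameters here (KenLM estimates them from the training counts), the models in this package use orders 3 and 5 rather than 2, and the names `train_mkn_bigram` and the tiny Danish corpus are made up.

```python
from collections import Counter, defaultdict


def train_mkn_bigram(sentences, d1=0.5, d2=1.0, d3=1.5):
    """Toy interpolated modified Kneser-Ney bigram model (fixed discounts)."""
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    def discount(count):
        # "Modified" KN: separate discounts for counts of 1, 2 and 3 or more.
        if count == 0:
            return 0.0
        if count == 1:
            return d1
        if count == 2:
            return d2
        return d3

    history_count = Counter()          # c(h): total count of bigrams starting with h
    freed_mass = defaultdict(float)    # sum of discounts subtracted from h's bigrams
    continuation = Counter()           # N1+(. w): number of distinct words preceding w
    for (h, w), count in bigrams.items():
        history_count[h] += count
        freed_mass[h] += discount(count)
        continuation[w] += 1
    bigram_types = len(bigrams)        # N1+(. .): number of distinct bigrams

    def prob(w, h):
        p_cont = continuation[w] / bigram_types
        c_h = history_count[h]
        if c_h == 0:
            # Unseen history: fall back to the continuation distribution.
            return p_cont
        c_hw = bigrams[(h, w)]
        gamma = freed_mass[h] / c_h    # interpolation weight for the lower order
        return max(c_hw - discount(c_hw), 0.0) / c_h + gamma * p_cont

    return prob


corpus = ["jeg hedder anna", "jeg bor i aarhus", "anna bor i odense"]
p = train_mkn_bigram(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}
print(round(sum(p(w, "jeg") for w in vocab), 6))  # 1.0
print(p("i", "bor") > p("anna", "bor"))          # True
```

The mass removed by discounting each seen bigram is redistributed through the continuation probability, which counts how many distinct words precede `w` rather than how often `w` occurs. This toy model does not handle words outside the training vocabulary.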

The models were built with KenLM.
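The exact commands used to build the DanSpeech models are not documented here. As a hedged sketch, assuming the standard KenLM command-line tools `lmplz` and `build_binary` are on `PATH` and the corpus is a plain-text file with one tokenized sentence per line, building a `.klm` file of a given order could look like this. The helper names and file names are hypothetical.

```python
import shlex
import subprocess


def kenlm_commands(order, arpa_path, klm_path):
    """Command lines for estimating an ARPA model and converting it to binary."""
    # lmplz estimates an interpolated modified Kneser-Ney model of the given
    # order, reading text on stdin and writing ARPA on stdout.
    lmplz = ["lmplz", "-o", str(order)]
    # build_binary converts the ARPA file into KenLM's binary format.
    build_binary = ["build_binary", arpa_path, klm_path]
    return lmplz, build_binary


def build_klm(corpus_path, order, arpa_path, klm_path):
    """Run both steps; requires the KenLM binaries on PATH."""
    lmplz, build_binary = kenlm_commands(order, arpa_path, klm_path)
    with open(corpus_path, "rb") as corpus, open(arpa_path, "wb") as arpa:
        subprocess.run(lmplz, stdin=corpus, stdout=arpa, check=True)
    subprocess.run(build_binary, check=True)


for cmd in kenlm_commands(3, "corpus_3gram.arpa", "corpus_3gram.klm"):
    print(shlex.join(cmd))
# lmplz -o 3
# build_binary corpus_3gram.arpa corpus_3gram.klm
```

Passing `5` instead of `3` as the order would give a 5-gram model; the `.klm` extension is only the naming convention used by this package.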

Available models

danspeech.language_models.DSL3gram(cache_dir=None)

DSL 3-gram language model. Along with the DSL 5-gram model, this is the best-performing model on our test cases.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str
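The `cache_dir` behaviour described above amounts to a simple path resolution. The helper below is hypothetical and only mirrors the documented default of `~/.danspeech/lms/`; it is not DanSpeech's code, and the file name `dsl_3gram.klm` is made up for illustration.

```python
import os


def resolve_lm_path(filename, cache_dir=None):
    """Hypothetical helper: where a language model file would be cached.

    Mirrors the documented default of ~/.danspeech/lms/ when cache_dir is None.
    """
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser("~"), ".danspeech", "lms")
    return os.path.join(cache_dir, filename)


print(resolve_lm_path("dsl_3gram.klm"))               # e.g. /home/<user>/.danspeech/lms/dsl_3gram.klm
print(resolve_lm_path("dsl_3gram.klm", "/data/lms"))  # on POSIX: /data/lms/dsl_3gram.klm
```

With the documented API, the corresponding call is `DSL3gram(cache_dir="/data/lms")`, which returns the path to the `.klm` file as a `str`.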

danspeech.language_models.DSL5gram(cache_dir=None)

DSL 5-gram language model. Along with the DSL 3-gram model, this is the best-performing model on our test cases.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str
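As the names indicate, the 3-gram and 5-gram models differ in n-gram order: a 3-gram model predicts each word from the two words before it, and a 5-gram model from the four words before it. The small illustration below uses a made-up sentence and a hypothetical `ngrams` helper.

```python
def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "<s> jeg bor i aarhus </s>".split()
for n in (3, 5):
    last = ngrams(tokens, n)[-1]
    # An n-gram model predicts the final word from the n - 1 words before it.
    print(n, last[:-1], "->", last[-1])
# 3 ('i', 'aarhus') -> </s>
# 5 ('jeg', 'bor', 'i', 'aarhus') -> </s>
```

A longer context can capture more of the sentence, but each 5-gram is seen less often in training, which is why higher order does not automatically mean better recognition.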

danspeech.language_models.DSLWiki3gram(cache_dir=None)

3-gram model trained on the DSL and Wikipedia corpora.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.DSLWiki5gram(cache_dir=None)

5-gram model trained on the DSL and Wikipedia corpora.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.DSLWikiLeipzig3gram(cache_dir=None)

3-gram model trained on the DSL, Wikipedia, and Leipzig corpora.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.Wiki3gram(cache_dir=None)

3-gram model trained on the Wikipedia corpus.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.Wiki5gram(cache_dir=None)

5-gram model trained on the Wikipedia corpus.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.Folketinget3gram(cache_dir=None)

3-gram language model trained on all meeting summaries from the Danish Parliament (Folketinget).

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.DSL3gramWithNames(cache_dir=None)

DSL 3-gram language model with an added bias towards the most common names in Denmark.

The underlying DSL 3-gram model is, along with the DSL 5-gram model, the best-performing model on our test cases.

Parameters

cache_dir (str) – Custom directory in which to store and cache the models. Using a custom directory is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.

Returns

path to .klm language model

Return type

str

danspeech.language_models.CustomLanguageModel(path)

Custom language model. This is only a thin wrapper that returns the given path, so you may also pass the path to your custom .klm model directly to the recognizer.

Parameters

path (str) – Path to a .klm language model

Returns

path to .klm language model

Return type

str
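Since CustomLanguageModel only passes the path through, the sketch below shows equivalent behaviour in plain Python. The name `custom_language_model` and its file-existence check are hypothetical additions, not part of DanSpeech, and the empty temporary file only stands in for a real KenLM model.

```python
import os
import tempfile


def custom_language_model(path):
    """Hypothetical stand-in for CustomLanguageModel: return the path unchanged.

    The existence check is an extra convenience added for this sketch.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no language model file at {path!r}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    klm = os.path.join(tmp, "my_model.klm")
    open(klm, "wb").close()          # empty placeholder, not a real model
    lm = custom_language_model(klm)
    print(lm == klm)                 # True
```

Either the returned string or the raw path can then be handed to the recognizer, as described above.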