Language Models
All available DanSpeech language models are listed below. If you need to fine-tune or train your own model, you can find more information in the DanSpeech training repository.
Recommended usage for all language models (except a custom model):
from danspeech.language_models import DSL3gram

# Returns the path (str) to the cached .klm language model
lm = DSL3gram()
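As the return types under Available models indicate, each constructor returns a plain path string to a cached .klm file. The sketch below lists the model files already present in the cache, using only the standard library. Note that cached_language_models is a hypothetical helper and not part of DanSpeech, and the default folder is an assumption taken from the cache_dir parameter description.

```python
import os

# Default cache folder, as documented for the cache_dir parameter (assumption).
DEFAULT_LM_DIR = os.path.join(os.path.expanduser("~"), ".danspeech", "lms")


def cached_language_models(cache_dir=None):
    """List the .klm files in a cache folder (hypothetical helper, not part of DanSpeech)."""
    folder = DEFAULT_LM_DIR if cache_dir is None else cache_dir
    if not os.path.isdir(folder):
        return []  # nothing has been cached in this folder yet
    # Keep only language model files and return them in a stable order.
    return sorted(name for name in os.listdir(folder) if name.endswith(".klm"))
```

Calling cached_language_models() without arguments inspects the default folder; pass a directory to inspect a custom cache location instead.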
Method
All language models are n-gram models with modified Kneser-Ney smoothing, built from large text corpora.
They were generated with KenLM.
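For reference, an n-gram model of order n approximates the probability of a word sequence by conditioning each word only on the n-1 words before it. The expression below is the standard n-gram factorization, not something specific to DanSpeech; smoothing only changes how each conditional probability is estimated from the corpus counts.

```latex
P(w_1, \dots, w_T) \approx \prod_{t=1}^{T} P\left(w_t \mid w_{t-n+1}, \dots, w_{t-1}\right)
```

With n = 3 each word is conditioned on the two preceding words, and with n = 5 on the four preceding words.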
Available models
- danspeech.language_models.DSL3gram(cache_dir=None)
  DSL 3-gram language model. Together with DSL 5-gram, this is the best-performing model on our test cases.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.DSL5gram(cache_dir=None)
  DSL 5-gram language model. Together with DSL 3-gram, this is the best-performing model on our test cases.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.DSLWiki3gram(cache_dir=None)
  3-gram model trained on the DSL and Wikipedia corpora.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.DSLWiki5gram(cache_dir=None)
  5-gram model trained on the DSL and Wikipedia corpora.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.DSLWikiLeipzig3gram(cache_dir=None)
  3-gram model trained on the DSL, Wikipedia and Leipzig corpora.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.Wiki3gram(cache_dir=None)
  3-gram model trained on a Wikipedia corpus.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.Wiki5gram(cache_dir=None)
  5-gram model trained on a Wikipedia corpus.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.Folketinget3gram(cache_dir=None)
  3-gram language model trained on all meeting summaries from the Danish Parliament (Folketinget).
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.DSL3gramWithNames(cache_dir=None)
  DSL 3-gram language model with an added bias towards the most common names in Denmark. The underlying DSL 3-gram model is, together with DSL 5-gram, the best performing on our test cases.
  - Parameters
    cache_dir (str) – Custom directory in which to cache the models. This is generally not recommended; if omitted, the DanSpeech models are stored in the ~/.danspeech/lms/ folder.
  - Returns
    path to the .klm language model
  - Return type
    str
- danspeech.language_models.CustomLanguageModel(path)
  Custom language model. This is just a dummy wrapper; you may also pass the path to your custom .klm model directly to the recognizer.
  - Parameters
    path (str) – Path to a .klm language model
  - Returns
    path to the .klm language model
  - Return type
    str
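Because a custom model is identified only by its path, it can help to check that path before handing it to CustomLanguageModel or to the recognizer. The following is a minimal sketch, assuming the file uses the .klm extension mentioned above. The name checked_klm_path is a hypothetical helper and not part of DanSpeech.

```python
import os


def checked_klm_path(path):
    """Return the absolute path to an existing .klm file (hypothetical helper, not part of DanSpeech)."""
    # Expand "~" and make the path absolute so later error messages are unambiguous.
    full_path = os.path.abspath(os.path.expanduser(path))
    if not full_path.endswith(".klm"):
        raise ValueError(f"Expected a .klm file, got: {full_path}")
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No language model found at: {full_path}")
    return full_path
```

The returned string can then be passed wherever a language model path is expected.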