Note
Click here to download the full example code
Transcribing a single audio fileΒΆ
In this example script, DanSpeech is used to transcribe the same audio file with three different outputs:
Greedy decoding: using no external language model.
Beam search decoding 1: Decoding with a language model (
language_models.DSL3gram()
).Beam search decoding 2: Decoding with a language model (
language_models.DSL3gram()
) and returning all the beam_width most probable beams.
Out:
Using device: cpu
DanSpeech model updated to: TestModel
No language model:
tester en to tre fire sem seks syv otte
DanSpeech decoder updated
Single transcription:
tester en to tre fire fem seks syv otte
Most likely beams:
tester en to tre fire fem seks syv otte
tester en to tre fire fem seks syv ofte
tester en to tre fire fem seks syv otter
tester en to tre fire fem seks syv tte
tester en to tre fire fem seks syv ottey
tester en to tre fire fem seks syv ote
tester en to tre fire fem seks syv ottet
tester en to tre fire fem seks syv ottek
tester en to tre fire fem seks syv ottes
tester en to tre fire fem seks syv otteo
from danspeech import Recognizer
from danspeech.pretrained_models import TestModel
from danspeech.language_models import DSL3gram
from danspeech.audio import load_audio
# Load a DanSpeech model. If the model does not exists, it will be downloaded.
model = TestModel()
recognizer = Recognizer(model=model)
# Load the audio file.
audio = load_audio(path="../example_files/u0013002.wav")
print()
print("No language model:")
print(recognizer.recognize(audio))
# DanSpeech with a language model.
# Note: Requires ctcdecode to work!
try:
lm = DSL3gram()
recognizer.update_decoder(lm=lm, alpha=1.2, beta=0.15, beam_width=10)
except ImportError:
print("ctcdecode not installed. Using greedy decoding.")
print()
print("Single transcription:")
print(recognizer.recognize(audio, show_all=False))
print()
beams = recognizer.recognize(audio, show_all=True)
print("Most likely beams:")
for beam in beams:
print(beam)
Total running time of the script: ( 0 minutes 12.802 seconds)