Audio Sources

Using the correct input for the speech recognition models is very important. If you load data with your own method (or with librosa's method) instead of the utilities below, recognition performance is likely to degrade.

A list of the most important audio utilities is provided below.

danspeech.audio.load_audio(path, duration=None, offset=None)

Loads a sound file.

Supported formats are WAV, AIFF, FLAC.

Parameters
  • path (str) – Path to sound file

  • duration (float) – Duration in seconds of how much of the clip to load. If duration is not specified, the rest of the file is loaded.

  • offset (float) – Where to start in seconds in the clip.

Returns

Input array ready for speech recognition.

Return type

numpy.array
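
Example

A minimal sketch of loading a clip and transcribing it. The file path is a placeholder, and the recognizer.recognize call is an assumption about the Recognizer API for one-off transcription; it is not shown elsewhere in this section.

from danspeech import Recognizer
from danspeech.audio import load_audio
from danspeech.pretrained_models import TestModel

# Load 5 seconds of audio, starting 2 seconds into the clip
# ("speech.wav" is a placeholder path)
audio = load_audio("speech.wav", duration=5.0, offset=2.0)

# Init a DanSpeech model and create a Recognizer instance
model = TestModel()
recognizer = Recognizer(model=model)

# Transcribe the loaded clip
print(recognizer.recognize(audio))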

danspeech.audio.load_audio_wavPCM(path)

Fast loading of a wav file.

This works well if you are certain that your wav files are PCM encoded.

Parameters

path (str) – Path to wave file.

Returns

Input array ready for speech recognition.

Return type

numpy.array
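
Example

A minimal sketch, assuming your file is known to be a PCM-encoded wav ("speech.wav" is a placeholder path). The returned array can be used exactly like the output of load_audio.

from danspeech.audio import load_audio_wavPCM

# Fast load; only safe when the wav file is known to be PCM encoded
# ("speech.wav" is a placeholder path)
audio = load_audio_wavPCM("speech.wav")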

class danspeech.audio.Microphone(device_index=None, sampling_rate=16000, chunk_size=1024)

Source: https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/__init__.py

Modified for DanSpeech

Warning: Requires PyAudio.

Creates a Microphone instance, which represents a microphone on the computer.

If no device index is given, it will try to use the default microphone of the system.

The sampling rate should always be 16000 for the microphone to work with DanSpeech models.

Parameters
  • device_index (int) – The device index of your microphone. Use Microphone.list_microphone_names() to find the available input sources and choose the appropriate one.

  • sampling_rate (int) – Should always be 16000, unless you have trained your own DanSpeech model with a different audio configuration.

  • chunk_size (int) – Avoid changing the chunk size unless it is strictly necessary. WARNING: Changing it will possibly break microphone streaming with DanSpeech models.

Example
from danspeech import Recognizer
from danspeech.pretrained_models import TestModel
from danspeech.audio.resources import Microphone

# Get a list of microphones found by PyAudio
mic_list = Microphone.list_microphone_names()
mic_list_with_numbers = list(zip(range(len(mic_list)), mic_list))
print("Available microphones: {0}".format(mic_list_with_numbers))

# Choose the microphone
mic_number = input("Pick the number of the microphone you would like to use: ")

# Init a microphone object
m = Microphone(sampling_rate=16000, device_index=int(mic_number))

# Init a DanSpeech model and create a Recognizer instance
model = TestModel()
recognizer = Recognizer(model=model)

print("Speek a lot to adjust silence detection from microphone...")
with m as source:
    recognizer.adjust_for_speech(source, duration=5)

# Enable streaming
recognizer.enable_streaming()

# Create the streaming generator which runs a background thread listening to the microphone stream
generator = recognizer.streaming(source=m)

# The loop below runs for a long time. The generator yields transcriptions of speech from your microphone.
print("Speak")
for i in range(100000):
    trans = next(generator)
    print(trans)

static list_microphone_names()

Source: https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/__init__.py

Find all available input sources.

The index of each microphone’s name in the returned list is the same as its device index when creating a Microphone instance - if you want to use the microphone at index 3 in the returned list, use Microphone(device_index=3).

Warning: The returned list may also contain sources that are not actually microphones; selecting one of these will result in an error. If that happens, try another source that sounds plausible.

Returns

A list of the names of all available microphones.

Return type

list
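
Example

A minimal sketch of picking a device by its index in the returned list (the index used here is arbitrary; pick the one that matches your device):

from danspeech.audio.resources import Microphone

# The position of a name in the returned list is the device_index
# to pass when constructing a Microphone
for index, name in enumerate(Microphone.list_microphone_names()):
    print(index, name)

# Use the microphone at index 3
m = Microphone(device_index=3)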