Audio Sources¶
Using the correct input for the speech recognition models is very important. If you load audio with your own method (or with librosa's loader), recognition performance will likely degrade.
A list of the most important audio utilities is provided below.
- danspeech.audio.load_audio(path, duration=None, offset=None)¶
Loads a sound file.
Supported formats are WAV, AIFF, FLAC.
- Parameters
path (str) – Path to sound file
duration (float) – Duration in seconds of how much audio to load. If duration is not specified, the whole file is loaded.
offset (float) – Where to start in the clip, in seconds.
- Returns
Input array ready for speech recognition.
- Return type
numpy.array
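Since duration and offset are given in seconds, it can help to know how they translate to sample positions at DanSpeech's 16 kHz sampling rate. The sketch below uses a hypothetical seconds_to_samples helper (not part of the danspeech API) to make the mapping explicit.

```python
# Hypothetical helper, not part of danspeech: converts a time in seconds
# to a sample count at a given sampling rate.
SAMPLING_RATE = 16000

def seconds_to_samples(seconds, sampling_rate=SAMPLING_RATE):
    return int(seconds * sampling_rate)

# load_audio(path, duration=5.0, offset=2.0) corresponds to selecting
# samples [start, start + count) from the decoded file.
start = seconds_to_samples(2.0)
count = seconds_to_samples(5.0)
print(start, count)  # 32000 80000
```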
- danspeech.audio.load_audio_wavPCM(path)¶
Fast load of a WAV file.
This works well if you are certain that your wav files are PCM encoded.
- Parameters
path (str) – Path to wave file.
- Returns
Input array ready for speech recognition.
- Return type
numpy.array
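For intuition, here is a minimal standard-library sketch of what a fast PCM-only WAV load amounts to. This is not DanSpeech's implementation; it assumes 16-bit little-endian mono PCM, which is the encoding a PCM fast path relies on.

```python
# Illustrative sketch (not danspeech code): read a PCM WAV by reinterpreting
# the raw frame bytes as 16-bit integers, skipping general-purpose decoding.
import array
import os
import tempfile
import wave

# Create a small 16 kHz mono 16-bit PCM wav to load (1000 samples of silence).
path = os.path.join(tempfile.gettempdir(), "demo_pcm.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 16-bit PCM
    w.setframerate(16000)
    w.writeframes(array.array("h", [0] * 1000).tobytes())

# Fast load: grab all frames and reinterpret them as signed 16-bit samples.
with wave.open(path, "rb") as w:
    assert w.getsampwidth() == 2, "this sketch expects 16-bit PCM"
    raw = w.readframes(w.getnframes())
samples = array.array("h", raw)

# Normalize to floats in [-1, 1), the scale recognition models typically expect.
floats = [s / 32768.0 for s in samples]
print(len(floats))  # 1000
```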
- class danspeech.audio.Microphone(device_index=None, sampling_rate=16000, chunk_size=1024)¶
Source: https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/__init__.py
Modified for DanSpeech
Warning: Requires PyAudio.
Creates a Microphone instance, which represents a microphone on the computer.
If no device index is given, the default microphone of the system is used.
The sampling rate should always be 16000 for the microphone to work with DanSpeech models.
- Parameters
device_index (int) – The device index of your microphone. Use Microphone.list_microphone_names() to find the available input sources and choose the appropriate one.
sampling_rate (int) – Should always be 16000, unless you have trained your own DanSpeech model with a different audio configuration.
chunk_size (int) – Avoid changing the chunk size unless it is strictly necessary. WARNING: Changing it will possibly break microphone streaming with DanSpeech models.
- Example
from danspeech import Recognizer
from danspeech.pretrained_models import TestModel
from danspeech.audio.resources import Microphone

# Get a list of microphones found by PyAudio
mic_list = Microphone.list_microphone_names()
mic_list_with_numbers = list(zip(range(len(mic_list)), mic_list))
print("Available microphones: {0}".format(mic_list_with_numbers))

# Choose the microphone
mic_number = input("Pick the number of the microphone you would like to use: ")

# Init a microphone object
m = Microphone(sampling_rate=16000, device_index=int(mic_number))

# Init a DanSpeech model and create a Recognizer instance
model = TestModel()
recognizer = Recognizer(model=model)

print("Speak a lot to adjust silence detection from microphone...")
with m as source:
    recognizer.adjust_for_speech(source, duration=5)

# Enable streaming
recognizer.enable_streaming()

# Create the streaming generator which runs a background thread listening to the microphone stream
generator = recognizer.streaming(source=m)

# The below code runs for a long time. The generator returns transcriptions
# of spoken speech from your microphone.
print("Speak")
for i in range(100000):
    trans = next(generator)
    print(trans)
-
- static list_microphone_names()¶
Source: https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/__init__.py
Find all available input sources.
The index of each microphone's name in the returned list is the same as its device index when creating a Microphone instance. If you want to use the microphone at index 3 in the returned list, use Microphone(device_index=3).
Warning: The list will also contain sources that are not actually microphones; using one of those will result in an error. If that happens, try another entry that sounds plausible.
- Returns
A list of the names of all available microphones.
- Return type
list
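Because the returned list may include sources that are not microphones, a small search helper can make device selection less error-prone. find_device_index below is a hypothetical convenience, not part of the danspeech API, and is shown with a hard-coded name list since the real call requires PyAudio.

```python
# Hypothetical helper (not part of danspeech): pick a device index from a
# list of names like the one Microphone.list_microphone_names() returns.
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword."""
    for index, name in enumerate(names):
        if keyword.lower() in name.lower():
            return index
    return None  # fall back to the system default microphone

# Hard-coded example list; the real one comes from PyAudio.
names = ["Built-in Output", "Built-in Microphone", "USB Audio Device"]
print(find_device_index(names, "microphone"))  # 1
```

The returned index can then be passed directly as Microphone(device_index=...).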