Speech to text (2019 version)

The recommended approach seems to change somewhat each time I review it. Part of the problem is that many of the available solutions siphon off Google’s services without a license, so Google keeps tweaking its systems to lock them out.

Libraries to install

Linux (Debian/Ubuntu):

sudo apt-get install python3 python3-all-dev python3-pip build-essential git
sudo apt-get install swig libpulse-dev libasound2-dev python3-pyaudio
pip3 install pocketsphinx SpeechRecognition

Mac (Homebrew):

brew install swig git python3
brew install portaudio
pip3 install pocketsphinx SpeechRecognition pyaudio
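After running either set of commands, a quick check can confirm that Python can actually find the libraries. This is a minimal sketch using the standard-library importlib.util.find_spec, which looks a module up without importing it. REQUIRED and missing_modules are hypothetical names, and the list assumes the three packages used in the 2019 version (note that the pip/apt package names differ from the names used in import statements).

```python
import importlib.util

# Hypothetical mapping: module name used in `import` -> name used when installing.
REQUIRED = {
    "speech_recognition": "SpeechRecognition",
    "pocketsphinx": "pocketsphinx",
    "pyaudio": "PyAudio (python3-pyaudio on Linux)",
}


def missing_modules(required=REQUIRED):
    """Return the install names of any modules Python cannot find."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None  # None means "not installed"
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still need to install:", ", ".join(missing))
    else:
        print("All speech-to-text libraries found")
```

Run it as a script; if anything is listed, repeat the install step for that package.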


To use

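# Import the talktome file
import talktome
import speech_recognition as sr

# Get a list of available microphones
# (the number printed next to each name is its index)
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(index, name)

# Create object (0 is assumed to be the index of the microphone to use)
ttm = talktome.Talktome(0)

# Speech to text demo
while True:
    input("Press enter to start talking")
    msg = ttm.listen()
    print("I think you said: " + msg)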


Speech to text (2018 version)

Libraries to install

pip3 install SpeechRecognition

To use

import audiowrapper
import speech_recognition as sr

def speech_to_text(filename):
    r = sr.Recognizer()
    with sr.AudioFile(filename) as source:
        audio_data = r.record(source)  # read the whole file
    try:
        return r.recognize_google(audio_data)
    except sr.UnknownValueError:
        print("Error: Could not understand audio")
        return ""
    except sr.RequestError as e:
        print("Error: Request error; {}".format(e))
        return ""

# Record audio
audio = audiowrapper.Audio()
input("Press ENTER to START recording")
input("Press ENTER to STOP recording")

# Convert audio recording to text
text = speech_to_text("test.wav")
print("I think you said: "+text)
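When speech_to_text keeps returning an empty string, it can help to check what was actually recorded. sr.AudioFile accepts WAV (PCM), AIFF and FLAC files. The sketch below uses only the standard-library wave module to report the channels, sample width, sample rate and length of a WAV file, plus a helper that writes a silent file for testing the pipeline without a microphone. Both describe_wav and write_silence are hypothetical helpers, not part of audiowrapper or SpeechRecognition.

```python
import wave


def describe_wav(filename):
    """Return basic facts about a WAV file (hypothetical helper)."""
    with wave.open(filename, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(filename, seconds=1.0, rate=16000):
    """Write a mono, 16-bit, silent WAV file (hypothetical helper for testing)."""
    with wave.open(filename, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))  # all-zero samples
```

For example, print(describe_wav("test.wav")) after a recording; a duration close to zero suggests the recording never started.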

Important note

This is currently using an unregistered version of Google’s speech recognition system. It only works for a maximum of 50 requests per day, and Google can cancel it for “over use” (even at fewer than 50). It’s only suitable for testing. Google do offer a free one-year trial of their full product, but we’d have to sign you up for that; I’ll help you through the setup (it can get a little tricky). There is also an “offline” library we could use instead, such as PocketSphinx.
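Because the free service is capped at around 50 requests a day, it may be worth counting requests before sending them. This is a minimal sketch of an in-memory counter; DailyQuota is a hypothetical helper, the limit of 50 is taken from the note above, and Google may still refuse requests earlier.

```python
import datetime


class DailyQuota:
    """Count requests per calendar day (hypothetical helper, in memory only)."""

    def __init__(self, limit=50, today=datetime.date.today):
        self.limit = limit
        self._today = today  # injectable "clock" so the daily reset can be tested
        self._day = today()
        self._count = 0

    def _roll_over(self):
        # On a new day, start counting from zero again.
        current = self._today()
        if current != self._day:
            self._day = current
            self._count = 0

    def try_use(self):
        """Return True and count one request if today's limit allows it."""
        self._roll_over()
        if self._count >= self.limit:
            return False
        self._count += 1
        return True

    @property
    def remaining(self):
        self._roll_over()
        return self.limit - self._count
```

Call quota.try_use() before each speech_to_text(...) call and skip the request when it returns False. The count lives in memory only, so restarting the program resets it.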


Text to speech


pip3 install pyttsx3
pip3 install pyobjc # for mac


import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')

# Say a sentence in each installed voice
for v in voices:
    txt = "My name is " + v.name
    engine.setProperty('voice', v.id)
    engine.say(txt)
    engine.runAndWait()
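Voice names differ between Windows, macOS and Linux, so rather than hard-coding a voice id it can be easier to pick a voice by part of its name. A minimal sketch; find_voice is a hypothetical helper that relies only on the id and name attributes of the voice objects returned by engine.getProperty('voices').

```python
def find_voice(voices, keyword):
    """Return the id of the first voice whose name contains keyword
    (case-insensitive), or None if no voice matches."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in voice.name.lower():
            return voice.id
    return None
```

For example, voice_id = find_voice(engine.getProperty('voices'), "english"), then call engine.setProperty('voice', voice_id) only if voice_id is not None.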