Voice recognition is one of the most useful features in applications such as home automation and artificial intelligence. In this section, we will learn how to perform voice recognition using Python and Google's Speech API.
In this case, we will use the microphone to provide audio for voice recognition. A few parameters are needed to configure the microphone.
To follow along, we must install the SpeechRecognition module. Another module, PyAudio, is optional in general but is required for microphone input. With it, we can set different audio modes.
sudo pip3 install SpeechRecognition
sudo apt-get install python3-pyaudio
For external or USB microphones, we need to specify the exact microphone to avoid any difficulties. On Linux, typing 'lsusb' displays information about the connected USB devices.
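As a rough sketch of how the exact device could be chosen, the snippet below picks the index of the first microphone whose name contains a keyword. The helper names `pick_microphone` and `open_usb_microphone` are hypothetical and the device names are made up; `Microphone.list_microphone_names()` and the `device_index` argument belong to the SpeechRecognition library and need PyAudio to be installed.

```python
def pick_microphone(names, keyword):
    """Return the index of the first device name containing keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and keyword in name.lower():
            return index
    return None


def open_usb_microphone(keyword="usb"):
    """Hypothetical helper: build a Microphone for the first matching device."""
    import speech_recognition as spreg  # imported here because it needs PyAudio

    names = spreg.Microphone.list_microphone_names()
    for index, name in enumerate(names):
        print(index, name)
    index = pick_microphone(names, keyword)
    # device_index=None makes the library fall back to the default microphone
    return spreg.Microphone(device_index=index)


# Device names below are made up for illustration only
example_names = ["HDA Intel PCH: ALC3246 Analog", "USB PnP Sound Device: Audio"]
print(pick_microphone(example_names, "usb"))  # 1
```

On a machine with a microphone attached, calling `open_usb_microphone()` would print the available devices and return a `Microphone` object that can be used in a `with` block.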
The second parameter is the 'chunk size'. With this option, we can specify how much data to read at a time. This will be a power of 2, for example 1024 or 2048.
We also need to specify the sampling rate, which determines how often values are recorded for processing.
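To see how chunk size and sampling rate relate, the following small sketch checks that a chunk size is a power of 2 and computes how much audio one chunk holds. The function names are illustrative only; 48000 and 8192 are the values used in this section's example code.

```python
def is_power_of_two(n):
    """True if n is a positive power of 2 (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def chunk_duration(chunk_size, sample_rate):
    """Seconds of audio contained in one chunk of chunk_size samples."""
    return chunk_size / sample_rate


sample_rate = 48000  # samples per second
for chunk_size in (1024, 2048, 8192):
    milliseconds = chunk_duration(chunk_size, sample_rate) * 1000
    print(chunk_size, is_power_of_two(chunk_size), round(milliseconds, 1), "ms")
```

At 48000 Hz, a chunk of 8192 samples covers roughly 171 ms of audio, while a chunk of 1024 samples covers about 21 ms, so a larger chunk means fewer but longer reads.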
Since there may be some unavoidable noise in the surroundings, we must adjust for ambient noise to capture the audio accurately.
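The idea behind adjusting for ambient noise can be illustrated with a simplified energy threshold. This is not the SpeechRecognition implementation (its `adjust_for_ambient_noise` method sets the recognizer's `energy_threshold` for us). It is only a sketch, with hypothetical helper names and made-up sample values, of how a quiet-room measurement can separate background noise from speech.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_chunks, margin=1.5):
    """Hypothetical helper: threshold = margin * loudest ambient chunk."""
    return margin * max(rms(chunk) for chunk in noise_chunks)


def is_speech(chunk, threshold):
    """Treat a chunk as speech when its energy exceeds the threshold."""
    return rms(chunk) > threshold


# Made-up sample values for illustration only
quiet_room = [[10, -12, 8, -9], [11, -10, 9, -11]]
loud_chunk = [900, -1100, 1000, -950]

threshold = calibrate_threshold(quiet_room)
print(is_speech(loud_chunk, threshold))     # True
print(is_speech(quiet_room[0], threshold))  # False
```

The margin of 1.5 is an arbitrary choice for this sketch. In practice the library handles the calibration, and it is usually enough to stay silent for a moment while it runs.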
The steps to recognize speech from the microphone are as follows:
Get the information related to the microphone.
Configure the microphone with the chunk size, sample rate, and ambient noise adjustment.
Wait for a while to capture the sound.
After the voice is captured, try to convert it to text; otherwise, report an error.
Stop the process.
import speech_recognition as spreg

# Set up the sampling rate and the data size
sample_rate = 48000
data_size = 8192

recog = spreg.Recognizer()
with spreg.Microphone(sample_rate=sample_rate, chunk_size=data_size) as source:
    # Calibrate for background noise before listening
    recog.adjust_for_ambient_noise(source)
    print('Tell Something: ')
    speech = recog.listen(source)

try:
    # Send the captured audio to Google's Speech API
    text = recog.recognize_google(speech)
    print('You have said: ' + text)
except spreg.UnknownValueError:
    print('Unable to recognize the audio')
except spreg.RequestError as e:
    print("Request error from Google Speech Recognition service; {}".format(e))
Output Result
$ python3 318.speech_recognition.py
Tell Something:
You have said: here we are considering the asymptotic notation Pico to calculate the upper bound of the time complexity so then the definition of the big O notation is like this one
$
Instead of a microphone, we can also use an audio file as the input for speech recognition.
import speech_recognition as spreg

sound_file = 'sample_audio.wav'

recog = spreg.Recognizer()
with spreg.AudioFile(sound_file) as source:
    speech = recog.record(source)  # use record instead of listen

try:
    # Send the audio read from the file to Google's Speech API
    text = recog.recognize_google(speech)
    print('The file contains: ' + text)
except spreg.UnknownValueError:
    print('Unable to recognize the audio')
except spreg.RequestError as e:
    print("Request error from Google Speech Recognition service; {}".format(e))
Output Result
$ python3 318a.speech_recognition_file.py
The file contains: staying ahead of the curve demand planning new technology it also helps you progress in your career
$
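Before passing a file to the recognizer, it can help to confirm that it is a readable PCM WAV file and to check its sample rate and length. The sketch below uses only Python's standard `wave` module. The helper names and the temporary file name are hypothetical, and the generated tone merely stands in for a real recording such as sample_audio.wav.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, sample_rate=16000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit PCM sine tone (a stand-in for a real recording)."""
    n_frames = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Hypothetical file name in the system's temporary directory
path = os.path.join(tempfile.gettempdir(), "sample_audio_check.wav")
write_test_tone(path)
info = describe_wav(path)
print(info)
```

If `wave.open` raises `wave.Error`, the file is probably not an uncompressed PCM WAV file and may need converting before it is used with `AudioFile`.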