Voice control of a computer with Python

After reading various posts about Google Voice and how it can be used, I decided to write something of my own: voice control for a computer. I should say up front that the OS is Windows.

We will need:

- Python 2.7
- libraries:
  - pyaudio
  - pycurl
  - pywin32
  - plus the standard library
- any audio converter that supports FLAC and WAV and can be run from the command line (the code below calls the one I used, AudioConverter.exe)

How it works

We record an audio file, send it to Google Voice, and get back an answer of the form:

    {"status": 0, "id": "5e34348f2887c7a3cc27dc3695ab4575-1", "hypotheses": [{"utterance": "example", "confidence": 0.7581704}]}

Then we process that answer. We will also teach the computer to "talk", using the same Google Voice.
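As a minimal sketch of what "processing the answer" means, the reply can be parsed with the standard json module. The function name extract_utterance is hypothetical and not part of the program in this article; the sample string follows the answer format shown above, and status 5 as the "not recognized" code is taken from the Send function later in the article.

```python
import json


def extract_utterance(raw):
    """Return the first recognized phrase, or None if nothing was recognized."""
    reply = json.loads(raw)
    hypotheses = reply.get("hypotheses") or []
    # Assumption: status 0 means success, anything else means failure.
    if reply.get("status") != 0 or not hypotheses:
        return None
    return hypotheses[0]["utterance"]


sample = ('{"status": 0, "id": "5e34348f2887c7a3cc27dc3695ab4575-1", '
          '"hypotheses": [{"utterance": "example", "confidence": 0.7581704}]}')
print(extract_utterance(sample))
```

Unlike eval, json.loads only parses data and never executes the text it is given, which matters when the text comes from a server.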

Getting started

First we import all the modules, so that we do not have to come back to this later:

    import time, pyaudio, wave, os, urllib, urllib2, pycurl, httplib, sys, win32api, win32con, string
    from ctypes import *

Okay, now let's write the Talk function, which allows the computer to talk to us.

    def Talk(text):
        def downloadFile(url, fileName):
            fp = open(fileName, "wb")
            curl = pycurl.Curl()
            curl.setopt(pycurl.URL, url)
            curl.setopt(pycurl.WRITEDATA, fp)
            curl.perform()
            curl.close()
            fp.close()

        def getGoogleSpeechURL(phrase):
            googleTranslateURL = "http://translate.google.com/translate_tts?tl=en&"
            parameters = {'q': phrase}
            data = urllib.urlencode(parameters)
            googleTranslateURL = "%s%s" % (googleTranslateURL, data)
            return googleTranslateURL

        def speakSpeechFromText(phrase):
            googleSpeechURL = getGoogleSpeechURL(phrase)
            downloadFile(googleSpeechURL, "ans.mp3")  # save the reply as ans.mp3

        speakSpeechFromText(text)

        # play the downloaded file through winmm
        winmm = windll.winmm
        winmm.mciSendStringA('Open "ans.mp3" Type MPEGVideo Alias theMP3', 0, 0, 0)
        winmm.mciSendStringA('Play theMP3 Wait', 0, 0, 0)
        winmm.mciSendStringA("Close theMP3", "", 0, 0)

So we end up with a Talk(text) function that speaks the given text aloud.
Playing the file through winmm, which relies on the codecs installed in the system, was the only way I found to play mp3 from Python.

One more thing: the program speaks and understands only English, because I could not get Python and UTF-8 to cooperate.
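One possible way around the UTF-8 problem, shown only as a sketch: encode the phrase to UTF-8 bytes before passing it to urlencode. The function name build_tts_url and the lang parameter are hypothetical, and whether the translate_tts endpoint accepts other values of tl is an assumption that is not verified here. The import fallback lets the same code run on Python 2.7 and Python 3.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_tts_url(phrase, lang="en"):
    """Build the speech URL; non-ASCII text is sent as UTF-8 bytes."""
    if not isinstance(phrase, bytes):
        # Encoding first keeps urlencode from choking on non-ASCII text.
        phrase = phrase.encode("utf-8")
    query = urlencode([("tl", lang), ("q", phrase)])
    return "http://translate.google.com/translate_tts?" + query


print(build_tts_url(u"open chrome"))
```

For the phrase "open chrome" this produces http://translate.google.com/translate_tts?tl=en&q=open+chrome, the same kind of URL that getGoogleSpeechURL builds.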

Recording

The next step is to record the speech that Google will process. Feel free to copy and paste:

    def Record():
        CHUNK = 1024
        FORMAT = pyaudio.paInt16
        CHANNELS = 2
        RATE = 16000
        RECORD_SECONDS = 5
        WAVE_OUTPUT_FILENAME = "output.wav"

        p = pyaudio.PyAudio()
        stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
        print("Recording...")
        frames = []
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
        print("Done recording.")
        stream.stop_stream()
        stream.close()
        p.terminate()

        wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()

This code comes from the PyAudio website, and I left it almost unchanged.
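The file-writing half of Record() can be tried without a microphone. The sketch below is not part of the original program: write_silence and describe are hypothetical helper names, and silence stands in for the chunks that stream.read(CHUNK) would return. It uses the same constants as Record() and only the standard wave module.

```python
import wave

CHUNK = 1024
CHANNELS = 2
RATE = 16000
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit audio (paInt16)
RECORD_SECONDS = 5


def write_silence(path):
    """Write a WAV file with the same layout Record() produces."""
    # "//" mirrors the integer division that "/" performs in Python 2.7.
    n_chunks = int(RATE // CHUNK * RECORD_SECONDS)
    chunk = b"\x00" * (CHUNK * CHANNELS * SAMPLE_WIDTH)
    frames = [chunk for _ in range(n_chunks)]
    wf = wave.open(path, "wb")
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
    wf.close()
    return n_chunks


def describe(path):
    """Return (channels, sample rate, number of frames) of a WAV file."""
    wf = wave.open(path, "rb")
    info = (wf.getnchannels(), wf.getframerate(), wf.getnframes())
    wf.close()
    return info
```

With these constants the loop runs 75 times, so the file holds 76800 frames, which is 4.8 seconds rather than a full 5: the integer division drops the partial chunk.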

Here we hit the first problem: we have a WAV file, but Google understands FLAC.
The converter and the os module come to the rescue:

    def Convert():
        print "Converting"
        os.system('C:\Users\\Desktop\\TotalAudioConverter\AudioConverter.exe C:\Users\\Desktop\\output.wav C:\Users\\Desktop\\output.flac')
        print "Done"

To explain: we give the path to the installed converter, then pass the input file and the output file as arguments, in that order (use full paths).
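The same call can also be sketched with the subprocess module, which takes the arguments as a list and so avoids quoting problems when a path contains spaces. The function names below are hypothetical, and the only thing assumed about the converter is what is described above: input file first, output file second.

```python
import subprocess


def build_convert_command(converter, source, target):
    """Argument list: converter path, then input file, then output file."""
    return [converter, source, target]


def convert(converter, source, target):
    """Run the converter; True only if it started and exited with code 0."""
    try:
        return subprocess.call(build_convert_command(converter, source, target)) == 0
    except OSError:
        # Raised when the converter executable cannot be found or started.
        return False
```

A False result gives the program a chance to stop before it tries to send a FLAC file that was never created.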

Sending the recording to the server

    def Send():
        global ANSWER
        url = 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-EN'  # recognition endpoint
        flac = open('output.flac', "rb").read()
        header = {'Content-Type': 'audio/x-flac; rate=16000'}
        req = urllib2.Request(url, flac, header)
        data = urllib2.urlopen(req)
        a = data.read()
        ANSWER = eval(a)  # turns the reply into a dict; json.loads would be the safer choice
        if ANSWER['status'] == 5:
            print 'Sorry, I do not understand you.'
            Talk('Sorry, I do not understand you.')
            ANSWER = 0
        else:
            ANSWER = ANSWER['hypotheses'][0]['utterance']  # the text Google recognized
            print ANSWER
        os.remove('C:\Users\\Desktop\\output.wav')  # delete the temporary files
        os.remove('C:\Users\\Desktop\\output.flac')
        return ANSWER

Processing

Once we have the answer, we can act on it. What happens next is limited only by your imagination.

Here are just a few examples:

    def Processing():
        global ANSWER
        if ANSWER == 0:
            return 0
        elif 'chrome' in ANSWER.lower():
            os.system('C:\Users\\AppData\Local\Google\Chrome\Application\chrome.exe')  # launch Google Chrome if the phrase contains "chrome"
        elif 'skype' in ANSWER.lower():
            os.system('C:\Users\\Downloads\SkypePortable\SkypePortable.exe')  # launch Skype
        elif 'cd rom' in ANSWER.lower() or\
             'cd-rom' in ANSWER.lower() or\
             'open d' in ANSWER.lower() or\
             'dvd' in ANSWER.lower() or\
             'dvd-rom' in ANSWER.lower() or\
             'dvd rom' in ANSWER.lower() or\
             'cdrom' in ANSWER.lower() or\
             'cd - rom' in ANSWER.lower():
            winmm = windll.winmm
            winmm.mciSendStringA("set cdaudio door open", "", 0, 0)  # open the CD/DVD tray
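As the list of commands grows, the chain of elif branches can be replaced by a table of keywords. This is only a sketch of that idea, not the code the program uses: pick_command and COMMANDS are hypothetical names, and the command names are labels that the caller would map to actions such as os.system calls.

```python
# Each entry: (command name, keywords that trigger it). Order matters:
# the first entry with a matching keyword wins.
COMMANDS = [
    ("chrome", ["chrome"]),
    ("skype", ["skype"]),
    ("open_tray", ["cd rom", "cd-rom", "cd - rom", "cdrom", "open d", "dvd"]),
]


def pick_command(answer, commands=COMMANDS):
    """Return the name of the first command whose keyword occurs in answer."""
    text = answer.lower()
    for name, keywords in commands:
        if any(keyword in text for keyword in keywords):
            return name
    return None


print(pick_command("Open Chrome please"))
```

The keyword "dvd" already covers "dvd-rom" and "dvd rom", so those two do not need their own entries in the table.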

I also added a function that pauses the program for a specified time, but the code is long and I see no reason to post it here, because it is of almost no practical use.

Running

Once these functions are defined, add the following code:

    print 'Hi, what do you want?'
    Talk('Hi, what do you want?')
    Record()
    Convert()
    print ('Sending...')
    Send()
    print 'Done'
    Processing()
    while True:
        ANSWER = None
        #Talk('Done.')
        print 'Do you want something else? (Your command\No)'
        Talk('Do you want something else??')
        Record()
        Convert()
        print 'Sending...'
        Send()
        print 'Done'
        #print ANSWER
        if ANSWER == 0:
            continue
        # any of these words ends the session
        if ANSWER.lower() == 'no' or\
           ANSWER.lower() == 'nope' or\
           ANSWER.lower() == 'not' or\
           ANSWER.lower() == 'nay':
            break
        else:
            Processing()
    print 'Okay, bye'
    Talk('Okay, bye')
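The control flow of this loop can be checked without a microphone or a network connection if the recognition step is passed in as a function. In the sketch below, run_session, listen and handle are hypothetical names: listen stands in for the Record, Convert, Send sequence and returns either the recognized text or 0, and handle stands in for Processing.

```python
STOP_WORDS = ("no", "nope", "not", "nay")


def run_session(listen, handle):
    """Repeat listen/handle until a stop word is heard; return handled phrases."""
    handled = []
    while True:
        answer = listen()
        if answer == 0:
            # Nothing was recognized: ask again.
            continue
        if answer.lower() in STOP_WORDS:
            break
        handle(answer)
        handled.append(answer)
    return handled


# Dry run with canned replies instead of real recognition.
replies = iter(["chrome", 0, "Skype", "Nope"])
print(run_session(lambda: next(replies), lambda phrase: None))
```

The dry run handles "chrome" and "Skype", skips the failed recognition, and stops at "Nope".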

Done!

That is all I wanted to share. I will be glad if this helps someone.

Thanks for your attention.

Source: https://habr.com/ru/post/263423/
