
Jarvis is back in business

Surely everyone dreams of having their own voice assistant. Below the cut is yet another implementation of "Jarvis" from the famous movie.

image

I had long been thinking about my own "Jarvis" and about controlling the equipment in the house by voice. Now I have finally got around to building this miracle. I did not have to think long about the brains.

So, the hardware:

Implementation


Our assistant will work on the principle of Alexa / Hub:

  1. Activate offline on a specific word
  2. Recognize the command in the cloud
  3. Execute the command
  4. Report on the result or give the requested information.

Since my camera is supported out of the box, I did not have to mess with drivers, so we go straight to the software part.

Offline activation


Activation is handled by CMU Sphinx. Everything would be fine, but out of the box recognition is very slow, more than 10 seconds, which is completely unacceptable. To solve the problem, the dictionary has to be cleared of unnecessary words.

Install everything you need:

    pip3 install SpeechRecognition
    pip3 install pocketsphinx

Next, open the dictionary:

    sudo nano /usr/local/lib/python3.4/dist-packages/speech_recognition/pocketsphinx-data/en-US/pronounciation-dictionary.dict

and delete everything except the "jarvis" entry we need:

    jarvis JH AA R V AH S

Now pocketsphinx recognizes the word quite quickly.
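
The manual edit above can also be scripted. Below is a minimal sketch, assuming the dictionary is a plain text file with one entry per line, the word first and its phonemes after it. The name `trim_dictionary` and its parameters are hypothetical and are not part of SpeechRecognition or pocketsphinx.

```python
def trim_dictionary(lines, keep):
    """Return only the dictionary entries whose word is in `keep`.

    Assumes each entry looks like "word PH1 PH2 ...", one per line.
    """
    wanted = {word.lower() for word in keep}
    kept = []
    for line in lines:
        parts = line.split()
        # Skip blank lines and entries for words we do not need.
        if parts and parts[0].lower() in wanted:
            kept.append(line.rstrip("\n"))
    return kept


# Illustrative entries only, not copied from the real dictionary file.
sample = [
    "jar JH AA R\n",
    "jarvis JH AA R V AH S\n",
    "java JH AA V AH\n",
]
print(trim_dictionary(sample, ["jarvis"]))
```

To apply it to the real file, read the lines with `open(path).readlines()`, back up the original, and write the result back joined with newlines. Alternative pronunciations written as `word(2)` would be dropped by this sketch.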

Speech recognition


At first the idea was to use Google's service, especially since SpeechRecognition supports it. But as it turned out, Google charges for it and does not work with private individuals.

Fortunately, Yandex also offers this capability, free of charge and extremely simple to use.

Register and get an API key. All the work can be done with curl.

    curl -X POST -H "Content-Type: audio/x-wav" --data-binary "@file" "https://asr.yandex.net/asr_xml?uuid=ya_uid&key=yf_api_key&topic=queries"
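
The same request can be made from Python without shelling out to curl. This is a sketch using only the standard library. The endpoint and parameter names are taken from the curl call above, while `build_asr_url` and `recognize_file` are hypothetical helper names, and the network call is untested against the live service.

```python
import urllib.parse
import urllib.request

ASR_ENDPOINT = "https://asr.yandex.net/asr_xml"  # endpoint from the curl call


def build_asr_url(uuid, api_key, topic="queries"):
    """Build the request URL with properly escaped query parameters."""
    query = urllib.parse.urlencode({"uuid": uuid, "key": api_key, "topic": topic})
    return ASR_ENDPOINT + "?" + query


def recognize_file(path, uuid, api_key):
    """POST a WAV file and return the raw XML reply as text (needs network)."""
    with open(path, "rb") as audio:
        request = urllib.request.Request(
            build_asr_url(uuid, api_key),
            data=audio.read(),
            headers={"Content-Type": "audio/x-wav"},
            method="POST",
        )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


print(build_asr_url("ya_uid", "yf_api_key"))
```

Building the URL with `urlencode` keeps a key containing special characters from breaking the request, which plain string concatenation would not guarantee.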

Speech synthesis


Here Yandex helps us again. We send the text and receive a file with the synthesized speech in reply:

    curl "https://tts.voicetech.yandex.net/generate?format=wav&lang=ru-RU&speaker=zahar&emotion=good&key=ya_api_key" -G --data-urlencode "text=text" > file
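
As a sketch of what `--data-urlencode` does in the call above, the request URL can be assembled in Python so that spaces, quotes and ampersands in the phrase are escaped. The endpoint and parameters come from the curl example; `build_tts_url` and `synthesize` are hypothetical names, and the download is untested against the live service.

```python
import urllib.parse
import urllib.request

TTS_ENDPOINT = "https://tts.voicetech.yandex.net/generate"  # from the curl call


def build_tts_url(text, api_key, speaker="zahar", emotion="good", lang="ru-RU"):
    """Return the synthesis URL with every parameter URL-encoded."""
    params = {
        "format": "wav",
        "lang": lang,
        "speaker": speaker,
        "emotion": emotion,
        "key": api_key,
        "text": text,
    }
    return TTS_ENDPOINT + "?" + urllib.parse.urlencode(params)


def synthesize(text, api_key, out_path):
    """Download the synthesized speech to out_path (needs network access)."""
    with urllib.request.urlopen(build_tts_url(text, api_key), timeout=10) as reply:
        with open(out_path, "wb") as out:
            out.write(reply.read())


print(build_tts_url("lights on & off", "ya_api_key"))
```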

Jarvis


Putting it all together, we get this script.

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    import os
    import speech_recognition as sr
    from xml.dom import minidom
    import sys
    import random

    r = sr.Recognizer()

    ya_uuid = ''
    ya_api_key = ''

    # os.system('echo "+ +" |festival --tts --language russian')


    def convert_ya_asr_to_key():
        """Return the first recognition variant from the Yandex reply, or False."""
        xmldoc = minidom.parse('./asr_answer.xml')
        itemlist = xmldoc.getElementsByTagName('variant')
        if len(itemlist) > 0:
            return itemlist[0].firstChild.nodeValue
        else:
            return False


    def jarvis_on():
        """Return True if Sphinx heard the activation word in send.wav."""
        with sr.WavFile("send.wav") as source:
            audio = r.record(source)

        try:
            t = r.recognize_sphinx(audio)
            print(t)
        except (LookupError, sr.UnknownValueError):
            print("Could not understand audio")
            return False  # nothing was recognized, so there is no text to compare

        return t == "jarvis"


    def jarvis_say(phrase):
        """Synthesize the phrase with Yandex and play it."""
        os.system(
            'curl "https://tts.voicetech.yandex.net/generate?format=wav&lang=ru-RU&speaker=zahar&emotion=good&key='+ya_api_key+'" -G --data-urlencode "text=' + phrase + '" > jarvis_speech.wav')
        os.system('aplay jarvis_speech.wav')


    def jarvis_say_good():
        """Say one of the confirmation phrases, picked at random."""
        phrases = ["", "", "", "", "- ?", ]
        randitem = random.choice(phrases)
        jarvis_say(randitem)


    try:
        while True:
            # Record 3 seconds and check them for the activation word.
            os.system('arecord -B --buffer-time=1000000 -f dat -r 16000 -d 3 -D plughw:1,0 send.wav')
            if jarvis_on():
                os.system('aplay jarvis_on.wav')
                # Record the command itself and send it to Yandex.
                os.system('arecord -B --buffer-time=1000000 -f dat -r 16000 -d 3 -D plughw:1,0 send.wav')
                os.system(
                    'curl -X POST -H "Content-Type: audio/x-wav" --data-binary "@send.wav" "https://asr.yandex.net/asr_xml?uuid='+ya_uuid+'&key='+ya_api_key+'&topic=queries" > asr_answer.xml')
                command_key = convert_ya_asr_to_key()
                if (command_key):
                    if (command_key in ['key_word', 'key_word1', 'key_word2']):
                        os.system('')
                        jarvis_say_good()
                        continue

    except Exception:
        jarvis_say('-   ')
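
The reply parsing in `convert_ya_asr_to_key` can be tried without the service by feeding it a string. The sketch below makes one assumption: that the reply wraps each hypothesis in a `variant` element, which is what the script's `getElementsByTagName('variant')` call implies. The sample XML, including the `recognitionResults` root and its attributes, is illustrative rather than a captured response, and `first_variant` is a hypothetical name.

```python
from xml.dom import minidom


def first_variant(xml_text):
    """Return the text of the first non-empty <variant> element, or None."""
    doc = minidom.parseString(xml_text)
    for variant in doc.getElementsByTagName("variant"):
        # An empty element has no child text node, so skip it.
        if variant.firstChild is not None:
            return variant.firstChild.nodeValue.strip()
    return None


# Illustrative reply; the exact shape of the real response is an assumption.
SAMPLE_REPLY = """<recognitionResults success="1">
    <variant confidence="1">turn on the light</variant>
    <variant confidence="0.5">turn on the night</variant>
</recognitionResults>"""

print(first_variant(SAMPLE_REPLY))
print(first_variant('<recognitionResults success="0" />'))
```

Returning `None` for an empty reply, instead of raising on a missing text node, lets the main loop simply go back to listening.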

What is going on here? We run an infinite loop, record three seconds with arecord and send the recording to Sphinx for recognition. If the word “jarvis” appears in the file

  if jarvis_on(): 

we play a pre-recorded activation sound.

Then we record another 3 seconds and send them to Yandex; in reply we get our command. Next, we perform actions based on the command.
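
One detail worth noting: the script compares the whole Sphinx result with "jarvis", so a result such as "jarvis jarvis" would not activate the assistant. A slightly more tolerant check could look like the sketch below; `heard_wake_word` is a hypothetical helper, not something the original script contains.

```python
def heard_wake_word(hypothesis, wake_word="jarvis"):
    """Return True if the wake word is one of the recognized words."""
    if not hypothesis:
        # Covers both None and an empty recognition result.
        return False
    return wake_word in hypothesis.lower().split()


print(heard_wake_word("jarvis jarvis"))
print(heard_wake_word("jar"))
```

In `jarvis_on` this would replace the final comparison, for example `return heard_wake_word(t)`.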

That is all there is to it. You can come up with plenty of scenarios.
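
As the number of scenarios grows, the chain of `if (command_key in [...])` checks can be replaced by a lookup table. The sketch below is one possible layout, not the author's code. The phrases are invented for illustration, the script paths are the ones used in the examples later in the article, and `find_command` is a hypothetical name.

```python
# Each entry: (phrases that trigger it, command to run as an argument list).
# The phrases are made up; substitute whatever the recognizer returns for you.
COMMANDS = [
    (("lights off", "turn off the light"),
     ["python3", "/home/pi/smarthome/hue/hue.py", "off"]),
    (("tv", "turn on the tv"),
     ["python3", "/home/pi/smarthome/TV/tv2.py", "1"]),
]


def find_command(command_key, table=COMMANDS):
    """Return the argument list for a recognized phrase, or None."""
    phrase = command_key.strip().lower()
    for phrases, args in table:
        if phrase in phrases:
            return args
    return None


print(find_command("Lights off"))
print(find_command("make coffee"))
```

The returned list can be passed straight to `subprocess.run(args)`, which avoids building a shell command string the way `os.system` requires.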

Use-case


Now some real examples from my own use.

Philips Hue


Install

 pip install phue 

In the Hue application, set the static IP:

image

Run:

    #!/usr/bin/python
    import sys
    from phue import Bridge

    b = Bridge('192.168.0.100')  # Enter bridge IP here.

    # If running for the first time, press button on bridge and run with b.connect() uncommented
    # b.connect()

    print(b.get_scene())

Note down the IDs of the scenes you need; they look like "470d4c3c8-on-0".

The final script:

    #!/usr/bin/python
    import sys
    from phue import Bridge

    b = Bridge('192.168.0.100')  # Enter bridge IP here.

    # If running for the first time, press button on bridge and run with b.connect() uncommented
    # b.connect()

    if (sys.argv[1] == 'off'):
        b.set_light([1, 2, 3], 'on', False)
    else:
        b.activate_scene(1, sys.argv[1])

Add to jarvis:

    if (command_key in [' ', ' ', '']):
        os.system('python3 /home/pi/smarthome/hue/hue.py a1167aa91-on-0')
        jarvis_say_good()
        continue

    if (command_key in [' ', ' ']):
        os.system('python3 /home/pi/smarthome/hue/hue.py ac637e2f0-on-0')
        jarvis_say_good()
        continue

    if (command_key in [' ', ' ']):
        os.system('python3 /home/pi/smarthome/hue/hue.py "off"')
        jarvis_say_good()
        continue

LG TV


We take the script from here. After the first run and entering the pairing code, the code itself does not change, so you can cut that part out of the script and leave only the control part.

Add to jarvis:

    # 1 - POWER
    # 24 - VOLUME_UP
    # 25 - VOLUME_DOWN
    # 400 - 3D_VIDEO
    if (command_key in [' ', ' ']):
        os.system('python3 /home/pi/smarthome/TV/tv2.py 1')
        jarvis_say_good()
        continue

    if (command_key in [' ', '']):
        os.system('python3 /home/pi/smarthome/TV/tv2.py 24')
        jarvis_say_good()
        continue

Radio


 sudo apt-get install mpg123 

Add to jarvis:

    if (command_key in ['', ' ', ' ']):
        os.system('mpg123 URL')
        continue

You can also set up homebridge and control everything through Siri, for when Jarvis is out of shouting range.

As for the quality of speech recognition: it is not Alexa, of course, but at a distance of 5 meters the share of correctly recognized commands is decent. The main problem is that sound from the TV or speakers is recorded along with the commands and interferes with recognition.

That's all, thank you.

Source: https://habr.com/ru/post/401049/

