
Step-by-Step Guide to Creating a Voice Assistant with Python


Hello!

Who wouldn't want the luxury of an assistant who always listens for your call, anticipates your every need, and takes action when necessary? That luxury is now available thanks to voice assistants based on artificial intelligence.

Voice assistants come in small packages and can perform a variety of actions once they hear your command. They can turn on lights, answer questions, play music, place online orders, and handle all sorts of other AI-driven tasks.

Voice assistants should not be confused with virtual assistants, who are people who work remotely and therefore can perform all kinds of tasks. Voice assistants are based on technology. As voice assistants become more reliable, their usefulness in both personal and business areas will grow.


What is a voice assistant?


A voice assistant, or intelligent personal assistant, is a software agent that can perform tasks or services for a person based on verbal commands, that is, by interpreting human speech and responding with a synthesized voice. Users can ask their assistant questions, control home automation devices and media playback by voice, and manage other basic tasks such as email, to-do lists, and opening or closing applications.

Take Braina (Brain Artificial) as an example: an intelligent personal assistant, human-language interface, and automation and voice recognition software for Windows PCs. Braina is multifunctional artificial intelligence software that lets you interact with your computer using voice commands in most of the world's languages. It also accurately converts speech to text in more than 100 languages.

History of Voice Assistants



Voice assistants recently took center stage after Apple integrated Siri, its remarkable virtual assistant, into its products. But the timeline of major developments began back in 1962 at the Seattle World's Fair, where IBM introduced a unique device called the Shoebox. It was the size of a shoebox, could perform arithmetic, and could recognize 16 spoken words, including the digits 0 through 9.

During the 1970s, researchers at Carnegie Mellon University in Pittsburgh, Pennsylvania, with substantial support from the US Department of Defense and its Defense Advanced Research Projects Agency (DARPA), created Harpy. It could understand almost 1,000 words, roughly the vocabulary of a three-year-old child.

In the early 1990s, large organizations such as Apple and IBM began to build products that used voice recognition. In 1993, Apple began shipping speech recognition on its Macintosh computers with PlainTalk.

In April 1997, Dragon NaturallySpeaking became the first continuous dictation product; it could recognize about 100 words per minute and convert them into readable text.


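After all this, how great would it be to build a simple voice assistant for your desktop or laptop that could open subreddits and websites in the browser, send email, launch applications, report the weather, the time, and the latest news, play songs, change the desktop wallpaper, and answer questions from Wikipedia?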


So, in this article, we are going to create a voice application capable of performing all of the above tasks. But first, watch this video that I recorded while chatting with my desktop voice assistant. I call her Sofia.

I hope you enjoyed the video of my conversation with Sofia. Now let's start building this cool thing.
Before you start, I recommend reading about voice assistants in more detail, and following my Telegram channel Neuron (@neurondata) so you don't miss interesting articles on data and data science.
System requirements: Python 2.7, Spyder IDE, macOS Mojave (version 10.14)
Install all of these Python libraries:

pip install SpeechRecognition
pip install beautifulsoup4
pip install vlc
pip install youtube-dl
pip install pyowm
pip install wikipedia
pip install requests
pip install lxml

Let's start creating our desktop voice assistant with Python



Start by importing all the required libraries:

import speech_recognition as sr
import os
import sys
import re
import webbrowser
import smtplib
import requests
import subprocess
from pyowm import OWM
import youtube_dl
import vlc
import urllib
import urllib2
import json
from bs4 import BeautifulSoup as soup
from urllib2 import urlopen
import wikipedia
import random
from time import strftime

For our voice assistant to perform all the functions discussed above, we have to code the logic for each of them in one method.

So, our first step is to create a method that will interpret the user's voice response:

def myCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Say something...')
        r.pause_threshold = 1
        r.adjust_for_ambient_noise(source, duration=1)
        audio = r.listen(source)
    try:
        command = r.recognize_google(audio).lower()
        print('You said: ' + command + '\n')
    #loop back to continue to listen for commands if unrecognizable speech is received
    except sr.UnknownValueError:
        print('....')
        command = myCommand()
    return command
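The retry in myCommand() is recursive, so a long run of unintelligible audio keeps nesting calls. As a rough sketch, the same idea can be written as a loop with a cap on attempts. To keep the example runnable without a microphone, `listen_once` is a hypothetical stand-in for the listen-and-recognize step and `UnknownSpeech` is a hypothetical stand-in for sr.UnknownValueError; neither name comes from the article, and the sketch targets Python 3.

```python
class UnknownSpeech(Exception):
    """Hypothetical stand-in for sr.UnknownValueError."""


def get_command(listen_once, max_attempts=5):
    """Call listen_once() until it returns text or attempts run out.

    listen_once is a hypothetical callable that returns the recognized
    text or raises UnknownSpeech when the audio was unintelligible.
    """
    for _ in range(max_attempts):
        try:
            return listen_once().lower()
        except UnknownSpeech:
            print('....')  # same placeholder output as the original
    return ''  # nothing understood; the caller decides what to do next


# Usage with a fake listener that fails twice before succeeding.
attempts = iter([UnknownSpeech(), UnknownSpeech(), 'Open Reddit Python'])


def fake_listen():
    result = next(attempts)
    if isinstance(result, Exception):
        raise result
    return result


print(get_command(fake_listen))
```

Returning an empty string after too many failures is only one possible choice; since no command phrase occurs in an empty string, the assistant would simply do nothing for that round.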


Then create a method that will convert text to speech:

def sofiaResponse(audio):
    print(audio)
    # speak the text one line at a time
    for line in audio.splitlines():
        os.system("say " + line)
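One thing to watch with os.system("say " + ...) is that the text goes through the shell, so a reply containing an apostrophe (such as "I don't know what you mean!") can break the command. Here is a minimal sketch of an alternative that passes the text as a separate argument. The helper names `build_say_commands` and `speak` are made up for this illustration; `say` is the macOS command used in the article, so actually speaking still requires macOS. The sketch targets Python 3.

```python
import subprocess


def build_say_commands(text):
    """Return one argument list per non-empty line of text."""
    return [['say', line] for line in text.splitlines() if line.strip()]


def speak(text, runner=subprocess.call):
    """Print the text, then speak it line by line (macOS only)."""
    print(text)
    for argv in build_say_commands(text):
        runner(argv)  # no shell involved, so quotes need no escaping


# Inspect the commands without running them.
print(build_say_commands("I don't know what you mean!\nTry again."))
```

The `runner` parameter exists only so the function can be exercised without a speech synthesizer, for example by passing `list.append` of a results list.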

Now create a loop so the assistant keeps executing commands. The user's command, returned by myCommand(), is passed as a parameter to the assistant() method:

while True:
    assistant(myCommand())

Our next step is to create several if statements corresponding to each function. So let's see how to create these small modules within an if statement for each command.
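Because every branch tests whether a phrase occurs somewhere in the command, the order of the branches matters: "open reddit python" also contains "open", so the more specific check has to come first. The sketch below illustrates this with a small rule table; the function `route` and the rule names are hypothetical and only serve the demonstration.

```python
def route(command):
    """Return the name of the first rule whose phrase occurs in command."""
    rules = [
        ('open reddit', 'reddit'),   # specific phrase first
        ('open', 'website'),         # generic phrase after it
        ('current weather', 'weather'),
        ('time', 'time'),
    ]
    for phrase, name in rules:
        if phrase in command:
            return name
    return None


print(route('hey sofia open reddit python'))
print(route('please open example.com'))
print(route('sing me a lullaby'))
```

Swapping the first two rules would send every Reddit request to the generic website branch.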

1. Open Reddit subreddit in the browser


The user can ask to open any subreddit with a command such as “Hey, Sofia! Can you open reddit subreddit_name?”. The phrase “open reddit subreddit_name” must be said as is; you can put any prefix in front of it.

How it works
If your command contains the phrase open reddit, the code looks for the subreddit name in the command using re.search(). The subreddit URL is built from www.reddit.com and opened in the browser with Python's webbrowser module, which provides a high-level interface for displaying web documents to users.

    if 'open reddit' in command:
        reg_ex = re.search('open reddit (.*)', command)
        url = 'https://www.reddit.com/'
        if reg_ex:
            subreddit = reg_ex.group(1)
            url = url + 'r/' + subreddit
        webbrowser.open(url)
        sofiaResponse('The Reddit content has been opened for you Sir.')

Thus, the code above opens the requested subreddit in your default browser.
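To see the URL construction in isolation, here is a small sketch that can be run without a browser. The function name `subreddit_url` is made up, and joining the spoken words into one name is my own assumption (subreddit names contain no spaces, while recognized speech often does).

```python
import re


def subreddit_url(command):
    """Build a subreddit URL from a command, or return the front page."""
    base = 'https://www.reddit.com/'
    match = re.search(r'open reddit (.*)', command)
    if not match:
        return base
    # Assumption: join spoken words, since subreddit names have no spaces.
    name = re.sub(r'\s+', '', match.group(1))
    return base + 'r/' + name if name else base


print(subreddit_url('hey sofia can you open reddit machine learning'))
```

The result can then be handed to webbrowser.open() exactly as in the branch above.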

2. Open any website in the browser


You can open any website by simply saying “open website.com” or “open website.org”.

For example: “Please open facebook.com” or “Hey, can you open linkedin.com”. In this way you can ask Sofia to open any website.

How it works
If your command contains the word open, the code looks for the website name in the command using re.search(). It then appends the website name to https://www. and opens the full URL in the browser with the webbrowser module.

    elif 'open' in command:
        reg_ex = re.search('open (.+)', command)
        if reg_ex:
            domain = reg_ex.group(1)
            print(domain)
            url = 'https://www.' + domain
            webbrowser.open(url)
            sofiaResponse('The website you have requested has been opened for you Sir.')
        else:
            pass
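Since the branch simply prepends https://www. to whatever follows "open", a command without a domain suffix produces an address that does not resolve. A possible refinement, sketched here under my own assumptions, is to fall back to a default suffix. The function `website_url` and the `.com` default are hypothetical and not part of the article's code.

```python
import re


def website_url(command, default_tld='.com'):
    """Turn 'open <site>' into a URL; return None if nothing follows 'open'."""
    match = re.search(r'open (.+)', command)
    if not match:
        return None
    # Recognized speech may contain spaces ("linked in"); drop them.
    domain = match.group(1).replace(' ', '')
    if '.' not in domain:
        domain += default_tld  # assumption: bare names mean .com sites
    return 'https://www.' + domain


print(website_url('please open facebook'))
print(website_url('open wikipedia.org'))
```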

3. Send e-mail


You can also ask your desktop assistant to send an email.

How it works
If your command contains the word email, the bot asks who the recipient is. If the answer is rajat, the bot asks for the message and sends it with Python's smtplib library. The smtplib module defines an SMTP client session object that can be used to send mail to any machine on the Internet with an SMTP or ESMTP listener daemon. The code first opens a connection to Gmail's SMTP server with smtplib.SMTP(), identifies itself to the server with ehlo(), encrypts the session with starttls(), logs in to the mailbox with login(), and finally sends the message with sendmail().

    elif 'email' in command:
        sofiaResponse('Who is the recipient?')
        recipient = myCommand()
        if 'rajat' in recipient:
            sofiaResponse('What should I say to him?')
            content = myCommand()
            mail = smtplib.SMTP('smtp.gmail.com', 587)
            mail.ehlo()
            mail.starttls()
            mail.login('your_email_address', 'your_password')
            mail.sendmail('sender_email', 'receiver_email', content)
            mail.close()
            sofiaResponse('Email has been sent successfully. You can check your inbox.')
        else:
            sofiaResponse('I don\'t know what you mean!')
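The branch above passes the spoken text straight to sendmail(), so the message goes out without Subject, From, or To headers. As a hedged sketch for Python 3, the standard library's email.message.EmailMessage can build a complete message first. The function `build_message`, the default subject, and the example.com addresses are placeholders of mine, not values from the article.

```python
from email.message import EmailMessage


def build_message(sender, recipient, body, subject='Message from Sofia'):
    """Return an EmailMessage with the usual headers filled in."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject  # hypothetical default subject
    msg.set_content(body)
    return msg


msg = build_message('me@example.com', 'rajat@example.com', 'see you at five')
print(msg['Subject'])
print(msg.get_content())
```

With a logged-in smtplib.SMTP session, such a message can be sent with the session's send_message() method instead of sendmail().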

4. Run any system application.


Say "Launch calendar" or "Can you please launch Skype" or "Sofia, launch Finder", etc., and Sofia will launch that system application for you.

How it works
If your command contains the word “launch”, the code looks for the application name in the command using re.search() (the application must be present on your system). It then adds the “.app” suffix to the application name, giving, for example, calendar.app (on macOS, application bundles end with the .app extension, whereas Windows executables end with .exe). The application is then launched with Popen() from Python's subprocess module, which allows you to start new applications from your Python program.

    elif 'launch' in command:
        reg_ex = re.search('launch (.*)', command)
        if reg_ex:
            appname = reg_ex.group(1)
            appname1 = appname+".app"
            subprocess.Popen(["open", "-n", "/Applications/" + appname1], stdout=subprocess.PIPE)
            sofiaResponse('I have launched the desired application')
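The branch reports success even when no such application exists. One way to guard against that, sketched here for Python 3, is to check for the bundle before calling Popen(). The function `app_bundle_path` is hypothetical, and the demonstration uses a temporary folder in place of /Applications so it runs on any system.

```python
import os
import tempfile


def app_bundle_path(command, apps_dir='/Applications'):
    """Return the .app path named after 'launch', or None if it is missing."""
    _, found, spoken = command.partition('launch ')
    if not found or not spoken.strip():
        return None
    path = os.path.join(apps_dir, spoken.strip() + '.app')
    # .app bundles are directories, so isdir() is the existence check.
    return path if os.path.isdir(path) else None


# Demonstration against a temporary folder instead of /Applications.
with tempfile.TemporaryDirectory() as apps:
    os.mkdir(os.path.join(apps, 'calendar.app'))
    print(app_bundle_path('sofia launch calendar', apps) is not None)
    print(app_bundle_path('launch notinstalled', apps))
```

When the function returns None, the assistant could say that the application was not found instead of claiming it was launched.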

5. Report the current weather and temperature of any city


Sofia can also tell you the weather and the maximum and minimum temperature of any city in the world. The user simply has to say something like “what is the current weather in London” or “tell me the current weather in Delhi”.

How it works
If your command contains the phrase current weather, the code looks for the city name using re.search(). I used Python's pyowm library to get the weather for any city. get_status() reports weather conditions such as haze, clouds, or rain, and get_temperature() reports the city's maximum and minimum temperature.

    elif 'current weather' in command:
        reg_ex = re.search('current weather in (.*)', command)
        if reg_ex:
            city = reg_ex.group(1)
            owm = OWM(API_key='ab0d5e80e8dafb2cb81fa9e82431c1fa')
            obs = owm.weather_at_place(city)
            w = obs.get_weather()
            k = w.get_status()
            x = w.get_temperature(unit='celsius')
            sofiaResponse('Current weather in %s is %s. The maximum temperature is %0.2f and the minimum temperature is %0.2f degrees Celsius' % (city, k, x['temp_max'], x['temp_min']))
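The parsing and formatting steps can be tried without an API key by separating them from the network call. In this sketch the helper names `extract_city` and `weather_sentence` are mine, the temperature dictionary mirrors the temp_max and temp_min keys used above, and the readings are made up purely to show the formatting.

```python
import re


def extract_city(command):
    """Return the city named after 'current weather in', or None."""
    match = re.search(r'current weather in (.+)', command)
    return match.group(1).strip() if match else None


def weather_sentence(city, status, temps):
    """Format the spoken reply from a status string and a temperature dict."""
    return ('Current weather in %s is %s. The maximum temperature is %0.2f '
            'and the minimum temperature is %0.2f degrees Celsius'
            % (city, status, temps['temp_max'], temps['temp_min']))


city = extract_city('tell me the current weather in delhi')
# Made-up readings, used only to show the formatting.
print(weather_sentence(city, 'haze', {'temp_max': 31.0, 'temp_min': 24.5}))
```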

6. Report current time


Say “Sofia, can you tell me the current time?” or “What time is it?”, and Sofia will tell you the current time in your time zone.

How it works
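Pretty simple: if the command contains the word time, the code reads the current time with datetime.datetime.now() and speaks the hour and minute.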

    elif 'time' in command:
        import datetime
        now = datetime.datetime.now()
        sofiaResponse('Current time is %d hours %d minutes' % (now.hour, now.minute))

7. Greeting / Completion


Say "Hello, Sofia" to greet your voice assistant, or, when you want the program to end, say something like "Shutdown, Sofia" or "Sofia, please shutdown", etc.

How it works
If your command contains the word “hello”, the bot greets the user according to the time of day. Before noon it responds “Hello Sir. Good morning”; from noon until 6 pm it responds “Hello Sir. Good afternoon”; and from 6 pm onward it responds “Hello Sir. Good evening”. When you give the shutdown command, sys.exit() is called to end the program.

    #Greet Sofia
    elif 'hello' in command:
        day_time = int(strftime('%H'))
        if day_time < 12:
            sofiaResponse('Hello Sir. Good morning')
        elif 12 <= day_time < 18:
            sofiaResponse('Hello Sir. Good afternoon')
        else:
            sofiaResponse('Hello Sir. Good evening')
    #to terminate the program
    elif 'shutdown' in command:
        sofiaResponse('Bye bye Sir. Have a nice day')
        sys.exit()
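To check the hour boundaries without waiting for the clock, the greeting logic can be pulled into a small pure function. This is only an illustrative refactoring; the name `greeting_for_hour` does not appear in the article.

```python
def greeting_for_hour(hour):
    """Return the greeting for an hour in 24-hour format (0-23)."""
    if hour < 12:
        return 'Hello Sir. Good morning'
    if hour < 18:
        return 'Hello Sir. Good afternoon'
    return 'Hello Sir. Good evening'


for hour in (9, 12, 17, 18, 23):
    print(hour, greeting_for_hour(hour))
```

Inside the assistant it would be called as greeting_for_hour(int(strftime('%H'))).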

8. Playing a song on a VLC media player


This feature lets your voice bot play the song you want in the VLC media player. The user says "Sofia, play me a song", and the bot asks, "What song shall I play?". Just say the name of the song, and Sofia will download it from YouTube to your local disk and play it in VLC. When you ask for another song, the previously downloaded one is deleted automatically.

How it works
If your command contains the phrase play me a song, the bot asks which song to play. It searches youtube.com for the song you name and, if a match is found, downloads it to a local directory using Python's youtube_dl library. youtube-dl is a command-line program for downloading videos from YouTube.com and several other sites. As soon as the download finishes, the song is played through the vlc Python library; the call play(path_to_videosong) actually plays the song.

The next time you request a song, the local directory is emptied and the new song is downloaded into it.

    elif 'play me a song' in command:
        path = '/Users/nageshsinghchauhan/Documents/videos/'
        folder = path
        # remove previously downloaded songs
        for the_file in os.listdir(folder):
            file_path = os.path.join(folder, the_file)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(e)
        sofiaResponse('What song shall I play Sir?')
        mysong = myCommand()
        if mysong:
            flag = 0
            url = "https://www.youtube.com/results?search_query=" + mysong.replace(' ', '+')
            response = urllib2.urlopen(url)
            html = response.read()
            soup1 = soup(html,"lxml")
            url_list = []
            for vid in soup1.findAll(attrs={'class':'yt-uix-tile-link'}):
                if ('https://www.youtube.com' + vid['href']).startswith("https://www.youtube.com/watch?v="):
                    flag = 1
                    final_url = 'https://www.youtube.com' + vid['href']
                    url_list.append(final_url)
            # check for results before taking the first one
            if flag == 0:
                sofiaResponse('I have not found anything in Youtube ')
            else:
                url = url_list[0]
                ydl_opts = {}
                os.chdir(path)
                with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                    ydl.download([url])
                vlc.play(path)
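Replacing spaces with "+" works for simple titles, but characters such as "&" or "?" in a song name would change the meaning of the query string. A small sketch of a safer encoding step follows; `youtube_search_url` is a name I made up, and the import shown is the Python 3 location of quote_plus (in Python 2.7 it lives in the urllib module).

```python
from urllib.parse import quote_plus  # Python 2.7: from urllib import quote_plus


def youtube_search_url(song):
    """Return a YouTube results URL with the query safely encoded."""
    return 'https://www.youtube.com/results?search_query=' + quote_plus(song)


print(youtube_search_url('tom & jerry theme'))
```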

9. Change desktop wallpaper


You can also change your desktop wallpaper with this feature. When you say something like “Change wallpaper” or “Sofia, please change wallpaper”, the bot downloads a random wallpaper from unsplash.com and sets it as the desktop background.

How it works
If your command contains the phrase change wallpaper, the program downloads a random wallpaper from unsplash.com, saves it in a local directory, and sets it as the desktop wallpaper using subprocess.call(). I used the Unsplash API to access its content.

The next time you ask to change the wallpaper, the local directory is emptied and the new wallpaper is downloaded into it.

    elif 'change wallpaper' in command:
        folder = '/Users/nageshsinghchauhan/Documents/wallpaper/'
        # remove the previously downloaded wallpaper
        for the_file in os.listdir(folder):
            file_path = os.path.join(folder, the_file)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(e)
        api_key = 'fd66364c0ad9e0f8aabe54ec3cfbed0a947f3f4014ce3b841bf2ff6e20948795'
        url = 'https://api.unsplash.com/photos/random?client_id=' + api_key #pic from unsplash.com
        f = urllib2.urlopen(url)
        json_string = f.read()
        f.close()
        parsed_json = json.loads(json_string)
        photo = parsed_json['urls']['full']
        urllib.urlretrieve(photo, "/Users/nageshsinghchauhan/Documents/wallpaper/a") # Location where we download the image to.
        subprocess.call(["killall Dock"], shell=True)
        sofiaResponse('wallpaper changed successfully')
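The line parsed_json['urls']['full'] raises a KeyError whenever the reply does not contain an image, for example when the API returns an error. Below is a minimal sketch of a more forgiving lookup. The function `photo_url` is hypothetical, and both sample replies are invented and contain only the fields needed for the illustration.

```python
import json


def photo_url(json_string, size='full'):
    """Extract one image URL from the API's JSON reply, or None."""
    data = json.loads(json_string)
    return data.get('urls', {}).get(size)


# Invented replies with only the fields used here; a real one has many more.
sample = '{"id": "abc123", "urls": {"full": "https://example.com/full.jpg"}}'
print(photo_url(sample))
print(photo_url('{"errors": ["Rate limit exceeded"]}'))
```

A None result gives the assistant a chance to say that the wallpaper could not be changed.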

10. Report the latest news from the news feed.


Sofia can also tell you the latest news. The user just has to say "Sofia, what is the news for today?" or "Tell me the news for today."

How it works
If your command contains the phrase “news for today”, the code fetches the Google News RSS feed, parses it with Beautiful Soup, and reads the headlines to you. For convenience, I limited the number of news items to 15.

    elif 'news for today' in command:
        try:
            news_url="https://news.google.com/news/rss"
            Client=urlopen(news_url)
            xml_page=Client.read()
            Client.close()
            soup_page=soup(xml_page,"xml")
            news_list=soup_page.findAll("item")
            for news in news_list[:15]:
                sofiaResponse(news.title.text.encode('utf-8'))
        except Exception as e:
            print(e)
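To show what the feed parsing does without a network connection, here is a sketch that extracts item titles from a tiny invented feed. Note that it swaps Beautiful Soup for the standard library's xml.etree.ElementTree so that it runs without extra packages; the function `headlines` and the sample feed are my own.

```python
import xml.etree.ElementTree as ET


def headlines(rss_xml, limit=15):
    """Return up to `limit` item titles from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    titles = [item.findtext('title', default='') for item in root.iter('item')]
    return titles[:limit]


# Tiny invented feed, just to show the structure being parsed.
sample_feed = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First headline</title></item>
  <item><title>Second headline</title></item>
  <item><title>Third headline</title></item>
</channel></rss>"""

print(headlines(sample_feed, limit=2))
```

The channel's own title is skipped because only elements inside item are collected.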

11. Share virtually everything you ask.


Your bot can get detailed information about almost anything you ask. For example, "Sofia, tell me about Google" or "Please tell me about supercomputers" or "Please tell me about the Internet." So, as you can see, you can ask about anything.

How it works
If your command contains the phrase “tell me about”, the code looks for the topic in the command using re.search(). Using Python's wikipedia library, it looks the topic up and extracts the first 500 characters (if you do not set a limit, the bot will read the whole page to you). wikipedia is a Python library that makes it easy to access and parse data from Wikipedia.

    elif 'tell me about' in command:
        reg_ex = re.search('tell me about (.*)', command)
        try:
            if reg_ex:
                topic = reg_ex.group(1)
                ny = wikipedia.page(topic)
                sofiaResponse(ny.content[:500].encode('utf-8'))
        except Exception as e:
            # sofiaResponse expects a string, not an exception object
            sofiaResponse(str(e))
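Cutting the page at exactly 500 characters usually stops in the middle of a word or sentence. The following sketch shows one possible refinement that ends the excerpt at the last complete sentence inside the limit. The function `spoken_excerpt` and the sample text are my own additions.

```python
def spoken_excerpt(text, limit=500):
    """Return at most `limit` characters, ending on a full sentence if possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    end = cut.rfind('. ')
    if end == -1:
        # No sentence break found: fall back to the last whole word.
        return cut.rsplit(' ', 1)[0]
    return cut[:end + 1]


sample = 'Google is a technology company. It was founded in 1998. ' * 20
print(spoken_excerpt(sample, limit=60))
```

In the branch above it would replace the slice, as in spoken_excerpt(ny.content).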

Let's put it all together:

import speech_recognition as sr
import os
import sys
import re
import webbrowser
import smtplib
import requests
import subprocess
from pyowm import OWM
import youtube_dl
import vlc
import urllib
import urllib2
import json
from bs4 import BeautifulSoup as soup
from urllib2 import urlopen
import wikipedia
import random
from time import strftime

def sofiaResponse(audio):
    "speaks audio passed as argument"
    print(audio)
    for line in audio.splitlines():
        os.system("say " + line)

def myCommand():
    "listens for commands"
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Say something...')
        r.pause_threshold = 1
        r.adjust_for_ambient_noise(source, duration=1)
        audio = r.listen(source)
    try:
        command = r.recognize_google(audio).lower()
        print('You said: ' + command + '\n')
    #loop back to continue to listen for commands if unrecognizable speech is received
    except sr.UnknownValueError:
        print('....')
        command = myCommand()
    return command

def assistant(command):
    "if statements for executing commands"
    #open subreddit Reddit
    if 'open reddit' in command:
        reg_ex = re.search('open reddit (.*)', command)
        url = 'https://www.reddit.com/'
        if reg_ex:
            subreddit = reg_ex.group(1)
            url = url + 'r/' + subreddit
        webbrowser.open(url)
        sofiaResponse('The Reddit content has been opened for you Sir.')
    elif 'shutdown' in command:
        sofiaResponse('Bye bye Sir. Have a nice day')
        sys.exit()
    #open website
    elif 'open' in command:
        reg_ex = re.search('open (.+)', command)
        if reg_ex:
            domain = reg_ex.group(1)
            print(domain)
            url = 'https://www.' + domain
            webbrowser.open(url)
            sofiaResponse('The website you have requested has been opened for you Sir.')
        else:
            pass
    #greetings
    elif 'hello' in command:
        day_time = int(strftime('%H'))
        if day_time < 12:
            sofiaResponse('Hello Sir. Good morning')
        elif 12 <= day_time < 18:
            sofiaResponse('Hello Sir. Good afternoon')
        else:
            sofiaResponse('Hello Sir. Good evening')
    elif 'help me' in command:
        sofiaResponse("""You can use these commands and I'll help you out:
        1. Open reddit subreddit : Opens the subreddit in default browser.
        2. Open xyz.com : replace xyz with any website name
        3. Send email/email : Follow up questions such as recipient name, content will be asked in order.
        4. Current weather in {cityname} : Tells you the current condition and temperature
        5. Hello
        6. play me a song : Plays song in your VLC media player
        7. change wallpaper : Change desktop wallpaper
        8. news for today : reads top news of today
        9. time : Current system time
        10. top stories from google news (RSS feeds)
        11. tell me about xyz : tells you about xyz""")
    #joke
    elif 'joke' in command:
        res = requests.get(
                'https://icanhazdadjoke.com/',
                headers={"Accept":"application/json"})
        if res.status_code == requests.codes.ok:
            sofiaResponse(str(res.json()['joke']))
        else:
            sofiaResponse('oops!I ran out of jokes')
    #top stories from google news
    elif 'news for today' in command:
        try:
            news_url="https://news.google.com/news/rss"
            Client=urlopen(news_url)
            xml_page=Client.read()
            Client.close()
            soup_page=soup(xml_page,"xml")
            news_list=soup_page.findAll("item")
            for news in news_list[:15]:
                sofiaResponse(news.title.text.encode('utf-8'))
        except Exception as e:
            print(e)
    #current weather
    elif 'current weather' in command:
        reg_ex = re.search('current weather in (.*)', command)
        if reg_ex:
            city = reg_ex.group(1)
            owm = OWM(API_key='ab0d5e80e8dafb2cb81fa9e82431c1fa')
            obs = owm.weather_at_place(city)
            w = obs.get_weather()
            k = w.get_status()
            x = w.get_temperature(unit='celsius')
            sofiaResponse('Current weather in %s is %s. The maximum temperature is %0.2f and the minimum temperature is %0.2f degrees Celsius' % (city, k, x['temp_max'], x['temp_min']))
    #time
    elif 'time' in command:
        import datetime
        now = datetime.datetime.now()
        sofiaResponse('Current time is %d hours %d minutes' % (now.hour, now.minute))
    elif 'email' in command:
        sofiaResponse('Who is the recipient?')
        recipient = myCommand()
        if 'rajat' in recipient:
            sofiaResponse('What should I say to him?')
            content = myCommand()
            mail = smtplib.SMTP('smtp.gmail.com', 587)
            mail.ehlo()
            mail.starttls()
            mail.login('your_email_address', 'your_password')
            mail.sendmail('sender_email', 'receiver_email', content)
            mail.close()
            sofiaResponse('Email has been sent successfully. You can check your inbox.')
        else:
            sofiaResponse('I don\'t know what you mean!')
    #launch any application
    elif 'launch' in command:
        reg_ex = re.search('launch (.*)', command)
        if reg_ex:
            appname = reg_ex.group(1)
            appname1 = appname+".app"
            subprocess.Popen(["open", "-n", "/Applications/" + appname1], stdout=subprocess.PIPE)
            sofiaResponse('I have launched the desired application')
    #play youtube song
    elif 'play me a song' in command:
        path = '/Users/nageshsinghchauhan/Documents/videos/'
        folder = path
        for the_file in os.listdir(folder):
            file_path = os.path.join(folder, the_file)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(e)
        sofiaResponse('What song shall I play Sir?')
        mysong = myCommand()
        if mysong:
            flag = 0
            url = "https://www.youtube.com/results?search_query=" + mysong.replace(' ', '+')
            response = urllib2.urlopen(url)
            html = response.read()
            soup1 = soup(html,"lxml")
            url_list = []
            for vid in soup1.findAll(attrs={'class':'yt-uix-tile-link'}):
                if ('https://www.youtube.com' + vid['href']).startswith("https://www.youtube.com/watch?v="):
                    flag = 1
                    final_url = 'https://www.youtube.com' + vid['href']
                    url_list.append(final_url)
            # check for results before taking the first one
            if flag == 0:
                sofiaResponse('I have not found anything in Youtube ')
            else:
                url = url_list[0]
                ydl_opts = {}
                os.chdir(path)
                with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                    ydl.download([url])
                vlc.play(path)
    #change wallpaper
    elif 'change wallpaper' in command:
        folder = '/Users/nageshsinghchauhan/Documents/wallpaper/'
        for the_file in os.listdir(folder):
            file_path = os.path.join(folder, the_file)
            try:
                if os.path.isfile(file_path):
                    os.unlink(file_path)
            except Exception as e:
                print(e)
        api_key = 'fd66364c0ad9e0f8aabe54ec3cfbed0a947f3f4014ce3b841bf2ff6e20948795'
        url = 'https://api.unsplash.com/photos/random?client_id=' + api_key #pic from unsplash.com
        f = urllib2.urlopen(url)
        json_string = f.read()
        f.close()
        parsed_json = json.loads(json_string)
        photo = parsed_json['urls']['full']
        urllib.urlretrieve(photo, "/Users/nageshsinghchauhan/Documents/wallpaper/a") # Location where we download the image to.
        subprocess.call(["killall Dock"], shell=True)
        sofiaResponse('wallpaper changed successfully')
    #askme anything
    elif 'tell me about' in command:
        reg_ex = re.search('tell me about (.*)', command)
        try:
            if reg_ex:
                topic = reg_ex.group(1)
                ny = wikipedia.page(topic)
                sofiaResponse(ny.content[:500].encode('utf-8'))
        except Exception as e:
            print(e)
            sofiaResponse(str(e))

sofiaResponse('Hi User, I am Sofia and I am your personal voice assistant, Please give a command or say "help me" and I will tell you what all I can do for you.')

#loop to continue executing multiple commands
while True:
    assistant(myCommand())

So, you have seen how, with just a few simple lines of Python code, we can create a very cool voice assistant for the computer. Beyond these features, you can add many others to your voice assistant.

Conclusion


What does the future hold? Throughout the history of computing, user interfaces have become increasingly natural to use. The screen and keyboard were one step in that direction. The mouse and graphical user interface were another. Touch screens are the latest development. The next step will likely be a mixture of augmented reality, gestures, and voice commands. After all, it is often easier to ask a question or have a conversation than to type something or fill in an online form.

The more a person interacts with voice-activated devices, the more trends and patterns the system can identify from the information it receives. This data can then be used to determine user preferences and tastes, a long-term advantage for making the home smarter. Google and Amazon are working to integrate voice-enabled artificial intelligence that can analyze and respond to human emotions.

Hope you enjoyed reading this article. Share your thoughts / comments / doubts in the comments section.

Knowledge to all!

Source: https://habr.com/ru/post/450224/

