
Voice control for a web player, or crossing CMU Sphinx with Selenium WebDriver

In an earlier article I described building a web mp3 player and a home audio system.
The player itself can be seen here.

Then came the idea of adding voice control to the player.
After an hour or two of searching the Internet, a solution was found:
CMU Sphinx for speech recognition, plus Selenium WebDriver for controlling the browser programmatically.

So, let's begin.

Development was done in the Eclipse IDE.
First, convert the project to a Maven project:
right-click the project - Configure - Convert to Maven Project.
Add the following to the pom.xml file:

<repositories>
    <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases><enabled>false</enabled></releases>
        <snapshots><enabled>true</enabled></snapshots>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>edu.cmu.sphinx</groupId>
        <artifactId>sphinx4-core</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>
</dependencies>


We also need:

1. The Russian acoustic model (download here)
- Download the latest version of the archive and copy the zero_ru.cd_cont_4000 folder into our source folder.

2. Selenium WebDriver for Java (download) - add the library's jar file from the archive to the project.

3. These files, which are used to generate transcriptions of Russian words with dict2transcript.pl.

Now we can start on the program itself.

Using the dict2transcript.pl script, we build our dictionary, mydict.dict:

cranberries k r y n bb i rr i s
gromche g r oo m ch i
iskat i s k aa tt
kinoproby kk i n a p r oo b y
minus mm ii n u s
nautilus n ay u tt ii l u s
nazad n ay z aa t
number1 a dd ii n
number3 t rr ii
number10 dd je ss i tt
number30 t rr ii c ay tt
pausa p aa u z ay
plus p ll ju s
rammstein r aa m sh t ay j n
snaipery s n aa j pp i r y
start s t ay r t
tishe tt ii sh y
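
A Sphinx dictionary is plain text with one entry per line: the word first, then its phones separated by whitespace. To make the format concrete, here is a minimal sketch that reads such text into a map. The class name `SphinxDictionary` is a hypothetical helper introduced for illustration and is not part of Sphinx4; the sketch also assumes there are no alternative pronunciations.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Sphinx4: reads dictionary text into word -> phones.
final class SphinxDictionary {

    /** Parses dictionary text, keeping the entries in file order. */
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // blank line, or a word with no phones
            }
            entries.put(tokens[0], Arrays.asList(tokens).subList(1, tokens.length));
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = parse("plus p ll ju s\nnumber3 t rr ii\n");
        System.out.println(dict.get("plus")); // prints [p, ll, ju, s]
        System.out.println(dict.keySet());    // prints [plus, number3]
    }
}
```

A map like this makes it easy to spot an entry whose phone list is empty or unexpectedly short before the file is handed to the recognizer.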


Then we write the grammar file, mygrammar.gram:

#JSGF V1.0;

grammar mygrammar;

public <start> = start;
public <pausa> = pausa;
public <tishe> = tishe;
public <gromche> = gromche;
public <switch> = (plus|minus)(number1|number3|number10|number30);
public <find> = iskat;
public <changeartist> = <find>(cranberries|kinoproby|nautilus|rammstein|snaipery);
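
Every word the grammar can produce needs a pronunciation in the dictionary, otherwise the recognizer has nothing to match it against. The sketch below lists grammar words that are missing from a given set of dictionary words. The class `GrammarDictCheck` and its methods are hypothetical names introduced here. It assumes a simple grammar like the one above, with no weights, tags, quoted tokens or comments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds grammar words that have no dictionary entry.
final class GrammarDictCheck {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}_]+");

    /** Collects the plain words on the right-hand side of every rule. */
    static Set<String> grammarWords(String jsgfText) {
        Set<String> words = new TreeSet<>();
        for (String statement : jsgfText.split(";")) {
            int eq = statement.indexOf('=');
            if (eq < 0) {
                continue; // the header and the grammar declaration contain no '='
            }
            // Drop references to other rules such as <find>; only terminals remain.
            String rhs = statement.substring(eq + 1).replaceAll("<[^>]*>", " ");
            Matcher m = WORD.matcher(rhs);
            while (m.find()) {
                words.add(m.group());
            }
        }
        return words;
    }

    /** Returns the grammar words that are absent from the dictionary, sorted. */
    static Set<String> missingFromDictionary(String jsgfText, Set<String> dictionaryWords) {
        Set<String> missing = grammarWords(jsgfText);
        missing.removeAll(dictionaryWords);
        return missing;
    }

    public static void main(String[] args) {
        String grammar = "#JSGF V1.0; grammar g; public <start> = start; "
                + "public <switch> = (plus|minus)(number3);";
        Set<String> dictionary = new HashSet<>(Arrays.asList("start", "plus", "number3"));
        System.out.println(missingFromDictionary(grammar, dictionary)); // prints [minus]
    }
}
```

Running a check of this kind after each change to the grammar catches a forgotten dictionary entry before a recognition session starts.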


And here is the source code of the Java program:

package jatx.sphinxtest;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.result.WordResult;

public class Main {
    private static final WebDriver driver = new FirefoxDriver();
    private static int volume = 100;

    private static final String[] prefixes = {"plus", "minus", "iskat"};
    private static final String[] artists = {"cranberries", "kinoproby", "nautilus", "rammstein", "snaipery"};
    private static final String[] numbers = {"number1", "number3", "number10", "number30"};

    // Open the browser window and load the player as soon as the class is loaded.
    static {
        driver.manage().window().maximize();
        music();
    }

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/jatx/sphinxtest/zero_ru.cd_cont_4000");
        configuration.setDictionaryPath("resource:/jatx/sphinxtest/mydict.dict");
        configuration.setUseGrammar(true);
        configuration.setGrammarPath("resource:/jatx/sphinxtest");
        configuration.setGrammarName("mygrammar");

        try {
            LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
            recognizer.startRecognition(true);

            // The last prefix word heard ("plus", "minus" or "iskat").
            String prefix = "";

            while (true) {
                SpeechResult result = recognizer.getResult();
                for (WordResult r : result.getWords()) {
                    try {
                        String cmd = r.getWord().toString();

                        // Single-word commands.
                        if (cmd.equals("tishe")) volumeDown();
                        if (cmd.equals("gromche")) volumeUp();
                        if (cmd.equals("start")) play();
                        if (cmd.equals("pausa")) pause();

                        // Remember a prefix so the next word can be interpreted.
                        for (String pref : prefixes) {
                            if (cmd.equals(pref)) prefix = pref;
                        }

                        for (String artist : artists) {
                            if (cmd.equals(artist) && prefix.equals("iskat")) changeArtist(artist);
                        }

                        // "numberN" after "plus" or "minus" skips N tracks.
                        for (String number : numbers) {
                            if (cmd.equals(number)) {
                                int num = Integer.parseInt(number.replace("number", ""));
                                if (prefix.equals("minus")) rev(num);
                                if (prefix.equals("plus")) fwd(num);
                            }
                        }

                        //System.out.println(cmd);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void music() {
        System.out.println("music");
        driver.get("http://home.tabatsky.ru/mp3player/desktop.jsp");
    }

    private static void volumeDown() {
        System.out.println("volume down");
        volume = (volume > 10 ? volume - 20 : volume);
        setVolume(volume);
    }

    private static void volumeUp() {
        System.out.println("volume up");
        volume = (volume < 90 ? volume + 20 : volume);
        setVolume(volume);
    }

    // Moves the slider on the page and sets the player volume (0.0 to 1.0).
    private static void setVolume(int volume) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("$('#volume_slider').slider('value'," + volume + ")");
        js.executeScript("window.setVolume(" + (volume / 100.0) + ")");
    }

    private static void play() {
        System.out.println("start");
        WebElement track = driver.findElement(By.id("0"));
        track.click();
    }

    private static void pause() {
        System.out.println("pause");
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("$('#toogle').trigger('click')");
    }

    private static void fwd(int n) {
        System.out.println("plus " + n);
        WebElement fwd = driver.findElement(By.id("fwd"));
        for (int i = 0; i < n; i++) {
            fwd.click();
        }
    }

    private static void rev(int n) {
        System.out.println("minus " + n);
        WebElement rev = driver.findElement(By.id("rev"));
        for (int i = 0; i < n; i++) {
            rev.click();
        }
    }

    // Fills in the search field with a query for the artist and runs the search.
    private static void changeArtist(String artist) {
        System.out.println("Changing artist: " + artist);
        String query = "";
        switch (artist) {
            case "cranberries":
                query = "cranberries";
                break;
            case "kinoproby":
                query = "";
                break;
            case "nautilus":
                query = "nautilus |   | mutatis mutandis";
                break;
            case "rammstein":
                query = "rammstein made in germany | rammstein herzeleid | rammstein mtv music history";
                break;
            case "snaipery":
                query = "  |  ";
                break;
        }
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("$('#query').val('" + query + "')");
        js.executeScript("$('#search').trigger('click')");
    }
}
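
The recognition loop above decides what each word means and drives the browser in the same place, so the prefix logic is hard to check without a microphone and a running Firefox. As a sketch only, the same decisions can be moved into a small class that returns action names instead of calling Selenium. `CommandInterpreter` and the action strings ("fwd 3", "changeArtist rammstein" and so on) are hypothetical names introduced here, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical refactoring sketch: turns recognized words into action names.
final class CommandInterpreter {
    private static final Set<String> PREFIXES =
            new HashSet<>(Arrays.asList("plus", "minus", "iskat"));
    private static final Set<String> ARTISTS = new HashSet<>(Arrays.asList(
            "cranberries", "kinoproby", "nautilus", "rammstein", "snaipery"));
    private static final Set<String> NUMBERS = new HashSet<>(Arrays.asList(
            "number1", "number3", "number10", "number30"));

    // Last prefix heard; like the original loop, it persists between calls.
    private String prefix = "";

    /** Returns the actions triggered by the given words, in order. */
    List<String> interpret(List<String> words) {
        List<String> actions = new ArrayList<>();
        for (String word : words) {
            if (word.equals("tishe")) {
                actions.add("volumeDown");
            } else if (word.equals("gromche")) {
                actions.add("volumeUp");
            } else if (word.equals("start")) {
                actions.add("play");
            } else if (word.equals("pausa")) {
                actions.add("pause");
            } else if (PREFIXES.contains(word)) {
                prefix = word;
            } else if (ARTISTS.contains(word) && prefix.equals("iskat")) {
                actions.add("changeArtist " + word);
            } else if (NUMBERS.contains(word)) {
                int n = Integer.parseInt(word.substring("number".length()));
                if (prefix.equals("plus")) actions.add("fwd " + n);
                if (prefix.equals("minus")) actions.add("rev " + n);
            }
            // Anything else (silence markers, unknown words) is ignored.
        }
        return actions;
    }

    public static void main(String[] args) {
        CommandInterpreter interpreter = new CommandInterpreter();
        System.out.println(interpreter.interpret(Arrays.asList("plus", "number3")));
        // prints [fwd 3]
        System.out.println(interpreter.interpret(Arrays.asList("iskat", "rammstein")));
        // prints [changeArtist rammstein]
    }
}
```

In the real loop, the strings taken from `result.getWords()` would be collected into a list and passed to `interpret`, and each returned action would then be dispatched to the matching Selenium method.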


Testing showed that Sphinx occasionally confuses commands or mistakes background noise for commands.

Source: https://habr.com/ru/post/239607/

