
Free speech recognition from the Russian company Stel

When you need to turn an audio file with speech into text, the solutions from Google and Yandex come to mind first. But besides Yandex there is another Russian company, “Stel” ( http://speech.stel.ru/ ), whose API supports “over 9000” and even “very many” requests per day, and Stel gives out trial keys for free.

Working with the API is not hard, but at the time of writing the manual on the Stel website is outdated and does not work, so this article provides a guide with examples in Python and Java. The Java example is especially relevant if your audio is not a file but an array of bytes. Note right away that Stel works only with WAV files with a sampling rate of 8 kHz, a sample size of 16 bits, and one channel (mono).
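Since the service accepts only one audio format, it can be worth checking a file before sending it. The following is a minimal sketch using Python's standard `wave` module; the function name `wav_format_problems` and the constants are illustrative and not part of Stel's API.

```python
import wave

# Format the article says Stel requires.
REQUIRED_RATE_HZ = 8000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
REQUIRED_CHANNELS = 1            # mono


def wav_format_problems(source):
    """Return a list of format mismatches; an empty list means the file fits.

    `source` is a file path (str) or a binary file-like object,
    as accepted by wave.open.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE_HZ:
            problems.append(
                "sample rate is %d Hz, expected %d"
                % (wav.getframerate(), REQUIRED_RATE_HZ)
            )
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(
                "sample width is %d bytes, expected %d"
                % (wav.getsampwidth(), REQUIRED_SAMPLE_WIDTH_BYTES)
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(
                "channel count is %d, expected %d"
                % (wav.getnchannels(), REQUIRED_CHANNELS)
            )
    return problems
```

Calling `wav_format_problems("test.wav")` returns an empty list when the file can be sent as is; otherwise each entry names one parameter that has to be converted first.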

Closer to the point: the Stel website ( http://speech.stel.ru/api_description ) describes in detail what to do and how (even though it is currently a bit outdated), so here is an example in Python 2 that works right away (again, at the moment):

    # coding: utf-8
    import httplib, json, base64

    HOST = 'api.stel.ru:7071'
    APIKEY = '***'  # Place your API key here
    MODEL = 'rus_gsm_ext'
    WAV = base64.b64encode(open('test.wav', 'rb').read())  # demo audio file (WAV, 8000 Hz, 16-bit, mono)

    con = httplib.HTTPConnection(HOST)

    # Speech recognition
    data = json.dumps({'apikey': APIKEY, 'model': MODEL, 'wav': WAV})
    headers = {'Content-Type': 'application/json',
               'Accept': 'application/json',
               'Content-Length': '{0}'.format(len(data))}
    con.request('POST', '/kwfind', data, headers)
    resp = con.getresponse()
    if resp.status == 200:
        print json.loads(resp.read())  # UTF-8 string with recognized text
    else:
        print resp.reason
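The example above targets Python 2 (`httplib` and the `print` statement). A minimal Python 3 sketch of the same request might look like this, assuming the host, port, path, and JSON fields are unchanged. The helper names `build_request` and `recognize` are illustrative and not part of Stel's API, and the structure of the reply is not assumed: the decoded JSON is returned as is.

```python
import base64
import http.client
import json

HOST = "api.stel.ru:7071"  # host and port taken from the example above
MODEL = "rus_gsm_ext"


def build_request(api_key, wav_bytes, model=MODEL):
    """Return (body, headers) for a POST to /kwfind."""
    payload = {
        "apikey": api_key,
        "model": model,
        # b64encode returns bytes in Python 3; JSON needs a str.
        "wav": base64.b64encode(wav_bytes).decode("ascii"),
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Content-Length": str(len(body)),
    }
    return body, headers


def recognize(api_key, wav_bytes, timeout=30):
    """Send the audio to the service and return the decoded JSON reply."""
    body, headers = build_request(api_key, wav_bytes)
    con = http.client.HTTPConnection(HOST, timeout=timeout)
    try:
        con.request("POST", "/kwfind", body, headers)
        resp = con.getresponse()
        if resp.status != 200:
            raise RuntimeError("%d %s" % (resp.status, resp.reason))
        return json.loads(resp.read().decode("utf-8"))
    finally:
        con.close()
```

To reproduce the original script, read `test.wav` in binary mode and pass its contents together with your key to `recognize`. Keeping the payload construction in a separate function makes it possible to inspect the request without touching the network.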


As you can see, the script sends test.wav from its working directory for recognition. Similar Java code, along with code that works with byte arrays, is shown below. First, a class that turns raw byte arrays (without a header) and files into byte arrays corresponding to a WAV file in the specified format (we need 8000 Hz, 2 bytes per sample, 1 channel):

    package ru.habrahabr.stel.example;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.Arrays;

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.UnsupportedAudioFileException;

    public class WaveFile {
        private int INT_SIZE = 4;
        public final int NOT_SPECIFIED = -1;
        private int sampleSize = NOT_SPECIFIED;
        private long framesCount = NOT_SPECIFIED;
        private byte[] data = null;
        private AudioInputStream ais = null;
        private AudioFormat af = null;

        // Reads an existing audio file from disk.
        WaveFile(File file) throws UnsupportedAudioFileException, IOException {
            if (!file.exists()) {
                throw new FileNotFoundException(file.getAbsolutePath());
            }
            ais = AudioSystem.getAudioInputStream(file);
            af = ais.getFormat();
            framesCount = ais.getFrameLength();
            sampleSize = af.getSampleSizeInBits() / 8;
            long dataLength = framesCount * af.getSampleSizeInBits() * af.getChannels() / 8;
            data = new byte[(int) dataLength];
            ais.read(data);
        }

        // Builds the audio from integer samples (sampleSize is in bytes).
        WaveFile(int sampleSize, float sampleRate, int channels, int[] samples) throws Exception {
            if (sampleSize < INT_SIZE) {
                throw new Exception("sample size < int size");
            }
            this.sampleSize = sampleSize;
            this.af = new AudioFormat(sampleRate, sampleSize * 8, channels, true, false);
            this.data = new byte[samples.length * sampleSize];
            for (int i = 0; i < samples.length; i++) {
                setSampleInt(i, samples[i]);
            }
            framesCount = data.length / (sampleSize * af.getChannels());
            ais = new AudioInputStream(new ByteArrayInputStream(data), af, framesCount);
        }

        // Builds the audio from raw, headerless signed little-endian PCM bytes.
        WaveFile(int sampleSize, float sampleRate, int channels, byte[] wave) throws Exception {
            this.sampleSize = sampleSize;
            this.af = new AudioFormat(sampleRate, sampleSize * 8, channels, true, false);
            this.data = Arrays.copyOf(wave, wave.length);
            framesCount = data.length / (sampleSize * af.getChannels());
            ais = new AudioInputStream(new ByteArrayInputStream(data), af, framesCount);
        }

        public AudioFormat getAudioFormat() {
            return af;
        }

        public byte[] getData() {
            return Arrays.copyOf(data, data.length);
        }

        // Returns the complete WAV file (header plus samples) as a byte array.
        public byte[] getWave() throws Exception {
            ByteArrayOutputStream bts = new ByteArrayOutputStream();
            AudioSystem.write(
                    new AudioInputStream(new ByteArrayInputStream(data), af, framesCount),
                    AudioFileFormat.Type.WAVE, bts);
            return bts.toByteArray();
        }

        public int getSampleSize() {
            return sampleSize;
        }

        public double getDurationTime() {
            return getFramesCount() / getAudioFormat().getFrameRate();
        }

        public long getFramesCount() {
            return framesCount;
        }

        public void saveFile(File file) throws IOException {
            AudioSystem.write(
                    new AudioInputStream(new ByteArrayInputStream(data), af, framesCount),
                    AudioFileFormat.Type.WAVE, file);
        }

        // Works only when sampleSize is at least 4 bytes, since it reads a 4-byte int.
        public int getSampleInt(int sampleNumber) {
            if (sampleNumber < 0 || sampleNumber >= data.length / sampleSize) {
                throw new IllegalArgumentException(
                        "sample number can't be < 0 or >= data.length/" + sampleSize);
            }
            byte[] sampleBytes = new byte[sampleSize];
            for (int i = 0; i < sampleSize; i++) {
                sampleBytes[i] = data[sampleNumber * sampleSize + i];
            }
            int sample = ByteBuffer.wrap(sampleBytes)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt();
            return sample;
        }

        // Works only when sampleSize is at least 4 bytes, since it writes a 4-byte int.
        public void setSampleInt(int sampleNumber, int sampleValue) {
            byte[] sampleBytes = ByteBuffer.allocate(sampleSize)
                    .order(ByteOrder.LITTLE_ENDIAN).putInt(sampleValue).array();
            for (int i = 0; i < sampleSize; i++) {
                data[sampleNumber * sampleSize + i] = sampleBytes[i];
            }
        }
    }
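The central step this class performs for raw input, wrapping headerless PCM bytes in a WAV container, can also be sketched in Python with the standard `wave` module. This is only an illustration of the idea, not a port of the Java class; the function name `pcm_to_wav` is hypothetical, and the input is assumed to be signed 16-bit little-endian PCM, matching the format Stel expects.

```python
import io
import wave


def pcm_to_wav(raw_pcm, sample_rate=8000, sample_width=2, channels=1):
    """Wrap headerless PCM bytes in a WAV container and return the bytes.

    Defaults match the format required by the service:
    8000 Hz, 2 bytes (16 bits) per sample, 1 channel.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(raw_pcm)  # the header sizes are filled in on close
    return buf.getvalue()
```

The returned bytes are what would be Base64-encoded and placed in the `wav` field of the request.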


Note that the WaveFile class above is a slightly extended version of the class from http://blog.eqlbin.ru/2011/02/wave-java.html . All that has been added is the getWave() function, which returns an array of bytes corresponding to the WAV file built by one of the constructors, and a constructor that takes the byte array of a plain raw file. It is exactly the result of getWave() that is sent to Stel. Next is a function that accepts a WaveFile, opens a connection to Stel, sends everything needed, closes the connection, and returns the recognized string:

    // The HTTP and Base64 classes appear to come from Apache Commons HttpClient 3.x
    // and Apache Commons Codec; JSONObject and JSONParser come from json-simple.
    String getResponseOn(WaveFile wf) {
        String res = new String();
        try {
            byte[] wav = wf.getWave();
            HttpConnection conn = new HttpConnection("api.stel.ru", 7071);
            conn.open();
            HttpState state = new HttpState();
            PostMethod post = new PostMethod();

            JSONObject data = new JSONObject();
            data.put("apikey", "***"); // Place your API key here
            data.put("model", "rus_gsm_ext");
            data.put("wav", new String(Base64.encodeBase64(wav)));

            post.setPath("/kwfind");
            post.setRequestHeader("Content-Type", "application/json");
            post.setRequestHeader("Accept", "application/json");
            post.setRequestHeader("Content-Length", "" + data.toJSONString().length());
            post.setRequestEntity(new StringRequestEntity(data.toJSONString(), "application/json", null));
            post.execute(state, conn);

            // The recognized string is in the "text" field of the JSON reply.
            res = res + (String) ((JSONObject) new JSONParser()
                    .parse(post.getResponseBodyAsString())).get("text");
            conn.close();
        } catch (Exception e) {
            // Any failure is reported to the caller as null.
            res = null;
        }
        return res;
    }


Do not forget to replace "***" with your key, and remember that for raw audio the WaveFile passed to getResponseOn is created with the parameters (2, (float) 8000, 1, (byte[]) raw), for example:

    String res1 = getResponseOn(new WaveFile(2, (float) 8000.0, 1, sound));
    String res2 = getResponseOn(new WaveFile(new File("test.wav"))); // demo audio file (WAV, 8000 Hz, 16-bit, mono)


Also note that getResponseOn(WaveFile wf) uses org.json.simple.JSONObject and org.json.simple.parser.JSONParser, which often have to be downloaded separately, for example from here: www.java2s.com/Code/Jar/j/Downloadjsonsimple111jar.htm

Stel is easy to get in touch with, so if you need other languages or language bases, you can arrange it with them.

Recall that our team is developing Lexi, an intelligent home assistant. Lexi is a desktop device with artificial intelligence and a full voice interface for controlling a smart home. The device can retrieve information from the Internet, control household appliances, and report news from social networks. By the way, you can read some interesting thoughts about the future of such home robots in this article.

Our speech recognition technology, as you might have guessed, comes from Stel. In our case, speech recognition runs entirely on board the device (a review of our own electronics can be found here). This gives us a number of advantages over competing products, for example faster responses to the user, no need for an activation phrase, and the ability to work without the Internet.

Follow our project on social networks: VKontakte and Facebook.

Thank you for your attention.

Source: https://habr.com/ru/post/260683/

