
Review of mobile text-to-speech engines

If your native language is not English and you are not writing applications solely for the iPhone, you will have to work hard to find suitable tools for developing so-called "voice-enabled" mobile applications.

This review classifies mobile TTS engines and describes the most noteworthy ones.

I do research on the design of mobile device interfaces for people with visual impairments. For one of my projects, I needed a speech synthesis engine with multilingual support (at least two languages: English and Russian). That is what prompted my search for a speech synthesizer.

For convenience, we divide the TTS engines into three classes:

Commercial engines


SVOX Mobile TTS
Price: n/a
Languages: 26, including Russian
Subjective assessment of sound quality: high
Mobile OS: Android, Symbian, Windows CE / Windows Mobile, BREW
Ability to develop commercial products: yes
From a technical point of view, SVOX has the most attractive product: SVOX Mobile TTS. However, the company works mainly in the B2B segment, and it did not respond to my two emails asking about the price.

Acapela TTS
Price: 2800 € plus a so-called run-time license, which at best costs 49 € per application.
Languages: 23, including Russian
Subjective assessment of sound quality: high
Mobile OS: Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS
Ability to develop commercial products: yes

Acapela Group staff turned out to be much more responsive and replied literally half an hour after I filled out the request form.

The price listed above applies to operating systems such as Windows Mobile and Symbian; Acapela's business model differs depending on the OS. For example, the company promotes iOS most heavily and has set up a separate website for it, where you can register and get an evaluation version of the engine for free. The bare SDK for iOS (formerly iPhone OS) costs 250 €. In addition, Acapela charges a considerable percentage of each application you sell in the App Store.

Note that Acapela also offers "cloud" speech synthesis, as well as porting of the SDK to any platform.

Loquendo Embedded TTS
Price: 3000 € plus a percentage of each mobile application you sell
Languages: 26, including Russian
Subjective assessment of sound quality: high
Mobile OS: Android, Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS, Maemo, Moblin, MeeGo, PalmOS
Ability to develop commercial products: yes

The Loquendo engine has special tags that make speech sound more natural by mixing in non-speech sounds such as coughing and laughter.

The engine conforms to the SSML 1.0 specification, a W3C Recommendation.
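
To give an idea of what SSML input looks like, here is a minimal Python sketch that builds a small SSML 1.0 document with the standard library. It uses only standard SSML elements (speak, break, prosody), not Loquendo's own tags. The helper name build_ssml and the sample phrases are made up for illustration, and how the resulting string is handed to a particular engine depends on that engine's SDK.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Namespace of the reserved "xml" prefix (needed for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a prefix (xmlns="...").
ET.register_namespace("", SSML_NS)


def build_ssml(first, second, pause_ms=500, lang="en-US"):
    """Hypothetical helper: two phrases separated by a pause,
    with the second phrase spoken slowly."""
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    speak.text = first
    # <break time="..."/> inserts a pause of the given length.
    ET.SubElement(speak, f"{{{SSML_NS}}}break", {"time": f"{pause_ms}ms"})
    # <prosody rate="slow"> slows down the enclosed text.
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody", {"rate": "slow"})
    prosody.text = second
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("You have one new message.", "Please call back."))
```

Building the markup with an XML library rather than string concatenation keeps user-supplied text properly escaped.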

Sakrament TTS
Price: 1500 € per language for one OS; a package of two languages bought together gets a 25% discount, for 2250 € in total
Languages: English, Russian
Subjective assessment of sound quality: average
Mobile OS: Symbian, Windows Mobile
Ability to develop commercial products: yes

The speech synthesis quality of Sakrament TTS is sufficient for voicing short phrases such as phone numbers or application names. A description of all SDK versions is available here.

Free engines


Flite
Price: free
Languages: English, plus the ability to compile FestVox languages
Subjective assessment of sound quality: low
Mobile OS: Android, Windows CE / Windows Mobile, iOS, PalmOS
Ability to develop commercial products: yes (CMU license)

In the desktop world, the Festival speech synthesizer is well known. Its lightweight counterpart for mobile devices and embedded systems is called Flite. Flite is distributed under its own X11-like license, which allows anyone to redistribute the software freely and to build both commercial and free applications on top of it. There are ports for Windows CE / Windows Mobile, PalmOS, Android and iOS.

eSpeak
Price: free
Languages: 39, including Russian
Subjective assessment of sound quality: average
Mobile OS: Android, Windows CE / Windows Mobile
Ability to develop commercial products: no, for closed-source products (GNU GPL)

Instructions for compiling the engine for Windows Mobile are included in the distribution, but on this platform eSpeak has one significant limitation: speech can only be generated into a WAV file. A compiled TTS engine for Windows Mobile is available here.
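
The "synthesize to a WAV file, then play it" workflow can be tried on a desktop machine with the eSpeak command-line tool. The Python sketch below is only an illustration under stated assumptions: it assumes an espeak executable is on the PATH and uses its -v (voice) and -w (write WAV) options. The helper names are made up, and none of this is the Windows Mobile API.

```python
import subprocess


def espeak_wav_command(text, wav_path, voice="ru"):
    """Build the argument list for the desktop espeak CLI.

    -v selects the voice (language), -w writes the audio to a WAV
    file instead of playing it.
    """
    return ["espeak", "-v", voice, "-w", wav_path, text]


def synthesize_to_wav(text, wav_path, voice="ru"):
    """Run espeak; raises CalledProcessError if the tool fails
    and FileNotFoundError if espeak is not installed."""
    subprocess.run(espeak_wav_command(text, wav_path, voice), check=True)


if __name__ == "__main__":
    # Assumes espeak is installed; writes hello.wav to the current directory.
    synthesize_to_wav("Привет, мир", "hello.wav")
```

The resulting file can then be played with whatever audio API the target platform provides.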

eSpeak has also been ported to Android. The easiest way to try it is to install the TTS Service Extended application from the Android Market, which lets you switch between the built-in engine and eSpeak. eSpeak is distributed under the terms of the GNU GPL.

Built-in solutions


Built-in solutions exist only in Symbian and Android. For some unknown reason, Microsoft has left its mobile OS without the corresponding programming interface (MS SAPI).

Symbian
Price: free
Languages: English
Subjective assessment of sound quality: extremely low
Ability to develop commercial products: yes

The built-in TTS from the Symbian Foundation is hidden in the CMdaAudioPlayerUtility class. Although the class documentation says nothing about it, the class can synthesize speech. Unfortunately, Russian is not supported, and the quality of the generated English speech is very low: without practice, it is quite difficult to understand what was said.

Additional language packs can be downloaded here, but the list of supported phones is extremely small. Installing the Russian language packs on a device running Symbian OS S60 5th Edition did not give the expected result: the built-in TTS never spoke Russian.

Note that there is a fairly convenient extension API called the NSS TTS Utility API; its description can be found here.

Android
Price: free
Languages: English, French, German, Italian, Spanish
Subjective assessment of sound quality: average
Ability to develop commercial products: yes

Built-in speech synthesis has been available in Android since version 1.6. A great introduction to the topic can be found on the developers blog. The Android TTS API is nothing more than a wrapper around SVOX Pico, which unfortunately does not support Russian.

Conclusion


Everyone will have to draw their own conclusions based on the requirements of the product being developed. For commercial products, speech quality is extremely important, so it is worth choosing between two engines: Acapela TTS and Loquendo Embedded TTS. For an open source project, the list of target operating systems will play the decisive role.

For my own project I chose eSpeak: the project is academic, so I can afford to use a product licensed under the GNU GPL.
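
The selection process can be sketched as a simple filter over the data from this review. The Python snippet below is just an illustration: the Engine record and the shortlist function are names invented here, the field values are transcribed from the spec blocks above, and "Windows Mobile" stands for "Windows CE / Windows Mobile".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    free: bool            # "Price: free" in the review
    russian: bool         # Russian listed among the supported languages
    quality: str          # the review's subjective sound-quality rating
    platforms: frozenset  # mobile OSes listed in the review


ENGINES = [
    Engine("SVOX Mobile TTS", False, True, "high",
           frozenset({"Android", "Symbian", "Windows Mobile", "BREW"})),
    Engine("Acapela TTS", False, True, "high",
           frozenset({"Symbian", "Windows Mobile", "Embedded Linux", "iOS"})),
    Engine("Loquendo Embedded TTS", False, True, "high",
           frozenset({"Android", "Symbian", "Windows Mobile", "Embedded Linux",
                      "iOS", "Maemo", "Moblin", "MeeGo", "PalmOS"})),
    Engine("Sakrament TTS", False, True, "average",
           frozenset({"Symbian", "Windows Mobile"})),
    Engine("Flite", True, False, "low",
           frozenset({"Android", "Windows Mobile", "iOS", "PalmOS"})),
    Engine("eSpeak", True, True, "average",
           frozenset({"Android", "Windows Mobile"})),
    Engine("Symbian built-in", True, False, "extremely low",
           frozenset({"Symbian"})),
    Engine("Android built-in", True, False, "average",
           frozenset({"Android"})),
]


def shortlist(engines, platform, need_russian=False, free_only=False):
    """Return the names of engines that meet the given requirements."""
    return [
        e.name for e in engines
        if platform in e.platforms
        and (e.russian or not need_russian)
        and (e.free or not free_only)
    ]


if __name__ == "__main__":
    # A free engine with Russian support on Android.
    print(shortlist(ENGINES, "Android", need_russian=True, free_only=True))
    # prints ['eSpeak']
```

With these requirements (free, Russian, Android) the filter leaves only eSpeak, which matches the choice made above.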

Source: https://habr.com/ru/post/102199/