
If your native language is not English and you are writing applications for more than just the iPhone, you will have to work hard to find suitable tools for developing so-called "voice-enabled" mobile applications.
This review classifies mobile TTS engines and describes the most noteworthy ones.
I research the design of mobile device interfaces for people with visual impairments. One of my projects required a speech synthesis engine with multilingual support (at least two languages: English and Russian), which prompted my search for a speech synthesizer.
For convenience, we divide TTS engines into three classes:
- commercial;
- free (licensed under the GPL, LGPL or more permissive licenses such as the BSD License or the wxWindows License, which allow commercial product development);
- built-in (tools provided by the operating system itself).
Commercial engines
SVOX Mobile TTS
Price: n/a
Languages: 26, including Russian
Subjective assessment of sound quality: high
Mobile OS: Android, Symbian, Windows CE / Windows Mobile, BREW
Ability to develop commercial products: yes
From a technical point of view, SVOX has the most attractive product, SVOX Mobile TTS. However, the company operates mainly in the B2B segment, and it did not respond to my two emails asking for the price.
Acapela TTS
Price: 2800 € plus a so-called run-time license, which costs at best 49 € for each regular application
Languages: 23, including Russian
Subjective assessment of sound quality: high
Mobile OS: Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS
Ability to develop commercial products: yes
Acapela Group employees turned out to be much more responsive and replied literally half an hour after I filled out their request form.
The price given above applies to operating systems such as Windows Mobile and Symbian, but Acapela's business model differs depending on the OS. For example, the company promotes iOS most actively and has set up a separate website for it, where you can register and get an evaluation version of the engine for free. The bare SDK for iOS (formerly iPhone OS) costs 250 €. In addition, a considerable percentage is charged on each application you sell in the App Store.
Note that Acapela also offers "cloud" speech synthesis, as well as porting of the SDK to any platform.
Loquendo Embedded TTS
Price: 3000 € plus a percentage of each mobile application you sell
Languages: 26, including Russian
Subjective assessment of sound quality: high
Mobile OS: Android, Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS, Maemo, Moblin, MeeGo, PalmOS
Ability to develop commercial products: yes
The Loquendo engine has special tags that make speech sound more natural by mixing in non-speech effects such as coughing, laughter and so on.
The engine conforms to the SSML 1.0 specification recommended by the W3C.
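To give an idea of what SSML input looks like, here is a minimal sketch that wraps plain sentences in an SSML 1.0 document. It uses only standard SSML elements (speak, s, break); the vendor-specific Loquendo tags for coughing or laughter are not shown because their syntax is not covered in this review. The function name build_ssml and its parameters are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # namespace defined by SSML 1.0


def build_ssml(sentences, lang="en-US", pause_ms=300):
    """Wrap plain-text sentences in a minimal SSML 1.0 document.

    Each sentence goes into an <s> element; consecutive sentences are
    separated by a <break/> pause of pause_ms milliseconds.
    """
    marked_up = [f"<s>{escape(text)}</s>" for text in sentences]
    pause = f'<break time="{int(pause_ms)}ms"/>'
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        + pause.join(marked_up)
        + "</speak>"
    )


if __name__ == "__main__":
    document = build_ssml(["Call 555 0100.", "Tom & Jerry is installed."])
    ET.fromstring(document)  # raises ParseError if the markup is malformed
    print(document)
```

Any engine that claims SSML 1.0 conformance should accept a document of this shape, although the way the string is passed to the engine depends on each vendor's SDK.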
Sakrament TTS
Price: 1500 € for one language on one OS; a package with both languages comes with a 25% discount, for a total of 2250 €
Languages: English, Russian
Subjective assessment of sound quality: average
Mobile OS: Symbian, Windows Mobile
Ability to develop commercial products: yes
The speech synthesis quality of Sakrament TTS is sufficient for voicing short phrases such as phone numbers or application names. A description of all SDK versions is available from the vendor.
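The package price quoted for Sakrament TTS can be checked with a few lines of arithmetic. The sketch below assumes that 1500 € is the price of a single language on one OS, which is the reading consistent with the quoted total; the helper bundle_price is hypothetical.

```python
def bundle_price(unit_price_eur, languages, discount=0.0):
    """Total price of a multi-language package after a fractional discount."""
    return unit_price_eur * languages * (1 - discount)


# Two languages (English and Russian) at 1500 EUR each with a 25% discount.
two_language_package = bundle_price(1500, languages=2, discount=0.25)
print(two_language_package)
```

Two languages without the discount would cost 3000 €, so the package saves 750 €.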
Free engines
Flite
Price: free
Languages: English, plus the ability to compile FestVox languages
Subjective assessment of sound quality: low
Mobile OS: Android, Windows CE / Windows Mobile, iOS, PalmOS
Ability to develop commercial products: yes (CMU license)
In the world of desktop systems, the Festival speech synthesizer is well known. It has a port for mobile devices and embedded systems called Flite, which is distributed under its own X11-like license. The license allows anyone to redistribute the software freely and to build both commercial and free applications on top of it. There are ports for Windows CE / Windows Mobile, PalmOS, Android and iOS.
eSpeak
Price: free
Languages: 39, including Russian
Subjective assessment of sound quality: average
Mobile OS: Android, Windows CE / Windows Mobile
Ability to develop commercial products: no for closed-source products (GNU GPL)
Instructions for compiling the engine for Windows Mobile are included in the distribution, but on this platform eSpeak has one significant limitation: speech can only be generated into a WAV file. A compiled TTS engine for Windows Mobile is also available for download.
eSpeak has also been ported to Android. The easiest way to try it is to install the TTS Service Extended application from the Android Market, which lets you switch between the built-in engine and eSpeak. This TTS engine is distributed under the terms of the GNU GPL.
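The WAV-only mode mentioned above for Windows Mobile can be tried out on a desktop machine first. The sketch below assumes that the espeak command-line program is installed and on the PATH, and uses its -v (voice), -w (output WAV file) and -s (speed in words per minute) options. The helper names are hypothetical, and the Windows Mobile build may need to be invoked differently.

```python
import shutil
import subprocess


def espeak_wav_command(text, wav_path, voice="ru", speed_wpm=None):
    """Build the argument list for writing synthesized speech to a WAV file."""
    command = ["espeak", "-v", voice, "-w", wav_path]
    if speed_wpm is not None:
        command += ["-s", str(speed_wpm)]
    command.append(text)
    return command


def synthesize_to_wav(text, wav_path, voice="ru"):
    """Run eSpeak if it is installed; return True when it ran successfully."""
    if shutil.which("espeak") is None:
        return False  # eSpeak is not on PATH, nothing to run
    subprocess.run(espeak_wav_command(text, wav_path, voice), check=True)
    return True


if __name__ == "__main__":
    if not synthesize_to_wav("Привет, мир", "hello.wav"):
        print("espeak not found; command would have been:",
              espeak_wav_command("Привет, мир", "hello.wav"))
```

The resulting file can then be played back with whatever audio API the target platform provides.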
Built-in engines
Built-in solutions exist only in Symbian and Android. For some unknown reason, Microsoft left the corresponding programming interface (MS SAPI) out of its mobile OS.
Symbian
Price: free
Languages: English
Subjective assessment of sound quality: extremely low
Ability to develop commercial products: yes
The built-in TTS from the Symbian Foundation is hidden in the CMdaAudioPlayerUtility class. Although its documentation says nothing about it, the class can still synthesize speech. Unfortunately, Russian is not supported, and the quality of the generated English speech is very low: without practice it is quite difficult to understand what was said.
Additional language packs can be downloaded, but the list of supported phones is extremely small. Installing the Russian language packs on a device running Symbian OS S60 5th Edition did not yield the expected result: the built-in TTS never spoke Russian.
Note that there is a fairly convenient extension API called the NSS TTS Utility API, for which a description is available.
Android

Price: free
Languages: English, French, German, Italian, Spanish
Subjective assessment of sound quality: average
Ability to develop commercial products: yes
Built-in speech synthesis has been available in Android since version 1.6. A great introduction to the topic can be found on the Android developers' blog. The Android TTS API is nothing more than a wrapper over SVOX Pico, which unfortunately does not support Russian.
Conclusion
Everyone will have to draw their own conclusions depending on the requirements of the product being developed. For commercial solutions, speech synthesis quality is extremely important, so it is worth choosing between two engines: Acapela TTS and Loquendo Embedded TTS. When choosing an engine for an open source project, the list of target operating systems plays the decisive role.
For myself, I chose eSpeak, since my project is academic and I can afford to use a product licensed under the GNU GPL.
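The selection process described in this conclusion can be sketched as a simple filter over the data collected in the review. The Engine structure and the select_engines function are hypothetical. The platform lists, Russian support and licensing flags are taken from the entries above, with "Windows CE / Windows Mobile" shortened to "Windows Mobile" and eSpeak marked as unsuitable for commercial products because of the GNU GPL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    platforms: frozenset
    russian: bool        # Russian voice available
    commercial_ok: bool  # licence allows commercial products, per this review


ENGINES = [
    Engine("SVOX Mobile TTS",
           frozenset({"Android", "Symbian", "Windows Mobile", "BREW"}), True, True),
    Engine("Acapela TTS",
           frozenset({"Symbian", "Windows Mobile", "Embedded Linux", "iOS"}), True, True),
    Engine("Loquendo Embedded TTS",
           frozenset({"Android", "Symbian", "Windows Mobile", "Embedded Linux",
                      "iOS", "Maemo", "Moblin", "MeeGo", "PalmOS"}), True, True),
    Engine("Sakrament TTS", frozenset({"Symbian", "Windows Mobile"}), True, True),
    Engine("Flite",
           frozenset({"Android", "Windows Mobile", "iOS", "PalmOS"}), False, True),
    Engine("eSpeak", frozenset({"Android", "Windows Mobile"}), True, False),
    Engine("Symbian built-in TTS", frozenset({"Symbian"}), False, True),
    Engine("Android built-in TTS", frozenset({"Android"}), False, True),
]


def select_engines(engines, platform, need_russian=False, need_commercial=False):
    """Return the names of engines that satisfy the given requirements."""
    return [
        engine.name
        for engine in engines
        if platform in engine.platforms
        and (engine.russian or not need_russian)
        and (engine.commercial_ok or not need_commercial)
    ]


if __name__ == "__main__":
    # An academic project targeting Android that needs Russian.
    print(select_engines(ENGINES, "Android", need_russian=True))
```

Sound quality and price are left out of the filter on purpose: the quality ratings here are subjective, and several prices depend on negotiations with the vendor.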