Today, speech synthesizers on desktop computers and mobile devices no longer seem unusual. Technology has advanced far enough to reproduce a human voice. This article covers how speech synthesis works, where it is applied, which speech synthesizer is considered the best, and what problems a user may encounter.
What are speech synthesizers and where are they used?
Speech synthesizers are programs made up of several modules that convert text typed on the keyboard into audible human speech.
It would be naive to believe that the accompanying libraries contain every possible word or phrase, recorded in a studio by real people. That is physically impossible. Such phrase libraries would also be so large that they could not be installed even on modern high-capacity hard drives, let alone on mobile devices.
To get around this, a technology called Text-to-Speech (TTS) was developed.
Speech synthesizers are most widely used in a few areas: independent study of foreign languages, where you need to hear the correct pronunciation of a word (programs often support 50 languages or more); listening to books instead of reading them; creating speech and vocal parts in music; use by people with disabilities; and reading out search results as spoken words and phrases.
Varieties of programs
Depending on the application, these programs fall into two main types: standard ones that convert text directly to speech, and speech or vocal modules used in music applications.
To give a fuller picture, we will look at both classes, with more emphasis on speech synthesizers used for their primary purpose.
Pros and cons of the simplest voice applications
Of the advantages and disadvantages of programs of this type, we will start with the disadvantages.
First of all, keep in mind that a computer is still a computer, and at the current stage of development it can only approximate human speech. The simplest programs often have problems with word stress and reduced sound quality. On mobile devices they also increase power consumption and sometimes download speech modules without being asked.
But there are plenty of advantages, because many people take in spoken information much better than visual information. The ease of perception is obvious.
How to use a speech synthesizer?
Now a few words about the basic principles of using programs of this type. Installing a speech synthesizer of any kind is straightforward. On desktop systems a standard installer is used, and the main task is choosing which language modules to install. On mobile devices, the installation file can be downloaded from an official store or repository such as Google Play or the App Store, after which the application is installed automatically.
As a rule, no configuration is required at first start, apart from setting the default language. Sometimes a program may offer a choice of sound quality (the commonly used standard is a sampling frequency of 44,100 Hz, a depth of 16 bits, and a bitrate of 128 kbit/s). On mobile devices, these figures are lower. In either case, a particular voice is taken as the basis: filters and equalizers are applied to a standard pronunciation pattern to produce that voice's timbre.
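The quality figures mentioned above are related by simple arithmetic. The sketch below assumes the CD-standard 44,100 Hz sampling rate and a single (mono) channel, neither of which the article confirms, and the function name `pcm_bitrate` is purely illustrative. Under those assumptions, uncompressed 16-bit audio needs several times more than 128 kbit/s, so a 128 kbit/s figure most likely refers to a compressed stream.

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Return the bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


# Assumed values: 44,100 Hz sampling rate, 16-bit depth, mono.
raw_bitrate = pcm_bitrate(44_100, 16)      # 705,600 bit/s uncompressed
compressed_bitrate = 128_000               # the 128 kbit/s figure

# How many times smaller the compressed stream is than raw PCM.
ratio = raw_bitrate / compressed_bitrate
print(f"raw: {raw_bitrate} bit/s, about {ratio:.1f}x the 128 kbit/s stream")
```

With two channels (stereo) the uncompressed figure doubles, which is why lower settings are attractive on mobile devices.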
In use, there are several ways to supply the text to be converted: typing it in manually, voicing existing text from a file, or integrating with other applications (for example, web browsers) to voice search results or read the text content of web pages. It is enough to choose the desired option, the language, and the voice that will do the reading. Many programs offer several voices, both male and female. Playback is usually started with a start button.
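As one concrete illustration of choosing a language, a voice, and a speed before starting playback, command-line synthesizers expose these choices as flags. The sketch below builds a command line for the eSpeak program discussed later in this article, using its `-v` (voice), `-s` (speed in words per minute), and `-w` (write to a WAV file) options. The helper names `build_espeak_command` and `speak` are hypothetical, and the sketch assumes an `espeak` executable is installed and on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_espeak_command(
    text: str,
    voice: str = "en",
    speed_wpm: Optional[int] = None,
    wav_path: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the espeak command-line tool."""
    cmd = ["espeak", "-v", voice]          # -v selects the voice/language
    if speed_wpm is not None:
        cmd += ["-s", str(speed_wpm)]      # -s sets speed in words per minute
    if wav_path is not None:
        cmd += ["-w", wav_path]            # -w writes a WAV file instead of playing
    cmd.append(text)
    return cmd


def speak(text: str, **options) -> None:
    """Run espeak if it is available; raise a clear error otherwise."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak executable not found on PATH")
    subprocess.run(build_espeak_command(text, **options), check=True)
```

For example, `speak("Hello", voice="ru", speed_wpm=150)` would read the text with a Russian voice at 150 words per minute, provided that voice is installed.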
There are several ways to turn off a speech synthesizer. In the simplest case, you use the stop button in the program itself. When the synthesizer is integrated into a browser, it is deactivated in the extension settings or by removing the plugin completely. On mobile devices, however, problems may remain even after the synthesizer is switched off directly. These are discussed separately below.
In music programs, setup and text entry are much more complicated. For example, FL Studio has its own speech module, in which you can select several types of voices and change the key, playback speed, and so on. To stress a syllable, use the "_" symbol. But such a synthesizer is only suitable for creating robotic voices.
Yamaha's Vocaloid package, by contrast, is professional-grade software. Text-to-Speech technology is implemented in full. In addition to the standard parameters, the settings let you control articulation and glissando, use libraries with the vocals of professional performers, compose words and phrases and fit them to notes, and much more. It is not surprising that a package with a single vocal takes up about 4 GB or more as an installation distribution, and two to three times as much after unpacking.
Speech synthesizers with Russian voices: a brief overview of the most popular
But let us return to the simplest applications and consider the most popular of them.
RHVoice is, according to most experts, the best speech synthesizer. It was developed in Russia by Olga Yakovleva. The standard version comes with three voices (Alexander, Irina, Elena). The settings are simple, and the application can be used both as a standalone SAPI5-compatible program and as a screen reader module.
Acapela is a rather interesting application whose main feature is almost perfect voicing in more than 30 languages. In the regular version, however, only one voice is available (Alena).
Vocalizer is a powerful application with the female voice Milena. This program is very often used in call centers. It offers many settings for stress, volume, and reading speed, and supports installing additional dictionaries. Its main distinguishing feature is that the speech engine can be embedded in programs such as Cool Reader, Moon+ Reader Pro, or Full Screen Caller ID.
Festival is a powerful speech synthesis utility designed for Linux and Mac OS X. The application is open source and, in addition to the standard language packs, supports even Finnish and Hindi.
eSpeak is a voice application that supports over 50 languages. Its main drawback is that synthesized speech can be saved only in WAV format, which takes up a lot of space. On the other hand, the program is cross-platform and can be used even on mobile systems.
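To get a feel for how much space uncompressed WAV output takes, the sketch below writes silent audio into an in-memory WAV file with Python's standard `wave` module and measures the result. The 22,050 Hz sampling rate, 16-bit samples, and mono channel are assumptions chosen for illustration, not figures from this article, and `wav_size_bytes` is a hypothetical helper name.

```python
import io
import wave


def wav_size_bytes(
    seconds: int,
    sample_rate: int = 22_050,   # assumed sampling rate in Hz
    sample_width: int = 2,       # bytes per sample (2 bytes = 16 bits)
    channels: int = 1,           # mono
) -> int:
    """Write `seconds` of silence to an in-memory WAV file and return its size."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        # Silence is enough here: PCM size depends on duration, not on content.
        wav.writeframes(b"\x00" * (seconds * sample_rate * sample_width * channels))
    return len(buffer.getvalue())


one_minute = wav_size_bytes(60)
print(f"One minute of speech: {one_minute / 1_000_000:.2f} MB")
```

Under these assumptions a minute of speech takes roughly 2.6 MB, so an hour-long book chapter would be well over 150 MB before any compression.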
Speech synthesizer issues on Google Android
When installing the "native" speech synthesizer from Google, users constantly complain that it spontaneously starts downloading additional language modules, which not only can take quite a long time but also consumes traffic.
On Android, this is easy to fix. Open the settings menu, go to the language and voice input section, select voice search, and tap the cross next to the offline speech recognition option to disable it. It is also recommended that you clear the application cache and reboot the device. Sometimes it may also be necessary to turn off notifications in the application itself.
What is the result?
To summarize, in most cases the simplest programs are enough for ordinary users, and RHVoice leads in all ratings. But musicians who want a natural-sounding voice, so that the difference between live vocals and computer synthesis cannot be heard, should prefer programs like Vocaloid. Many additional voice libraries are released for such programs, and their settings offer so many possibilities that simpler applications do not come close.