Speech synthesis
You can play synthesized speech to the other party before or during an active call or conference.
Speech synthesis usage
To synthesize speech during an incoming call, first answer the call. To play speech before the call is answered, for example a greeting or a voicemail prompt, use the startEarlyMedia method.
To synthesize speech, use the createTTSPlayer() method in your scenario. Pass the text to synthesize as the first parameter and the options as the second parameter.
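A minimal sketch of such a scenario, assuming an incoming call and the default Amazon voice. The greeting text is a placeholder, and the typeof guard is only there so that the snippet can also be loaded outside the platform, where the VoxEngine global does not exist:

```javascript
// Placeholder text to synthesize (first argument of createTTSPlayer).
const greetingText = "Hello! Thank you for calling.";

// Runs only inside a Voximplant scenario, where VoxEngine is a global.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    const call = e.call;
    call.answer(); // answer the incoming call before synthesizing speech

    call.addEventListener(CallEvents.Connected, () => {
      // Second argument: player options, here only the voice.
      const player = VoxEngine.createTTSPlayer(greetingText, {
        language: VoiceList.Amazon.en_US_Joanna,
      });
      player.sendMediaTo(call); // route the synthesized audio to the caller

      // Hang up once the phrase has been played.
      player.addEventListener(PlayerEvents.PlaybackFinished, () => call.hangup());
    });

    call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
  });
}
```

To greet the caller before answering, the same player could be started after call.startEarlyMedia() instead of call.answer().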
Configuring the voice
You can choose the voice for speech synthesis from one of these lists: VoiceList.Amazon, VoiceList.Google, VoiceList.Tinkoff, VoiceList.Yandex. The default voice is VoiceList.Amazon.en_US_Joanna.
If you have a custom Yandex engine voice, contact support to activate this feature for your account, then specify the voice folder ID in the yandexCustomModelName property.
You can also configure other speech synthesis options, such as pitch, rate, and volume, by listing them in the ttsOptions parameter of the createTTSPlayer() method.
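As a sketch, the options can be collected in a plain object and passed as ttsOptions. The phrase and the chosen values are illustrative, and the typeof guard only keeps the snippet loadable outside the platform:

```javascript
// Illustrative option values applied to the whole phrase.
const ttsOptions = {
  pitch: "low",
  rate: "slow",
  volume: "loud",
};

// Runs only inside a Voximplant scenario, where VoxEngine is a global.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    const call = e.call;
    call.answer();

    call.addEventListener(CallEvents.Connected, () => {
      const player = VoxEngine.createTTSPlayer("Your order has been shipped.", {
        language: VoiceList.Amazon.en_US_Joanna,
        ttsOptions, // pitch, rate, and volume for the whole text
      });
      player.sendMediaTo(call);
      player.addEventListener(PlayerEvents.PlaybackFinished, () => call.hangup());
    });

    call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
  });
}
```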
The options have the following values:
pitch (voice pitch): either a number followed by "Hz", from 0.5Hz to 2Hz, or one of x-low, low, medium, high, x-high, default
rate (speech speed): one of x-slow, slow, medium, fast, x-fast, default
volume (speech volume): one of silent, x-soft, soft, medium, loud, x-loud, default
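If the option values come from configuration or user input, it may help to check them before creating the player. The helper below is a hypothetical sketch, not part of the Voximplant API; it only encodes the value lists above:

```javascript
// Named values accepted by each option, as listed above.
const NAMED_VALUES = {
  pitch: ["x-low", "low", "medium", "high", "x-high", "default"],
  rate: ["x-slow", "slow", "medium", "fast", "x-fast", "default"],
  volume: ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"],
};

// Hypothetical helper: returns true if `value` is acceptable for option `name`.
function isValidTtsOption(name, value) {
  if (typeof value !== "string") return false;
  if (!Object.prototype.hasOwnProperty.call(NAMED_VALUES, name)) return false;
  if (NAMED_VALUES[name].includes(value)) return true;

  // Only pitch also accepts a number followed by "Hz", from 0.5Hz to 2Hz.
  if (name === "pitch") {
    const match = /^(\d+(?:\.\d+)?)Hz$/.exec(value);
    if (match) {
      const hz = Number(match[1]);
      return hz >= 0.5 && hz <= 2;
    }
  }
  return false;
}
```

For example, isValidTtsOption("pitch", "1.5Hz") returns true, while isValidTtsOption("pitch", "3Hz") and isValidTtsOption("rate", "1.5Hz") return false.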
If you want to set these options for the whole text, you do not have to use the speak tag. If you want to apply specific attributes to a part of the text, specify the speak tag manually.
The list of supported tags depends on the speech provider. You can find these lists on the providers' official websites. If you use an unsupported tag, the PlaybackFinished event is triggered with the 400 error.
For example, if we choose Amazon, we have to use the prosody tag to control the volume, rate, or pitch of a selected text fragment, for instance to make that fragment sound higher.
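A sketch of raising the pitch of one fragment with an Amazon voice. The phrase is illustrative, and the typeof guard only keeps the snippet loadable outside the platform:

```javascript
// Only the fragment inside <prosody> is spoken with a higher pitch.
const ssmlText =
  "<speak>Your balance is " +
  '<prosody pitch="high">forty two dollars</prosody>' +
  ".</speak>";

// Runs only inside a Voximplant scenario, where VoxEngine is a global.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    const call = e.call;
    call.answer();

    call.addEventListener(CallEvents.Connected, () => {
      const player = VoxEngine.createTTSPlayer(ssmlText, {
        language: VoiceList.Amazon.en_US_Joanna,
      });
      player.sendMediaTo(call);
      player.addEventListener(PlayerEvents.PlaybackFinished, () => call.hangup());
    });

    call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
  });
}
```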
Another Amazon-specific tag is say-as, which tells the engine how to interpret a fragment such as a number or a date.
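A sketch of building a say-as phrase from dynamic data, assuming an Amazon voice. Both helpers are hypothetical names introduced for illustration; escaping matters because the value ends up inside markup:

```javascript
// Hypothetical helper: escape characters that would break the SSML markup.
function escapeSsml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: have the code read digit by digit, not as one number.
function confirmationCodeSsml(code) {
  return (
    "<speak>Your confirmation code is " +
    '<say-as interpret-as="digits">' + escapeSsml(code) + "</say-as>" +
    ".</speak>"
  );
}
```

The returned string is passed as the first argument of createTTSPlayer(), in the same way as plain text.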
Passing parameters directly to the provider
There are two ways of passing speech synthesis parameters to your provider. You can fill in the ttsOptions parameter on the Voximplant side, as explained in this article; the platform then converts the options to the provider's format and sends them to the provider. Alternatively, you can pass the parameters directly to the provider in the request parameter.
Specify the parameters in the format that your provider accepts. Different providers use different formats; refer to your provider's API reference for details.
To pass the parameters directly, choose the provider in the language parameter of the createTTSPlayer() method and pass the request parameter in the provider's format.
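A sketch of a scenario that uses the request parameter with Google. The field names follow the AudioConfig object of Google Cloud Text-to-Speech; which fields the platform forwards or overrides, and the exact voice identifier, are assumptions to check against the API reference:

```javascript
// Provider-format parameters, named after Google Cloud Text-to-Speech
// AudioConfig fields. Which of them the platform forwards is an assumption.
const googleRequest = {
  audioConfig: {
    speakingRate: 0.9, // slightly slower than the default of 1.0
    pitch: -2.0, // two semitones below the default pitch
  },
};

// Runs only inside a Voximplant scenario, where VoxEngine is a global.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    const call = e.call;
    call.answer();

    call.addEventListener(CallEvents.Connected, () => {
      const player = VoxEngine.createTTSPlayer("Hello from a Google voice.", {
        // Assumed voice identifier: pick an actual entry from VoiceList.Google.
        language: VoiceList.Google.en_US_Wavenet_D,
        request: googleRequest,
      });
      player.sendMediaTo(call);
      player.addEventListener(PlayerEvents.PlaybackFinished, () => call.hangup());
    });

    call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
  });
}
```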
Here are examples of the request parameter for the most common providers:
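The objects below are sketches shaped after the providers' public APIs: the Google Cloud Text-to-Speech synthesize request and the Amazon Polly SynthesizeSpeech request. Which fields the platform accepts in the request parameter is an assumption here, so treat them as starting points rather than verified payloads:

```javascript
// Illustrative request shapes, keyed by provider. Field names come from the
// providers' own APIs; the subset the platform forwards is an assumption.
const requestExamples = {
  // Google Cloud Text-to-Speech: camelCase fields grouped in nested objects.
  google: {
    voice: { languageCode: "en-US" },
    audioConfig: { speakingRate: 1.1, pitch: 2.0, volumeGainDb: -3.0 },
  },
  // Amazon Polly SynthesizeSpeech: flat object with PascalCase fields.
  amazon: {
    Engine: "neural",
    TextType: "ssml",
  },
};
```

The difference in naming and nesting between the two objects is the reason why a request built for one provider cannot be reused for another.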
3rd-party speech synthesis providers' documentation links
In addition to the examples above, you can read the third-party speech synthesis providers' documentation to understand how to build the request parameter in your scenario.
Alternatively, you can use the Media player to integrate 3rd-party voice providers, such as OpenAI TTS.