Media players
This article explains the ways to play synthesized speech or a pre-recorded media file during a call or a conference.
Single media player
VoxEngine provides the Player module, which lets you play media files and text-to-speech (TTS) output.
To use a player during an incoming call, answer the call first. To play audio before the call is answered, for example a greeting or a voicemail prompt, use the startEarlyMedia method.
Use the createTTSPlayer method to create a text-to-speech player. It accepts the same arguments as the say method used for speech synthesis; refer to the speech synthesis article to learn how to use that method.
Use the createURLPlayer method to play media files. It accepts two arguments: the media file URL as a string and a URLPlayerOptions object. Refer to the API reference to learn about the player options.
After you create a TTS or a URL player, send its media to an active call with the sendMediaTo method.
In short, the flow is: answer the call (or start early media), create a TTS or URL player, and send its media to the call with sendMediaTo.
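The flow can be sketched as follows. This is a minimal sketch, not an official example: `greetCaller` is a hypothetical helper, and the player factory and event names are passed in as parameters so the logic can also run outside VoxEngine. It assumes the call exposes `addEventListener`, `answer`, and `hangup`, and the player exposes `addEventListener` and `sendMediaTo`.

```javascript
// Hypothetical helper illustrating the single-player flow:
// answer the call, wait until it is connected, create a player,
// send its media to the call, and hang up when playback finishes.
// `createPlayer` and `events` are injected (an assumption for illustration):
// in a scenario they would wrap VoxEngine.createTTSPlayer / createURLPlayer
// and map to CallEvents.Connected / PlayerEvents.PlaybackFinished.
function greetCaller(call, createPlayer, events) {
  call.addEventListener(events.Connected, () => {
    // Create the player only once the call is connected.
    const player = createPlayer();
    // Hang up after the whole track has been played.
    player.addEventListener(events.PlaybackFinished, () => call.hangup());
    // Route the player's audio into the call.
    player.sendMediaTo(call);
  });
  call.answer();
}
```

Inside a scenario you might call it from an `AppEvents.CallAlerting` handler as `greetCaller(e.call, () => VoxEngine.createTTSPlayer("Hello!"), { Connected: CallEvents.Connected, PlaybackFinished: PlayerEvents.PlaybackFinished })`; check the API reference for the exact `createTTSPlayer` parameters.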
You can also use the URL player to integrate third-party TTS voices, such as OpenAI TTS, into your scenario.
Note that a single JS session (one scenario) can have at most 10 media players.
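If a scenario creates players dynamically, for example in a loop, a small counter can keep it under this limit. The helper below is hypothetical, and it assumes that every player created during the session counts toward the limit; the article does not say whether finished players free up a slot.

```javascript
// The limit of 10 comes from the article; the counting rule
// (every created player counts for the rest of the session) is an assumption.
const MAX_PLAYERS_PER_SESSION = 10;

function createPlayerBudget(limit = MAX_PLAYERS_PER_SESSION) {
  let created = 0;
  return {
    // Calls the factory (for example, one wrapping VoxEngine.createURLPlayer)
    // only while the budget allows it; a failed factory call is not counted.
    create(factory) {
      if (created >= limit) {
        throw new Error(`Player limit of ${limit} reached for this session`);
      }
      const player = factory();
      created += 1;
      return player;
    },
    // Number of players that can still be created.
    remaining() {
      return limit - created;
    },
  };
}
```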
Sequence player
The TTS and URL players work well when you have one or two tracks to play, either independently or in sequence.
However, to play even two tracks in sequence, for example synthesized speech (TTS player) followed by a pre-recorded media file (URL player), you have to handle the first player's events yourself and manage the second player: subscribe to the first player's events, handle them correctly, then start the second player, and so on.
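A minimal sketch of this manual chaining follows. `playTwoInOrder` is a hypothetical helper; the player factories and the "finished" event name are injected, and in a VoxEngine scenario they would presumably wrap `VoxEngine.createTTSPlayer` / `VoxEngine.createURLPlayer` and use `PlayerEvents.PlaybackFinished`.

```javascript
// Plays two tracks in order: the second player is created only after the
// first one reports that playback has finished, then onDone is called.
function playTwoInOrder(call, createFirst, createSecond, finishedEvent, onDone) {
  const first = createFirst();
  first.addEventListener(finishedEvent, () => {
    // Only now start the second track.
    const second = createSecond();
    second.addEventListener(finishedEvent, onDone);
    second.sendMediaTo(call);
  });
  first.sendMediaTo(call);
}
```

Every additional track adds another nested handler like this one, which is the bookkeeping that the sequence player takes over.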
With more tracks, for example five, this manual bookkeeping grows quickly. This is where SequencePlayer helps. It accepts a list of so-called “segments” (essentially the settings for each player), then creates the players one after another and processes and manages their events on its own.
Each segment is either text to synthesize or a media file. You can set parameters for each segment individually; for example, you can use a different voice for each part of the playback.
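To illustrate the idea, here is a toy model of a segment-driven player. It is not the VoxEngine sequence player API: the segment shape (`{ text, voice }` or `{ url }`), the `factories` object, and the event name are assumptions. Each player is created only when its turn comes, and its "finished" event starts the next segment.

```javascript
// Toy model of segment-driven playback (NOT the VoxEngine SequencePlayer API).
// segments: hypothetical objects, either { text, ...settings } or { url, ...settings }.
// factories: { tts(segment), url(segment) } returning objects with
// addEventListener(event, handler) and sendMediaTo(call).
function playSegments(call, segments, factories, finishedEvent, onDone) {
  let next = 0;

  function playNext() {
    if (next >= segments.length) {
      onDone();
      return;
    }
    const segment = segments[next];
    next += 1;

    let player;
    if (typeof segment.text === "string") {
      // The whole segment is passed, so per-segment settings such as the voice apply.
      player = factories.tts(segment);
    } else if (typeof segment.url === "string") {
      player = factories.url(segment);
    } else {
      throw new Error(`Segment ${next - 1} has neither text nor url`);
    }

    // When this segment finishes, move on to the next one.
    player.addEventListener(finishedEvent, playNext);
    player.sendMediaTo(call);
  }

  playNext();
}
```

In a real scenario the built-in sequence player does this work for you; its actual constructor and segment format are documented in the API reference.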
The sequence player provides its own configuration options and events. Learn more about its API in the API reference section.