Realtime API client

Voximplant's OpenAI Realtime API client connects Voximplant calls to OpenAI's realtime models, so developers can integrate those models into voice applications.

This guide provides step-by-step instructions, practical use cases, and best practices for the client, whether you are building conversational AI, automating customer interactions, or creating other real-time voice solutions.

Any Voximplant call can be connected to an OpenAI agent in a VoxEngine scenario using the RealtimeAPIClient class. Creating an instance requires your OpenAI API key:

createRealtimeAPIClient({apiKey: "PUT_YOUR_OPENAI_API_KEY_HERE", model: "gpt-4o-realtime-preview", onWebSocketClose: (event) => {}});

The optional model parameter selects the model that the OpenAI Realtime API uses for processing. At the moment, there are two options: "gpt-4o-realtime-preview" and "gpt-4o-mini-realtime-preview". The mini version is much cheaper than the full version; refer to the OpenAI pricing article for details. The cost of Voximplant's client does not depend on the chosen model.
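As an illustration of the parameters described above, here is a minimal sketch of a helper that assembles the parameters object and rejects unknown model names before the client is created. The helper name buildRealtimeClientOptions, the SUPPORTED_MODELS list, and the fallback to the full model are assumptions made for this example and are not part of the Voximplant API.

```javascript
// Model names taken from this guide; the list may change over time.
const SUPPORTED_MODELS = ["gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"];

// Hypothetical helper (not part of the Voximplant API): builds the parameters
// object for createRealtimeAPIClient and fails early on invalid input.
// Falling back to the full model is a choice made for this sketch only.
function buildRealtimeClientOptions(apiKey, model = SUPPORTED_MODELS[0], onWebSocketClose = () => {}) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("An OpenAI API key is required");
  }
  if (!SUPPORTED_MODELS.includes(model)) {
    throw new Error(`Unsupported model: ${model}`);
  }
  return { apiKey, model, onWebSocketClose };
}

const options = buildRealtimeClientOptions(
  "PUT_YOUR_OPENAI_API_KEY_HERE",
  "gpt-4o-mini-realtime-preview"
);
console.log(options.model); // gpt-4o-mini-realtime-preview
```

The resulting object has the same shape as the one passed to createRealtimeAPIClient in the example above.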

The third parameter, onWebSocketClose, is a callback function that is called when the WebSocket connection to the OpenAI Realtime API endpoint is closed for any reason.
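One possible way to structure the close callback is sketched below: a small wrapper that runs cleanup logic at most once, even if the callback fires more than once. The helper name createCloseHandler is hypothetical, and the shape of the event object passed by the client is not assumed here; it is forwarded as received.

```javascript
// Hypothetical helper: wraps cleanup logic so that it runs at most once.
function createCloseHandler(cleanup) {
  let handled = false;
  return (event) => {
    if (handled) return; // ignore repeated notifications
    handled = true;
    cleanup(event);
  };
}

// In a real scenario, the cleanup function might hang up the call or end the
// session; here it only counts invocations to show the behavior.
let cleanupRuns = 0;
const onWebSocketClose = createCloseHandler(() => {
  cleanupRuns += 1;
});

onWebSocketClose({});
onWebSocketClose({});
console.log(cleanupRuns); // 1
```

The returned function can be passed as the onWebSocketClose parameter when the client is created.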

Connecting incoming calls to the OpenAI realtime API

The basic version of a VoxEngine scenario that connects an incoming call to an OpenAI agent looks as follows:

Connecting an incoming call to the OpenAI realtime API

An outgoing call can be connected to an OpenAI agent in the same way; the only difference is that the call itself is created in the scenario via the VoxEngine.callPSTN, VoxEngine.callSIP, or VoxEngine.callUser function.

Using a 3rd party TTS with the OpenAI realtime API

If the audio generated by the OpenAI model does not suit your use case (for example, you do not like any of the available voices), you can disable audio generation on the OpenAI side and use one of the many Voximplant TTS integrations instead. We have created a code example that shows how to use ElevenLabs for speech synthesis together with the Realtime API client, which then generates text responses only.
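The idea can be sketched in two parts, under stated assumptions. In the OpenAI Realtime API event format, a session.update event that restricts modalities to text turns off audio generation; how that payload is sent through the Voximplant client is not shown here. The second part, createSentenceBuffer, is a hypothetical helper that collects streamed text deltas (such as those carried by response.text.delta events) and hands complete sentences to a speak callback, which stands in for whichever TTS player you use.

```javascript
// session.update payload in the OpenAI Realtime API event format:
// limiting modalities to text disables audio generation by the model.
const textOnlySessionUpdate = {
  type: "session.update",
  session: { modalities: ["text"] },
};

// Hypothetical helper: buffers streamed text and passes whole sentences
// to `speak`, so the TTS engine does not receive fragments.
function createSentenceBuffer(speak) {
  let pending = "";
  return {
    // Add a text delta; emit every sentence that is already complete.
    push(delta) {
      pending += delta;
      let match;
      while ((match = /^([\s\S]*?[.!?]+)\s+/.exec(pending)) !== null) {
        speak(match[1].trim());
        pending = pending.slice(match[0].length);
      }
    },
    // Emit whatever is left, for example when the response is done.
    flush() {
      const rest = pending.trim();
      pending = "";
      if (rest) speak(rest);
    },
  };
}

const spoken = [];
const buffer = createSentenceBuffer((sentence) => spoken.push(sentence));
buffer.push("Hello the");
buffer.push("re. How are");
buffer.push(" you?");
buffer.flush();
console.log(spoken); // [ 'Hello there.', 'How are you?' ]
```

Splitting on sentence boundaries is a design choice for this sketch: it trades a little latency for more natural prosody than synthesizing each delta separately.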

Using a 3rd party speech synthesis with OpenAI realtime API

Function calling

A major benefit of Voximplant’s serverless architecture is that you can handle function calls right in the VoxEngine scenario and return the results to the model from the same place. That simplifies and speeds up development, which most developers will appreciate.
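The dispatch step can be sketched as a plain function. In the OpenAI Realtime API event format, a completed function call arrives as a response.function_call_arguments.done event carrying name, call_id, and a JSON string of arguments, and the result is returned in a conversation.item.create event with a function_call_output item. The tools table and the get_order_status function below are hypothetical, and wiring the handler to the Voximplant client events is not shown.

```javascript
// Hypothetical tool implementations, keyed by the function name the model calls.
const tools = {
  get_order_status: ({ orderId }) => ({ orderId, status: "shipped" }),
};

// Turns a "response.function_call_arguments.done" event into a
// "conversation.item.create" event that carries the function result.
// Errors are reported back to the model instead of being thrown.
function handleFunctionCall(event, handlers) {
  let output;
  try {
    if (!Object.prototype.hasOwnProperty.call(handlers, event.name)) {
      throw new Error(`Unknown function: ${event.name}`);
    }
    output = handlers[event.name](JSON.parse(event.arguments));
  } catch (error) {
    output = { error: error.message };
  }
  return {
    type: "conversation.item.create",
    item: {
      type: "function_call_output",
      call_id: event.call_id,
      output: JSON.stringify(output), // the output must be a string
    },
  };
}

const reply = handleFunctionCall(
  {
    type: "response.function_call_arguments.done",
    name: "get_order_status",
    call_id: "call_1",
    arguments: '{"orderId":"A42"}',
  },
  tools
);
console.log(reply.item.output); // {"orderId":"A42","status":"shipped"}
```

After the output item is sent, the model typically needs a response.create event before it continues the conversation using the result.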

Function calling example