FAQ and tips
In this article, you will find answers to the most common questions on Avatar bot creation. The questions cover the following topics:

- Dialog scenario coding
- Conversation flow management
- Extracting structural data/slot filling
- Training data and machine learning

In addition, we provide Avatar design tips.
Dialog scenario coding
Can I extend the avatar logging with my own messages?
Yes, you can use the `Logger.write('Hello world')` function at any dialog state.
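For example, a custom log message inside an utterance handler might look like this (a minimal sketch; the `addState` wrapper and the handler body are illustrative, check the API reference for the exact scenario structure):

```js
addState({
    name: 'start',
    onUtterance: async (event) => {
        // Write a custom message to the avatar session log
        Logger.write(`Utterance received, attempt #${event.utteranceCounter}`);
        // ... regular intent handling for this state
    }
});
```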
How can I do product analytics in my avatar?
You can attach any analytics system that supports an HTTP API for streaming events. For example, an integration with Mixpanel might look like this:
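Here is a minimal sketch that sends events to Mixpanel's public `/track` ingestion endpoint. The `httpRequest` helper is a placeholder for whatever HTTP client your scenario environment provides (check the Avatar API reference), and `MIXPANEL_TOKEN` is your project token:

```js
const MIXPANEL_TOKEN = 'YOUR_PROJECT_TOKEN';

// Send a single event to Mixpanel's HTTP ingestion API
async function trackEvent(eventName, properties) {
    const payload = [{
        event: eventName,
        properties: { token: MIXPANEL_TOKEN, ...properties }
    }];
    await httpRequest('https://api.mixpanel.com/track', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
    });
}

// Usage inside a dialog state handler:
// await trackEvent('intent_detected', { intent: event.intent, state: 'start' });
```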
Conversation flow management
If my avatar did not get what the customer said at any dialog state, how can I implement re-asking N times before a fallback (to the agent, for example)?
The `event` argument of the `onUtterance: async (event)` handler has a specific property: `event.utteranceCounter`. This counter is incremented on every new utterance that comes to this state, which helps implement re-asking logic. Let us consider a state in which the user answers yes or no, and any other answer is unexpected. The re-asking logic can be implemented this way:
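Below is a minimal sketch; the `addState` declaration, state names, and `Response` parameters are illustrative, and `event.intent` is assumed to carry the detected intent:

```js
addState({
    name: 'confirmation',
    onEnter: async (event) => {
        return Response({ utterance: 'Shall I book the table? Please answer yes or no.', listen: true });
    },
    onUtterance: async (event) => {
        if (event.intent === 'yes') {
            return Response({ utterance: 'Great, booking it now.', nextState: 'booking' });
        }
        if (event.intent === 'no') {
            return Response({ utterance: 'Okay, no booking then.', nextState: 'goodbye' });
        }
        // Unexpected answer: re-ask up to three times with different phrases
        const reaskPhrases = [
            'Sorry, I did not catch that. Yes or no?',
            'Could you please answer just yes or no?',
            'I still did not get it. Please say yes or no.'
        ];
        if (event.utteranceCounter <= 3) {
            return Response({ utterance: reaskPhrases[event.utteranceCounter - 1], listen: true });
        }
        // 4th unexpected utterance: fall back to a human agent (state name is hypothetical)
        return Response({ utterance: 'Let me transfer you to an agent.', nextState: 'transferToAgent' });
    }
});
```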
In that case, if the avatar gets an unexpected answer, it tries to re-ask several times (with different phrases), and on the 4th try, it switches to the agent.
If the user did not catch what the avatar said, how do I make the avatar repeat the last phrase?
To do that, you can:

- Create a wrapper for the `Response()` function that saves the last passed utterance to a global variable
- Create a 'repeat' intent with training phrases like "what was that", "sorry, I can't hear you", etc.
- Add a branch to the utterance handlers in each state that repeats the last avatar phrase without changing the state
The final scenario looks like this:
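A minimal sketch of this approach (the `addState` declaration and `Response` parameters are illustrative):

```js
let lastUtterance = '';

// Wrapper around Response() that remembers the last phrase the avatar said
function say(params) {
    lastUtterance = params.utterance;
    return Response(params);
}

addState({
    name: 'menu',
    onEnter: async (event) => {
        return say({ utterance: 'How can I help you?', listen: true });
    },
    onUtterance: async (event) => {
        if (event.intent === 'repeat') {
            // Call Response() directly so lastUtterance is not overwritten,
            // and stay in the current state
            return Response({ utterance: lastUtterance, listen: true });
        }
        // ... regular intent handling for this state, always replying via say()
    }
});
```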
Extracting structural data/slot filling
Receiving particular data types from the customer's speech
Let us assume that at a particular stage of the dialog, the avatar expects the end-user to mention some data of a particular entity type. Voximplant allows developers to force the NLU engine to prefer a certain entity type in case of ambiguity.
Let us consider the following dialogs:
- Avatar: "For how many people do you want to reserve a table?" End-user: "For two." (Number)
- Avatar: "For what time do you want to schedule an appointment?" End-user: "For two." (Datetime)
In these examples, the NLU engine should return different types of extracted entities. Ideally, it would infer the type from the conversation context, but unfortunately, the current system design does not allow doing this automatically. In this case, you can use hints to tell the avatar which type to expect.
When you write your avatar scenario, you can manually specify which data type you expect to get from a user in the `nluHints` field. For example:
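A sketch, assuming the hint is passed along with the response (the exact field shape is documented in the API reference):

```js
onEnter: async (event) => {
    return Response({
        utterance: 'For how many people do you want to reserve a table?',
        listen: true,
        // Tell the NLU engine that a number is expected in the reply
        nluHints: { expectedEntity: 'Number' }
    });
}
```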
So you can pass any supported custom entity type as `expectedEntity` to make sure the NLU engine does not confuse types in case of ambiguity. When working with dates, you can also specify the granularity of `systemTime`, for example:
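Another sketch; the granularity value shown here is illustrative:

```js
return Response({
    utterance: 'At what time should I schedule the appointment?',
    listen: true,
    // Expect a date/time answer, resolved down to the time of day
    nluHints: { expectedEntity: 'systemTime', granularity: 'time' }
});
```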
You can find more information on supported hints in the API reference.
How do I extract domain-specific parameters from user requests (store items, promo codes, products, etc.)?
You can use the custom entities functionality for that.
- Create a custom entity
Add a custom entity in the Entities section of the Voximplant control panel. Since this name will be used in the dialog scenario code, choose a JS-friendly name like `storeItem` or `promoCode` (camelCase, with no spaces, numbers, underscores, or special characters).
- Add entity entries
Add specific entries to be extracted from the user’s utterance.
- The entry value will be returned to the dialog scenario as a normalized entity value
- Synonyms: entry names that users can say to refer to these entries
- Handle entries in the dialog scenario code
If any of the entries are detected, they will be returned through the `event.entities` property in the `onUtterance` handler:
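For example, assuming a custom entity named `storeItem` (the handler body and `Response` parameters are illustrative):

```js
onUtterance: async (event) => {
    // event.entities.storeItem holds the normalized entries detected in the utterance
    if (event.entities.storeItem && event.entities.storeItem.length > 0) {
        const item = event.entities.storeItem[0];
        return Response({ utterance: `Adding ${item} to your cart.`, listen: true });
    }
    return Response({ utterance: 'Sorry, which item would you like?', listen: true });
}
```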
What is the easiest way to implement slot filling for forms with several parameters? For example, if the customer wants to schedule an appointment and the avatar needs to get the date and name.
There are two ways to do that:
- Implement a state for each form slot
- Have a single state for the slot-filling process
In our experience, it is more convenient to use the second option. You have to:
- Define the form (slots, slot types, phrases to ask the user to fill each slot) and helper functions in the global code scope of your avatar
- Define a special state in which the user stays until the form is filled, as in the sketch below
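A sketch of that approach; the slot names, entity types, prompts, and state wiring are illustrative:

```js
// The form: each slot has an entity type, a prompt, and a value to fill.
// The entity type names (systemTime, customerName) are examples.
const form = {
    date: { entity: 'systemTime', prompt: 'What date works for you?', value: null },
    name: { entity: 'customerName', prompt: 'May I have your name?', value: null }
};

// Helper: the first slot that is still empty, or undefined when the form is complete
function nextEmptySlot() {
    return Object.keys(form).find((key) => form[key].value === null);
}

addState({
    name: 'fillForm',
    onEnter: async (event) => {
        return Response({ utterance: form[nextEmptySlot()].prompt, listen: true });
    },
    onUtterance: async (event) => {
        // Fill every slot whose entity type was detected in this utterance
        for (const key of Object.keys(form)) {
            const detected = event.entities[form[key].entity];
            if (detected && detected.length > 0) form[key].value = detected[0];
        }
        const slot = nextEmptySlot();
        if (slot) {
            // The form is not complete yet: ask for the next missing value
            return Response({ utterance: form[slot].prompt, listen: true });
        }
        // All slots are filled: confirm and move on
        return Response({
            utterance: `So, ${form.name.value} on ${form.date.value}, correct?`,
            nextState: 'confirmation'
        });
    }
});
```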
Training data and machine learning
What is a good practice for creating intents?
Have enough examples for every intent. A general rule of thumb is that 10-20 examples per intent are enough; for complex cases, this number can grow to hundreds.
Create varied examples. Varied training phrases (using synonyms, different verbs, adding interjections) help the neural network be more stable.
Keep your training data close to what you expect to get from the customer in your specific channel. For example, in a voice channel, users speak much more than they would write in a text channel, so it is useful to have real phrases in your training data. In text mode, the user will probably write "book a table", but in the voice channel it can be "Hey, I was wondering if it's possible to book a table at your place?".
Avoid semantically close intents. For example, having different intents for the phrases "I want to get a debit card" and "I want to have a credit card" can lead to suboptimal performance. It is better to have a single intent "open a bank product" and a custom entity "bank product" with the values "debit card", "credit card", and "bank account". The same goes for nested intents: if training phrases from one intent fit some other intent, the neural network will show poor performance on these phrases or will require much more training data.
Add as many phrases that you do not want to handle as possible. Having enough negative examples is crucial for good neural network performance. So if you expect your users to ask about things you do not want to handle (phrases that do not belong to any of the defined intents, like "Who is the president of the United States?" or "What's your name, robot?"), add them to the "unknown" intent. It is good practice to have 3-4 conversations with your bot in debug mode and then go to the Avatars' Training section to find these examples and add them to training.
Name intents and custom entities in the JavaScript naming convention. You will refer to them from the JavaScript code (like `event.entities.customEntityName[0]`), so it makes sense to choose JS-friendly names.
General design tips
Keep in mind your business objective. In most cases, automating 100% of communications is a bad idea. You can spend too much effort on edge cases without much outcome. So focusing on the most common cases and designing fallbacks for the rest is more effective.
Go step by step and have small iterations. It is better to start with a simple dialog structure and launch a pilot on a small fraction of the user base. Feedback from real users can be surprising and will be a great guide for further development.
Give the user some guidance. At any conversation turn, it should be obvious to your users what input is expected from them. Steer the conversation by replying with questions instead of statements to avoid dead ends.
- Bad: "Hello"
- Good: "Hello, it's the X support desk. How can I help?"
- Great: "Hi, it's the X support desk. I can help you with IT issues. What's your problem?"
Pay attention to error handling. Speech recognition errors, call quality issues, neural network errors, unexpected user inputs: all of these can push your customer off the happy path. Make sure that at every step you handle errors gracefully, so customers either return to the happy path or get help from human agents. There are several good practices for conversation repair (when the avatar gets an unexpected intent):
- Ask the user to rephrase N times and then switch to the agent.
- While asking the user to rephrase the last utterance, give some hints on what exactly you expect as an answer.
Consider using pre-recorded phrases made by a real person for avatar utterances. Even though speech synthesis sounds moderately natural, it is still far from a real human voice. So if responses are not parameterized, you can pre-record them and play the recordings as the avatar's voice.
Gain user trust by asking confirmation questions. When your avatar is doing something on behalf of the user, make sure you double-check all action parameters with the customer, for example by asking: "So you want to book a table for two this Saturday, right?"