How good are AI Voice Agents?

By Devavrat Mahajan
June 21, 2024
If you are a business leader who relies on a contact center for customer service or inbound sales, you have probably wondered how the latest advances in AI could improve its productivity.
The advantages are clear. Every new agent added to the workforce needs to be trained, runs at sub-optimal utilization, and needs their bandwidth planned in advance. When demand spikes, supply cannot be scaled up immediately. A hypothetical AI that does the job as well as the experts on your workforce would eliminate all of these problems. It would be a dream employee that serves your needs perfectly. So why haven't AI Voice Agents already taken over every contact center?
The answer is that such an AI is still just that: hypothetical. It does not perform as well as an expert on the workforce, and it isn't cheaper than a well-oiled contact center running at 80%+ utilization. Find that hard to believe? Here are some numbers.

Cost of a human contact center agent per minute

Most large contact centers are located in lower-income countries, where an agent is paid a salary of $2,000-3,000 per year.

At 80%+ utilization, a human agent costs about 3.2 cents per minute.
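The per-minute figure comes from dividing annual salary by the minutes an agent actually spends on calls. The sketch below assumes 2,000 paid hours per year (50 weeks of 40 hours); that number is our assumption, not one from the article. Under it, the salary range works out to roughly 2.1 to 3.1 cents per minute at 80% utilization, close to the 3.2 cents quoted.

```python
def human_cost_cents_per_minute(annual_salary_usd: float,
                                paid_hours_per_year: float = 2000,
                                utilization: float = 0.8) -> float:
    """Salary cost per minute actually spent on calls, in US cents.

    paid_hours_per_year=2000 (50 weeks x 40 hours) is an assumption,
    not a figure from the article.
    """
    productive_minutes = paid_hours_per_year * 60 * utilization
    return annual_salary_usd * 100 / productive_minutes  # dollars -> cents


# Bottom and top of the quoted salary range
for salary in (2000, 3000):
    print(salary, round(human_cost_cents_per_minute(salary), 2))
```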

Cost of an AI Agent Per Minute

Here are some existing solutions on the market that offer Voice Agents as a product, along with their pricing:
Synthflow AI: ~15c per minute, ~33c per minute with powerful LLMs, and even higher if sophisticated speech models are used
Vapi AI: ~13c per minute with lighter LLMs, ~31c per minute with powerful LLMs
So, a Voice Agent costs a whopping 4 times as much as a human at best, and 10 times as much at worst. Not to mention that humans are better equipped to handle ambiguity, take action on customer issues, and make customers feel more valued than an AI would.
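To make the comparison concrete, the snippet below divides the quoted AI prices by the 3.2-cent human figure. Prices change often, so treat the inputs as a snapshot from the time of writing rather than current rates.

```python
HUMAN_CENTS_PER_MIN = 3.2

# Per-minute prices in US cents, as quoted above
ai_prices_cents = {
    "Vapi AI (lighter LLM)": 13,
    "Synthflow AI (base)": 15,
    "Vapi AI (powerful LLM)": 31,
    "Synthflow AI (powerful LLM)": 33,
}


def cost_multiple(ai_cents: float, human_cents: float = HUMAN_CENTS_PER_MIN) -> float:
    """How many times more expensive the AI agent is per minute."""
    return ai_cents / human_cents


for name, price in ai_prices_cents.items():
    print(f"{name}: {cost_multiple(price):.1f}x")
```

The multiples range from about 4x (13 cents) to about 10x (33 cents), which is where the "4 times at best, 10 times at worst" range comes from.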
So is it all doom and gloom?

Challenging our Assumptions

The above calculations rest on several assumptions:
[1] The contact center is well-oiled and running at an efficiency of 80%+
Reaching 80% efficiency is a dream for any services business, and it does not happen unless the business has been around for several years with well-oiled operational processes. More often, contact centers operate at an efficiency of 60-70%.
[2] Aggregator’s margins
BPO businesses often run at margins exceeding 50%, so the end customer often pays twice what the agent receives in salary. Unless you plan to run an in-house contact center, you need to account for these margins as well.
[3] Recruiting and training costs
We have considered salary, but not sourcing and training costs. Recruiting and training an average agent costs 20-25% of their annual salary.
[4] Your business does not need to scale
If you intend to scale rapidly (5-10X) in the next year or so, you won't be able to give your service provider a reliable estimate of the bandwidth you need. You will face problems scaling and may need to work with several service providers, which increases your sourcing costs by a further 20-25% every time you scale.
[5] Contact Center may not be in a poorer country
If you need specialized services in specific locales and accents, you may need to turn to countries with higher costs. You might have heard of the well-known Klarna case study, in which Sweden-based Klarna reported that its OpenAI-powered assistant was doing the work of roughly 700 full-time customer service agents.
As these assumptions break down, the figure of 3.2 cents per minute no longer holds, and the cost shoots up to 15-20 cents per minute for specialized use cases.
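The adjustments above compound multiplicatively. The sketch below applies them to the 3.2-cent baseline. Lower utilization spreads the same salary over fewer productive minutes. A margin m means the client pays cost / (1 - m), so a 50% margin doubles the price. Recruiting and training is added as a fraction of salary, assuming (our assumption) that it is amortized over one year of tenure. Scaling overheads from [4] are not modeled, and wage_multiplier is a hypothetical knob for higher-cost locales.

```python
def adjusted_human_cost_cents(base_cents: float = 3.2,
                              base_utilization: float = 0.8,
                              actual_utilization: float = 0.65,
                              bpo_margin: float = 0.5,
                              recruiting_fraction: float = 0.225,
                              wage_multiplier: float = 1.0) -> float:
    """Per-minute human cost after relaxing the baseline assumptions.

    recruiting_fraction is assumed to be amortized over one year of tenure;
    wage_multiplier is a hypothetical factor for higher-cost locales.
    """
    if not 0 <= bpo_margin < 1:
        raise ValueError("bpo_margin must be in [0, 1)")
    cost = base_cents * wage_multiplier
    cost *= base_utilization / actual_utilization  # fewer productive minutes
    cost *= 1 + recruiting_fraction                # sourcing and training
    return cost / (1 - bpo_margin)                 # aggregator's margin


# 60% utilization, 50% margin, 25% recruiting cost, same wages
print(round(adjusted_human_cost_cents(actual_utilization=0.6,
                                      recruiting_fraction=0.25), 1))
```

With those inputs the cost comes to roughly 10.7 cents per minute, more than three times the baseline before any locale adjustment. A higher-wage locale ([5]) pushes it further toward the 15-20 cent range.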

Can we reduce the costs of AI Agents?

Yes, there are several ways to reduce the cost of AI Voice Agents. We recommend working with an AI partner that specializes in voice use cases, since keeping costs low while maintaining high-quality conversations requires quite a bit of fine-tuning.
Currently, here is what a ready-made voice stack, such as that of Vapi AI or Synthflow AI, looks like:
Figure: Voice AI stack and cost distribution
Here is how you can reduce the costs of each layer:
[1] Speech to Text (Transcription)
(a) Using a lighter model
This can bring transcription costs down to 0.4 cents per minute, which is enough for most use cases.
[2] Language Model
(a) Waiting for LLM costs to come down
LLM costs are falling rapidly, and by some estimates they could drop to a third or a quarter of today's levels within a year.
(b) Using lighter or open-source models
Hosting open-source models has a higher up-front cost but a lower per-minute cost as you scale. If you handle more than about 4,000 minutes of calls per day, choosing open source will give you cost savings.
(c) Using small language models or decision trees for predictable conversations
When you are conducting a user interview or collecting feedback, the model does not need to work out responses in real time; the agent can simply move to the next question once the user finishes answering. This works especially well for short answers, and brings the language model cost close to zero per minute.
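A minimal sketch of such a flow, assuming a hypothetical tree format in which each node holds a prompt, keyword branches and a default next node. The speak and listen callables stand in for whatever playback/TTS and transcription layers you use; they are placeholders, not a specific vendor API. No language model is called: the agent waits for the transcribed answer and picks the next node by keyword.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical node format: (prompt, {keyword: next_node_id}, default_next_node_id)
Node = Tuple[str, Dict[str, str], Optional[str]]


def run_decision_tree(tree: Dict[str, Node], start: str,
                      speak: Callable[[str], None],
                      listen: Callable[[], str]) -> Dict[str, str]:
    """Walk a scripted conversation without calling a language model."""
    answers: Dict[str, str] = {}
    node_id: Optional[str] = start
    while node_id is not None:
        prompt, branches, default = tree[node_id]
        speak(prompt)      # e.g. play a pre-recorded clip
        answer = listen()  # returns once the user finishes answering
        answers[node_id] = answer
        words = set(re.findall(r"[a-z']+", answer.lower()))
        # The first keyword found in the answer picks the branch; otherwise use the default
        node_id = next((nxt for kw, nxt in branches.items() if kw in words), default)
    return answers


feedback_tree: Dict[str, Node] = {
    "resolved": ("Was your issue resolved today?", {"yes": "rating", "no": "details"}, "details"),
    "details": ("Please briefly describe what went wrong.", {}, "rating"),
    "rating": ("On a scale of one to five, how would you rate this call?", {}, None),
}

# Stubbed I/O for illustration; a real agent would wire these to TTS and transcription
replies = iter(["No, not really.", "The refund never arrived.", "Three"])
spoken = []
answers = run_decision_tree(feedback_tree, "resolved", spoken.append, lambda: next(replies))
print(answers)
```

Keyword matching is deliberately crude here; a small classifier or small language model could replace it for answers that need more interpretation.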
[3] Business Logic
(a) Building a voice solution using an AI partner or developing in-house
Instead of continuously paying service providers 5 cents per minute for business logic, you can have it built once, in-house or by an AI partner such as Tailored AI, for a one-time cost of around $20,000.
[4] Text to Speech (Voice Synthesis)
(a) Using lighter models
Voice synthesis comes in tiers. You can use free tiers for English, which offer a robotic voice, or paid tiers at 0.5 cents per minute for other languages.
(b) Using custom phrases for predictable questions
For predictable responses, you can pre-record high-quality voice clips and bring the synthesis cost for those cases down to zero.
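One way to implement this is a lookup table from known response text to a pre-recorded clip, with a fallback to the paid TTS layer for everything else. The file path, the fake_tts stub and the lower-case/whitespace normalization are illustrative assumptions. In practice, the predictable responses would come from a fixed script so that they match exactly.

```python
from typing import Callable, Dict


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def make_speaker(prerecorded: Dict[str, str],
                 synthesize: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function mapping response text to an audio source.

    Known phrases resolve to a pre-recorded clip (no synthesis cost);
    anything else falls back to the TTS call.
    """
    index = {normalize(phrase): clip for phrase, clip in prerecorded.items()}

    def speak(text: str) -> str:
        clip = index.get(normalize(text))
        return clip if clip is not None else synthesize(text)

    return speak


tts_requests = []


def fake_tts(text: str) -> str:
    # Hypothetical stand-in for a real TTS call; records what would have been billed
    tts_requests.append(text)
    return f"tts://{len(tts_requests)}"


speak = make_speaker(
    {"Thank you for calling. How can I help you today?": "clips/greeting.wav"},
    fake_tts,
)
print(speak("Thank you for calling.  How can I help you today?"))
print(speak("Your order will ship tomorrow."))
```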
[5] Transport
There is little scope for reducing transport costs, as there are no lighter models available for this layer.
Building a custom solution instead of buying a ready-made one from providers like Vapi or Synthflow will save you 8-9 cents per minute on average, bringing your cost down to about 4 cents per minute. An investment of $20,000 can be recovered in roughly 220,000-250,000 minutes of calls, or about the volume that 3 full-time agents handle in a year.
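The payback period is the one-time cost divided by the per-minute saving. A small sketch using the article's $20,000 build cost and 8-9 cents saved per minute:

```python
def payback_minutes(one_time_cost_usd: float, savings_cents_per_minute: float) -> float:
    """Call minutes needed before per-minute savings cover a one-time build cost."""
    if savings_cents_per_minute <= 0:
        raise ValueError("savings must be positive")
    return one_time_cost_usd * 100 / savings_cents_per_minute  # dollars -> cents


for savings in (8, 9):
    print(f"{savings} cents/min saved: {payback_minutes(20_000, savings):,.0f} minutes")
```

A larger per-minute saving shortens the payback proportionally, which is why custom builds pay off fastest at high call volumes.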

Concluding Remarks

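Using AI Voice Agents makes sense in certain scenarios, particularly when the assumptions behind cheap human agents (high utilization, no aggregator margins, no rapid scaling, low-cost locales) do not hold. Building custom agents is beneficial if you have a high volume of calls, since the one-time build cost is recovered through per-minute savings.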