Coval has raised $28m to test AI voice agents before they reach real callers. Its founder built the same kind of safety checks for Waymo’s self-driving cars, and thinks voice needs them just as badly.
An AI voice agent can sound flawless in a demo and fall apart on a real call. It trips over accents, talks over background noise and freezes when a caller goes off script. Coval wants to catch those failures before a customer ever hears them. Investors are betting it can.
The San Francisco startup has raised $28m in a Series A round led by Norwest. Base10 Partners, Twilio Ventures and Y Combinator also joined. The deal brings Coval’s total funding to $31m since it launched in 2024. The company is a Y Combinator graduate.
The pitch is simple. As more firms put voice agents in front of customers, they need a way to prove the agents actually work. Coval sells that proof.
From self-driving cars to phone calls
The idea comes straight from autonomous vehicles. Coval founder and chief executive Brooke Hopkins built evaluation infrastructure at Waymo, Alphabet’s self-driving unit. There, her team ran millions of simulated miles for every code change, because a failure on a public road was never an option.
Hopkins argues voice agents need the same discipline. A voice agent runs several models at once. One transcribes the speech, another works out a reply, a third speaks it back. That mirrors the perception, planning and control systems in a self-driving car.
The conclusion follows from the comparison. You cannot test either system by hand at scale. Simulation is the practical way to do it. Coval applies the simulation-first method Hopkins learned at Waymo to the messy world of phone calls.
What Coval actually does
Coval runs tens of millions of simulated tests on a voice agent. It probes for the things that break real calls: accents, interruptions, background noise and odd, unscripted requests. The checks happen before any customer is exposed.
The work does not stop at launch. Coval keeps watching agents in production and feeds failed calls back into testing automatically. A bank, for instance, can simulate thousands of callers who give conflicting details or hang up early, all before a single real customer dials in.
The company says the payoff is large. Customers cut manual quality-assurance work by a factor of up to 30, it says, and deploy agents up to 10 times faster. More than 60 organisations now use the platform, including Zoom and the voice-AI infrastructure firm Deepgram.
Those two names carry weight. Both Zoom and Deepgram have deep experience with how voice AI fails. Their endorsement is a signal that the problem Coval targets is real.
Why voice AI needs a referee
The timing is not an accident. Money is pouring into voice AI. Coval points to figures showing more than $7bn went into the sector in the first quarter of 2026 alone. One forecast has the market passing $20bn by 2031.
That boom has its own gravity. Startups like Bland have raised tens of millions to build the agents themselves, and Twilio’s voice-AI revenue has been climbing fast. As more agents go live, more of them will fail in public. Testing becomes the unglamorous job underneath the hype.
Coval is not the only one chasing it. Rivals include Hamming, which focuses on regulatory edge cases in healthcare and finance, and Roark, a fellow Y Combinator startup that has replayed more than 10 million minutes of calls with updated logic. Coval argues it offers the full stack instead, from pre-launch simulation to live monitoring and human review.
The category itself echoes a familiar pattern. Other startups, such as Solidroad, are building quality-assurance tools for AI support agents across chat and email. Coval is making the same bet, but for the harder problem of live audio.
The Twilio question
One investor on the cap table is worth a second look. Twilio Ventures backed the round, and Twilio sells the voice infrastructure many of these agents run on. It could have built its own testing tool. It chose to invest in Coval instead.
“Trust is critical to scaling these experiences,” said Andy O’Dower, a field chief technology officer at Twilio. He called comprehensive evaluation tools “foundational” to the current wave of voice AI. The vote of confidence comes from a company that sees the whole market move through its pipes.
That choice hints at a bigger industry question. Will voice-AI testing stay independent, or get swallowed by the platforms it checks? Twilio backing an outside tool, rather than building one, suggests at least one major player wants evaluation kept separate.
There is a logic to that separation. A referee that works for one team is not much of a referee. An independent evaluator can test agents built on any model or any platform, which is exactly what enterprises juggling several vendors say they want.
What happens next
The new money is aimed at growth. Coval will hire across its sales and solutions-engineering teams. It will also deepen the product, with richer simulation, more integrations and stronger human-review and monitoring tools.
The momentum looks real. Coval says revenue has grown tenfold over the past year, though it has not disclosed revenue figures or headcount targets. Voice agents are spreading across customer service, sales, financial services and healthcare, and each company deploying one is a potential customer.
The deeper bet is about where voice AI is heading. Hopkins thinks every company will run a voice agent the way it now runs a website or an app. “Most enterprises don’t have the infrastructure to deploy these systems with confidence,” she said.
Norwest is convinced she can supply it. “She helped prove self-driving cars could work,” said partner Scott Beechuk, “and now she’s tackling voice AI.” The comparison is flattering, but it cuts both ways. Self-driving took far longer and cost far more than anyone promised. Whether voice agents earn the trust to handle a real call, at the scale Coval imagines, is the question this round leaves open.