Probably raises $9M to fix AI hallucinations on cheap models

This a16z-backed startup says the fix for AI errors is a weaker model, not a smarter one

Most of the AI industry is trying to fix hallucinations by building bigger, smarter models. A startup called Probably is betting on the opposite.

The company has raised $9m in a seed round co-led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures, to catch AI’s factual errors before they ever reach a user. It is aiming for the 99.99% accuracy that ordinary software takes for granted but large language models rarely hit.

Its trick is to lean on the model less, not more. Probably’s first product, a local ‘verifiable data agent’ that answers questions from messy datasets, runs each answer through what founder Peter Elias calls a ‘data science mech suit’.

A harness, not a bigger brain

The model takes a first pass, then a separate, deterministic validator checks the answer against the actual data and bounces anything that does not match. The model is trained against that validator, and every result ships with a citation and an audit trail.
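As a rough illustration of that loop, here is a minimal Python sketch. It is not Probably's code: the dataset, the function names and the stand-in 'model' are all hypothetical, and the company has not published how its validator works. The sketch only shows the general pattern of a model proposing an answer, a deterministic check recomputing it from the data, and every attempt landing in an audit trail.

```python
from dataclasses import dataclass

# Hypothetical toy dataset standing in for a user's real data.
ROWS = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 80},
    {"region": "north", "sales": 50},
]


@dataclass
class Answer:
    value: float
    citation: str  # describes which data the answer was derived from


def ground_truth_total(rows, region):
    """Deterministically recompute the total straight from the data."""
    return sum(r["sales"] for r in rows if r["region"] == region)


def validate(proposed, rows, region):
    """Accept the proposed value only if it matches the recomputed one."""
    return proposed == ground_truth_total(rows, region)


def answer_with_harness(propose, rows, region, max_attempts=3):
    """Ask the model, bounce anything that fails validation, log every attempt."""
    audit = []
    for attempt in range(1, max_attempts + 1):
        proposed = propose(attempt)
        accepted = validate(proposed, rows, region)
        audit.append({"attempt": attempt, "proposed": proposed, "accepted": accepted})
        if accepted:
            citation = f"sum(sales) where region = {region!r}"
            return Answer(proposed, citation), audit
    return None, audit  # nothing passed, so nothing reaches the user


def flaky_model(attempt):
    """Stand-in for a weak model: wrong on the first try, right on the second."""
    return 200 if attempt == 1 else 170


answer, audit = answer_with_harness(flaky_model, ROWS, "north")
print(answer)
print(audit)
```

In this toy run the first guess is bounced and the second is accepted, so the wrong number never reaches the user and the audit trail records both attempts.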

‘The better your harness engineering is, the weaker the model can be,’ Elias says. Reduce the ambiguity enough, the argument goes, and the AI barely has to think.

That has a striking consequence for cost. Probably’s tool runs on a model Elias describes as ‘four classes weaker’ than the frontier, small enough to run on a desktop rather than a data centre, which strips out most of the token bill.

It also doubles as a privacy pitch. The whole thing runs locally on the open-source database DuckDB, and the company says the model only ever sees metadata and statistics, never the raw data, which stays on your machine.
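A minimal sketch of what 'metadata only' could look like in practice follows. It is an assumption-laden illustration, not the company's implementation: the `profile` and `run_locally` helpers, the `sales` table and the hand-written SQL are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs.

```python
import sqlite3
import statistics


def profile(rows):
    """Summarise each column as statistics, without listing row-level values."""
    columns = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            columns[name] = {
                "type": "number",
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
            }
        else:
            columns[name] = {
                "type": "text",
                "count": len(values),
                "distinct": len(set(values)),
            }
    return columns


def run_locally(rows, sql):
    """Execute a query against an in-memory local database (sqlite3 as a stand-in)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(r["region"], r["amount"]) for r in rows],
    )
    result = con.execute(sql).fetchall()
    con.close()
    return result


# Hypothetical raw data that never leaves the machine.
rows = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 50.0},
]

# Only this summary would be handed to the model.
summary = profile(rows)

# Stand-in for the model's output: a query written from the summary plus the
# user's question (the value 'north' comes from the question, not the data).
sql = "SELECT SUM(amount) FROM sales WHERE region = 'north'"

print(summary)
print(run_locally(rows, sql))
```

The design point is the split: the summary is the only thing that crosses over to the model, while the query itself executes against the local copy of the data.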

Aimed at the token-cost backlash

The timing is pointed. Companies are watching AI bills balloon even as per-token prices collapse, and a tool that delivers accuracy on cheap, local hardware speaks directly to that anxiety.

It also lands where errors hurt most. Probably says the same engine could extend to accounting, medical work or any other ‘precision-sensitive’ job, the kind where a confident wrong answer is the whole problem. Researchers warning about hallucinations in science keep making the same point.

A provocative claim, and the catch

Elias goes further, arguing the big labs have not built this because ‘they make money the more times you have to correct the model’. It is a tidy sales line, and a contestable one: the major labs pour resources into cutting hallucinations, and a smaller player has every reason to cast itself as the honest broker.

The bigger caveat is scope. A validator only works when there is a hard ground truth to check against, such as a dataset, which is why Probably started with data rather than open-ended writing. It is a $9m seed, the product is in public preview at version 0.1, and the 99.99% figure is still a goal, not a result. But in a market crowded with attempts to tame hallucinations, betting on smaller models is at least a refreshingly different wager, and one a16z and Accel were willing to fund.

Story by Cristian Dina

Cristian Dina is the CRO at The Next Web. He has interviewed 300+ industry leaders and authored the book King of Networking, establishing himself as one of the most connected and respected voices in the ecosystem. At just 23 years old, Cristian was included in the Forbes 30 Under 30 2025 list, representing a new generation of tech builders, bold thinkers who move fast, build with purpose, and create real impact.