This a16z-backed startup says the fix for AI errors is a weaker model, not a smarter one

Probably has raised $9m to wrap small AI models in a deterministic 'harness' that catches hallucinations before they reach you. The promised payoff: near-perfect accuracy on a model cheap enough to run on a laptop.



Most of the AI industry is trying to fix hallucinations by building bigger, smarter models. A startup called Probably is betting on the opposite.

The company has raised $9m in a seed round co-led by Andreessen Horowitz and Accel, with participation from Tokyo Black and Vermilion Cliffs Ventures. Its goal is to catch AI’s factual errors before they ever reach a user, and it is aiming for the 99.99% accuracy that ordinary software takes for granted but large language models rarely hit.

Its trick is to lean on the model less, not more. Probably’s first product, a local ‘verifiable data agent’ that answers questions from messy datasets, runs each answer through what founder Peter Elias calls a ‘data science mech suit’.

A harness, not a bigger brain

The model takes a first pass, then a separate, deterministic validator checks the answer against the actual data and bounces anything that does not match. The model is trained against that validator, and every result ships with a citation and an audit trail.
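
To make that loop concrete, here is a minimal sketch of the propose-then-validate pattern as described, not Probably's actual code. Everything in it is an assumption for illustration: the `Answer` type, the `validate` and `answer_with_harness` functions, the `fake_model` stand-in for a small model, and the use of Python's built-in `sqlite3` in place of the product's real database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Answer:
    value: float  # the number the model claims
    sql: str      # the query the model cites as its evidence


def validate(conn, answer, tol=1e-9):
    """Deterministic check: re-run the cited query and compare the result."""
    row = conn.execute(answer.sql).fetchone()
    return row is not None and row[0] is not None and abs(row[0] - answer.value) <= tol


def answer_with_harness(conn, propose, max_attempts=3):
    """Ask the model again until an answer passes validation, logging each try."""
    audit = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(attempt)
        ok = validate(conn, candidate)
        audit.append((attempt, candidate.sql, candidate.value, ok))
        if ok:
            return candidate, audit
    return None, audit  # nothing verified, so nothing is shown to the user


# Toy dataset (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],
)


def fake_model(attempt):
    # Hypothetical stand-in for a small model: wrong first, right on the retry.
    sql = "SELECT SUM(amount) FROM sales WHERE region = 'north'"
    return Answer(value=175.0 if attempt == 1 else 170.0, sql=sql)


result, audit_trail = answer_with_harness(conn, fake_model)
```

In this toy version the first answer is bounced and the second passes, and the audit trail records both attempts alongside the query that serves as the citation. A production validator would presumably derive its check independently rather than trusting model-written SQL.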

‘The better your harness engineering is, the weaker the model can be,’ Elias says. Reduce the ambiguity enough, the argument goes, and the AI barely has to think.

That has a striking consequence for cost. Probably’s tool runs on a model Elias describes as ‘four classes weaker’ than the frontier, small enough to run on a desktop rather than a data centre, which strips out most of the token bill.

It also doubles as a privacy pitch. The whole thing runs locally on the open-source database DuckDB, and the company says the model only ever sees metadata and statistics, never the raw data, which stays on your machine.
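
What 'metadata and statistics, never the raw data' might look like in practice can be sketched in a few lines. This is an illustrative guess at the idea, not the company's implementation: the `describe_table` helper and the shape of the payload are hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies.

```python
import json
import sqlite3


def describe_table(conn, table):
    """Summarise a table as schema plus aggregates, without copying any rows.

    Table and column names are treated as trusted identifiers in this sketch.
    """
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    summary = {"table": table, "row_count": row_count, "columns": []}
    for column in columns:
        name, col_type = column[1], column[2]
        nulls, distinct = conn.execute(
            f"SELECT COUNT(*) - COUNT({name}), COUNT(DISTINCT {name}) FROM {table}"
        ).fetchone()
        info = {"name": name, "type": col_type, "nulls": nulls, "distinct": distinct}
        if col_type.upper() in ("INTEGER", "REAL"):
            # Only an aggregate is shared; min and max are left out here
            # because they would be literal values from individual rows.
            info["avg"] = conn.execute(f"SELECT AVG({name}) FROM {table}").fetchone()[0]
        summary["columns"].append(info)
    return summary


# Toy dataset (hypothetical); it stays in the local database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 80.0), ("north", 60.0)],
)

summary = describe_table(conn, "sales")
payload = json.dumps(summary)  # the only text a model would be shown
```

The payload here carries column names, types, counts and an average, which is enough for a model to plan a query, while the rows themselves never leave the local database.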

Aimed at the token-cost backlash

The timing is pointed. Companies are watching AI bills balloon even as per-token prices collapse, and a tool that delivers accuracy on cheap, local hardware speaks directly to that anxiety.

It also lands where errors hurt most. Probably says the same engine could extend to accounting, medical work or any other ‘precision-sensitive’ job. These are the fields where a confident wrong answer is the whole problem, as researchers warning about hallucinations in science keep pointing out.

A provocative claim, and the catch

Elias goes further, arguing the big labs have not built this because ‘they make money the more times you have to correct the model’. It is a tidy sales line, and a contestable one: the major labs pour resources into cutting hallucinations, and a smaller player has every reason to cast itself as the honest broker.

The bigger caveat is scope. A validator only works when there is a hard ground truth to check against, such as a dataset, which is why Probably started with data rather than open-ended writing. It is a $9m seed, the product is in public preview at version 0.1, and the 99.99% figure is still a goal, not a result. But in a market crowded with attempts to tame hallucinations, betting on smaller models is at least a refreshingly different wager, and one a16z and Accel were willing to fund.
