Your robot can’t be smart, fast, and free. Evolution solved that already.



Here is a constraint that almost no one building physical AI says out loud, even though every one of them is quietly fighting it.

A robot’s intelligence wants three things at once. It wants to be smart, meaning it can reason at the level of a frontier model about an unfamiliar scene. It wants to be fast, meaning it responds inside the tight, deterministic timing a physical control loop demands. And it wants to be free, meaning it keeps working when the network drops, the warehouse Wi-Fi dies, or the machine goes somewhere no signal reaches.

You cannot have all three on one piece of compute. Pick any two.

To be precise, bounded autonomy already works. Industrial arms, drones, and constrained autonomy stacks can be fast and offline because their tasks are narrow. The trilemma bites at the frontier: within one power-limited substrate, serving one control loop, you cannot combine frontier-scale general reasoning, deterministic real-time response, and full offline autonomy.

A frontier-scale model is smart, and if you stream its sensors to a datacenter it can even be fast, but now it is tethered to a network and no longer free. Shrink that model until it fits on a 15-watt embedded module and it becomes fast and free, but it is no longer frontier-smart. Run the big model in the cloud and query it only occasionally, and you get smart and free, but never fast. Three corners, two available at a time. I have come to think of this as the embodied trilemma, and it is the real reason the edge/cloud question is the hardest architecture decision in robotics. Most teams treat it as a deployment detail. It is closer to a law.

Why you can’t cheat the triangle

The trilemma is not a fashion or a temporary hardware limitation you can wait out. It falls directly out of physics and power budgets.

Frontier reasoning quality currently lives in models that want tens of gigabytes of memory and datacenter-class accelerators. That hardware does not run on a battery a mobile robot can carry. So “smart” forces a choice: either bring the datacenter to the robot through a network link, which sacrifices freedom, or accept a smaller onboard model, which sacrifices smartness.

Real-time control is even less negotiable. A wide-area network round trip adds 30 to 100 milliseconds of latency, and the variance matters more than the average. A control loop that is usually fast but occasionally stalls is worse than one that is reliably mediocre, because controllers are tuned for deterministic timing. The moment “fast” depends on a network, you have surrendered “free,” because the network is now inside your control loop whether you meant it to be or not.

So the triangle holds. Quantization, distillation, and better accelerators move the corners, but they do not collapse them. Anyone claiming otherwise is usually hiding which corner they gave up.

Putting numbers on the triangle

It helps to make the constraint quantitative, because the moment you write the timing down, the corners stop being abstract.

Start with latency. The end-to-end delay of a perception-to-action decision made in the cloud is a sum of terms:

$$L_{\text{cloud}} = t_{\text{capture}} + t_{\text{encode}} + t_{\text{uplink}} + t_{\text{inference}} + t_{\text{downlink}} + t_{\text{decode}}$$

Run the same decision onboard and most of that sum disappears:

$$L_{\text{edge}} = t_{\text{capture}} + t_{\text{inference,local}}$$

The difference between the two is not the inference time, which can actually be lower in the cloud on better hardware. The difference is the network, $t_{\text{uplink}} + t_{\text{downlink}}$ (plus the encode and decode steps that exist only to feed it), and more importantly its variance. A measured cloud-robotics setup over a fast wired link saw round trips of roughly 30 milliseconds [7], while real-world deployments commonly sit in the 100 to 300 millisecond range, and wireless links swing far higher. Edge processing, by contrast, pulls round trips down toward 1 to 5 milliseconds because nothing leaves the machine [8].
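To make the two sums concrete, here is a minimal sketch that adds up the stage latencies for each path. Every stage value is a hypothetical placeholder chosen only to show the shape of the argument, not a measurement from the cited work; the stage names mirror the terms in the equations above.

```python
# Hypothetical per-stage latencies in milliseconds, for illustration only.
CLOUD_STAGES_MS = {
    "capture": 5.0,
    "encode": 8.0,
    "uplink": 40.0,
    "inference": 20.0,        # assumed faster than onboard, on datacenter hardware
    "downlink": 40.0,
    "decode": 2.0,
}
EDGE_STAGES_MS = {
    "capture": 5.0,
    "inference_local": 25.0,  # assumed slower than cloud, on an embedded module
}


def end_to_end_ms(stages: dict[str, float]) -> float:
    """Sum the stage latencies along one perception-to-action path."""
    return sum(stages.values())


def network_ms(stages: dict[str, float]) -> float:
    """The network terms only: t_uplink + t_downlink (zero when absent)."""
    return stages.get("uplink", 0.0) + stages.get("downlink", 0.0)


if __name__ == "__main__":
    cloud = end_to_end_ms(CLOUD_STAGES_MS)
    edge = end_to_end_ms(EDGE_STAGES_MS)
    print(f"L_cloud = {cloud:.1f} ms, of which network = {network_ms(CLOUD_STAGES_MS):.1f} ms")
    print(f"L_edge  = {edge:.1f} ms")
```

With these placeholders the cloud path wins on inference (20 ms against 25 ms) yet loses end to end, 115 ms against 30 ms, because 80 ms of it is network.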

Now state the rule that decides where a loop can live. A control loop with timing budget $L_{\text{budget}}$ can run on a given compute path only if

$$L_{\text{path}} + k\,\sigma_{\text{jitter}} \le L_{\text{budget}}$$

where $L_{\text{path}}$ is the path’s typical (mean) latency, $\sigma_{\text{jitter}}$ is the standard deviation of that latency, and $k$ is the safety factor you need for determinism. That $k\,\sigma_{\text{jitter}}$ term is the quiet killer. Teleoperation studies are blunt about it: a link that holds a steady 100 milliseconds is workable, but one oscillating between 30 and 200 milliseconds produces jerky, unpredictable motion, because the controller cannot plan around delay it cannot predict [9]. The reflex loop’s budget is 1 to 10 milliseconds, while even a perfectly steady wide-area round trip costs about 30 milliseconds before any jitter margin is added. No wide-area path satisfies the inequality. The math, not the architect, forbids it.
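The inequality can be checked directly against latency samples. The sketch below estimates the path latency as the sample mean and the jitter as the population standard deviation; the sample values, the 150 ms teleoperation budget, and the safety factor k = 3 are all assumptions for illustration, not figures from the cited studies.

```python
import statistics


def fits_budget(latencies_ms: list[float], budget_ms: float, k: float = 3.0) -> bool:
    """Check L_path + k * sigma_jitter <= L_budget from latency samples.

    L_path is estimated as the sample mean and sigma_jitter as the population
    standard deviation. k = 3 is an assumed safety factor.
    """
    mean = statistics.fmean(latencies_ms)
    sigma = statistics.pstdev(latencies_ms)
    return mean + k * sigma <= budget_ms


# Hypothetical round-trip samples in milliseconds, not measurements.
steady_link = [100.0] * 10               # holds 100 ms
oscillating_link = [30.0, 200.0] * 5     # swings between 30 and 200 ms
onboard_path = [1.0, 2.0, 3.0, 2.0, 2.0]
ideal_wide_area = [30.0] * 10            # a perfectly steady 30 ms round trip

TELEOP_BUDGET_MS = 150.0  # assumed budget for a teleoperation loop
REFLEX_BUDGET_MS = 10.0   # upper end of the reflex budget in the text

if __name__ == "__main__":
    print(fits_budget(steady_link, TELEOP_BUDGET_MS))       # True
    print(fits_budget(oscillating_link, TELEOP_BUDGET_MS))  # False: 115 + 3 * 85 = 370
    print(fits_budget(onboard_path, REFLEX_BUDGET_MS))      # True
    print(fits_budget(ideal_wide_area, REFLEX_BUDGET_MS))   # False: 30 > 10 with zero jitter
```

Note that the oscillating link’s mean is only 15 ms worse than the steady one; it is the jitter term that pushes it far past the budget.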

| Control loop | Timing budget | Onboard path (~1-5 ms) | Wide-area path (~30-300 ms) |
|---|---|---|---|
| Reflex (motor control, e-stop) | 1-10 ms | Feasible | Impossible |
| Perception (detection, tracking, SLAM) | 30-100 ms | Feasible | Marginal, fails on jitter |
| Deliberation (planning, language) | 1-10 s | Feasible | Feasible (async) |

The table is the argument in one view. Reflex never clears a network round trip. Perception clears it only on unusually good links. Deliberation has budget to spare, which is why it can live in the cloud asynchronously.
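A coarse, range-based check reproduces the table from its own numbers. The classification rule here is an assumption of this sketch, not something the article defines: a path is Feasible if its worst case fits the loop’s loosest budget, Impossible if even its best case misses it, and Marginal otherwise. It ignores jitter, so the Marginal cell is exactly where the jitter inequality has to decide.

```python
# Latency ranges in milliseconds, taken from the table above: (low, high).
LOOP_BUDGETS_MS = {
    "Reflex": (1.0, 10.0),
    "Perception": (30.0, 100.0),
    "Deliberation": (1_000.0, 10_000.0),
}
PATH_LATENCIES_MS = {
    "Onboard": (1.0, 5.0),
    "Wide-area": (30.0, 300.0),
}


def classify(budget: tuple[float, float], path: tuple[float, float]) -> str:
    """Assumed rule: Feasible if the path's worst case fits the loosest budget,
    Impossible if its best case already exceeds it, Marginal in between."""
    _, budget_hi = budget
    path_lo, path_hi = path
    if path_hi <= budget_hi:
        return "Feasible"
    if path_lo > budget_hi:
        return "Impossible"
    return "Marginal"


if __name__ == "__main__":
    for loop, budget in LOOP_BUDGETS_MS.items():
        cells = [classify(budget, path) for path in PATH_LATENCIES_MS.values()]
        print(f"{loop:<13}", *cells)
```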

Bandwidth closes the case for perception. A single 1080p camera at 30 frames per second produces raw video at 1920 × 1080 pixels × 3 bytes × 30 frames × 8 bits, which is about 1.5 gigabits per second. Four such cameras alone come to roughly 6 gigabits per second, and a depth sensor pushes a modest rig past that. You can compress the stream, but compression costs latency, and the link still has to carry the result reliably everywhere the robot goes. Edge perception sidesteps the problem: compress to a semantic representation on the spot, and never ship the raw stream.
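The bandwidth arithmetic is easy to reproduce. The RGB numbers come from the paragraph above; the depth stream (640 × 480, 16-bit, 30 fps) is a hypothetical sensor chosen only to show how a four-camera-plus-depth rig clears 6 gigabits per second.

```python
def raw_bitrate_gbps(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Uncompressed sensor bitrate in gigabits per second (8 bits per byte)."""
    return width * height * bytes_per_pixel * fps * 8 / 1e9


rgb_1080p = raw_bitrate_gbps(1920, 1080, 3, 30)  # about 1.49 Gb/s
# Hypothetical depth stream: 640 x 480 pixels, 2 bytes (16-bit) per pixel, 30 fps.
depth = raw_bitrate_gbps(640, 480, 2, 30)        # about 0.15 Gb/s
rig = 4 * rgb_1080p + depth                      # about 6.12 Gb/s

if __name__ == "__main__":
    print(f"one 1080p camera:     {rgb_1080p:.2f} Gb/s")
    print(f"four cameras + depth: {rig:.2f} Gb/s")
```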

Finally, the economics, which is just the trilemma with a dollar sign. Onboard compute is a one-time capital cost. Cloud reasoning is an operating cost that accrues with every query:

$$C_{\text{cloud}}(t) = r \, c_{\text{token}} \, t$$

where $r$ is the query rate and $c_{\text{token}}$ the price of one query (its token count times the per-token price), against a flat $C_{\text{edge}} = C_{\text{capex}}$. The two lines cross at $t^* = C_{\text{capex}} / (r \, c_{\text{token}})$. Push thirty frames a second to a cloud model and $t^*$ arrives almost immediately, so cloud cost dominates the lifetime of the fleet. Route only a few deliberation-class queries per minute upstream and $t^*$ recedes over the horizon.

| Strategy | What goes upstream | Cost shape | Break-even $t^*$ |
|---|---|---|---|
| Stream everything | ~30 frames/sec to a cloud model | Steep linear opex | Almost immediate |
| Route deliberation only | A few queries/min | Shallow linear opex | Past fleet service life |
| Fully onboard | Nothing | One-time capex, flat | Never crossed |

Same hardware, same models, opposite economics, decided entirely by which loop you placed in which corner. The gap is not subtle: a single camera streamed to a cloud vision model at 30 frames per second is about 2.6 million inference calls a day per robot (30 × 86,400 seconds), while routing only deliberation-class queries upstream, a few per minute, comes to a few thousand. Across a fleet, that is the difference between cloud inference being a rounding error and being the largest line on the operating budget.
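The break-even formula can be evaluated for both strategies. The capex and per-query price below are hypothetical, and the sketch treats the per-query price as the same for both strategies, which is a simplification; under that assumption the ratio of the two break-even times depends only on the query rates, 30 per second against 3 per minute, a factor of 600.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def break_even_seconds(capex: float, query_rate_per_s: float, cost_per_query: float) -> float:
    """t* = C_capex / (r * c): when cumulative cloud spend, growing at r * c
    per second, equals the one-time onboard compute cost."""
    return capex / (query_rate_per_s * cost_per_query)


# Hypothetical prices, for illustration only.
CAPEX_USD = 2_000.0        # one-time onboard compute cost
COST_PER_QUERY_USD = 2e-4  # all-in price of one cloud query (tokens x per-token price)

stream = break_even_seconds(CAPEX_USD, 30.0, COST_PER_QUERY_USD)        # 30 frames/s
deliberate = break_even_seconds(CAPEX_USD, 3 / 60, COST_PER_QUERY_USD)  # 3 queries/min

if __name__ == "__main__":
    print(f"stream everything: t* = {stream / SECONDS_PER_DAY:.1f} days")
    print(f"deliberation only: t* = {deliberate / SECONDS_PER_YEAR:.1f} years")
    print(f"ratio: {deliberate / stream:.0f}x")
```

With these placeholder prices, streaming pays back the onboard hardware in under four days of cloud spend, while deliberation-only routing takes over six years.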

The escape nobody designed, because biology did it first

Here is the part I find beautiful, and the heart of what I want to argue: the way out of the embodied trilemma is not to solve it. It is to refuse to answer it at a single point.

Your own body is built this way, and it has been for roughly half a billion years.

When you touch a hot stove, your hand pulls back before your brain knows anything happened. That is the spinal reflex arc, a loop that runs through the spinal cord and never waits for the cortex. It is fast and free (it works even if you are barely conscious), and it is emphatically not smart. It does not reason about the stove. It does not need to.

Your retina does something just as telling. It has over a hundred million photoreceptors, but the optic nerve carrying signal to the brain has only about a million fibers [10]. The eye does roughly a hundredfold compression on the spot, locally, before transmitting anything. It does not ship raw pixels up the cable. It ships a processed, compact representation. Fast and free at the edge, by necessity.

And then there is the cortex, which is where the actual reasoning happens. It is slow, it is powerful, and crucially, the body has arranged things so that when the cortex is slow or offline, the reflexes still fire and you still pull your hand back. Evolution put the survival-critical functions where they never depend on the smart, slow part.

That is the whole trick. Biology never built a single neuron that was smart, fast, and free all at once. It built a hierarchy in which different loops each sit at a different corner of the triangle, and it made sure the corner each loop sacrifices is one that loop can afford to lose. Reflexes give up intelligence, which is fine, because a reflex that stops to think is a reflex that gets you killed. The cortex gives up speed, which is fine, because it has been kept off the survival path entirely.

A robot escapes the embodied trilemma the same way, or it does not escape at all.

Mapping the triangle onto a machine

Translate the nervous system into engineering and a practical architecture emerges. A robot has three loops, and each one belongs at a different corner.

The reflex loop (1 to 10 ms): motor control, stabilization, emergency stops. This is the spinal cord. It must be fast and free and is allowed to be dumb. It lives onboard, always, and never touches a network.

The perception loop (30 to 100 ms): detection, tracking, obstacle avoidance, visual odometry, SLAM. This is the retina. It must keep working when the link drops, and the bandwidth math forbids shipping raw sensor data anyway, since even a single camera produces well over a gigabit per second of raw video before compression. So perception compresses at the edge, exactly as the eye does, and emits a compact semantic representation rather than pixels. Fast and free, intelligence traded away on purpose.

The deliberation loop (1 to 10 seconds): task planning, language understanding, deciding what to do when the plan breaks. This is the cortex. It is allowed to be slow, and slowness is exactly the corner it trades away, reaching a frontier model in the cloud asynchronously rather than in the control path. It stays free in the only sense that matters, never holding the robot hostage to a live link. If connectivity vanishes, the robot gets less clever, not less safe.
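The property that matters in this split is that the slow loop can fail without touching the fast one. Here is a minimal sketch of that structure using Python’s asyncio; `cloud_planner`, the fallback plan string, and the timings are hypothetical stand-ins. asyncio gives no real-time guarantees, and a real reflex loop would run on a dedicated controller, so this illustrates the decoupling, not the timing.

```python
import asyncio


async def cloud_planner(goal: str, link_up: bool) -> str:
    """Hypothetical stand-in for a frontier model reached over the network."""
    if not link_up:
        raise ConnectionError("no route to cloud")
    await asyncio.sleep(0.05)  # simulated network plus inference delay
    return f"plan:{goal}"


async def deliberation_loop(goal: str, link_up: bool, state: dict) -> None:
    """Slow loop: ask the cloud, but never let its absence block anything."""
    try:
        state["plan"] = await asyncio.wait_for(cloud_planner(goal, link_up), timeout=0.5)
        state["plan_source"] = "cloud"
    except (ConnectionError, asyncio.TimeoutError):
        # Degrade gracefully: keep any previous plan, else a conservative onboard default.
        state.setdefault("plan", "onboard-default:hold-position")
        state["plan_source"] = "fallback"


async def reflex_loop(state: dict, ticks: int) -> None:
    """Fast loop: runs every tick regardless of what deliberation is doing."""
    for _ in range(ticks):
        state["reflex_ticks"] += 1  # stand-in for a motor or e-stop update
        await asyncio.sleep(0.001)


async def run(link_up: bool, ticks: int = 100) -> dict:
    state = {"reflex_ticks": 0}
    await asyncio.gather(
        reflex_loop(state, ticks),
        deliberation_loop("inspect hull", link_up, state),
    )
    return state


if __name__ == "__main__":
    print(asyncio.run(run(link_up=False)))
    print(asyncio.run(run(link_up=True)))
```

With the link down, every reflex tick still runs and the robot falls back to a safe default plan: less clever, not less safe.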

The interface between these layers is the optic nerve of the system: a deliberately narrow channel carrying detections, tracks, and state summaries, never raw signal. Get that channel right and you have not just an inference boundary. You have defined your logging schema, your training-data pipeline, and your behavior when the link drops, all at once.
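One way to treat that channel as a contract is to give it an explicit, versioned schema. The sketch below is a hypothetical message with made-up field names, serialized as compact JSON for readability; a production system might use a binary format such as ROS 2 message definitions or Protocol Buffers instead. It also shows the size gap against a single raw 1080p frame.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    bbox: tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass(frozen=True)
class Track:
    track_id: int
    label: str
    velocity_mps: tuple[float, float]  # planar velocity estimate


@dataclass(frozen=True)
class PerceptionSummary:
    """Hypothetical message crossing the edge/cloud boundary."""
    schema_version: int  # bump on any breaking change to the contract
    timestamp_s: float
    detections: tuple[Detection, ...]
    tracks: tuple[Track, ...]

    def to_wire(self) -> bytes:
        # asdict recurses into nested dataclasses; compact separators keep it small.
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")


summary = PerceptionSummary(
    schema_version=1,
    timestamp_s=1712.25,
    detections=(
        Detection("pallet", 0.93, (412, 220, 180, 140)),
        Detection("person", 0.88, (1010, 160, 90, 310)),
    ),
    tracks=(Track(7, "person", (0.4, -0.1)),),
)

RAW_FRAME_BYTES = 1920 * 1080 * 3  # one uncompressed 1080p RGB frame

if __name__ == "__main__":
    wire = summary.to_wire()
    print(len(wire), "bytes vs", RAW_FRAME_BYTES, "bytes per raw frame")
```

Because the same message can be logged, replayed, and used as training input, fixing its schema early is what makes it double as the logging format and the data pipeline.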

The industry is rediscovering the nervous system

What convinces me this is structural, not stylistic, is that the most advanced robotics programs keep reinventing the same hierarchy without necessarily naming it.

Figure AI’s Helix, the system running its humanoid robots through full eight-hour factory shifts, is explicitly two systems: a roughly 7-billion-parameter vision-language model at 7 to 9 Hz for scene understanding and language, coupled to a compact 80-million-parameter visuomotor policy that turns intent into continuous action at 200 Hz [1]. That is cortex and reflex on one robot, roughly a 25-to-1 ratio in update rate between the loop that thinks and the loop that acts, each running at the timescale its job demands. Surveys of edge-cloud collaboration now describe the same division as standard practice, with small onboard models handling real-time perception and privacy-sensitive preprocessing while heavier reasoning is offloaded upstream [4].
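The dual-rate pattern can be sketched with a single clock: the slow model refreshes an intent every few ticks, and the fast policy acts on whatever intent is most recent. This is a generic illustration using the rates quoted above (8 Hz as the midpoint of 7 to 9 Hz), not Figure’s implementation; `slow_model` and `fast_policy` are hypothetical stand-ins, and on a real robot the two would run concurrently on separate compute rather than in one serial loop.

```python
FAST_HZ = 200  # action-loop rate quoted for Helix's visuomotor policy
SLOW_HZ = 8    # midpoint of the quoted 7-9 Hz vision-language model rate


def slow_model(tick: int) -> dict:
    """Hypothetical stand-in for the large scene/language model."""
    return {"intent": "place part in bin", "issued_at_tick": tick}


def fast_policy(latent: dict, tick: int) -> tuple:
    """Hypothetical stand-in for the small policy: acts on the latest intent."""
    return ("servo_step", latent["intent"], tick - latent["issued_at_tick"])


def run_dual_rate(duration_s: float = 1.0) -> tuple[int, int, int]:
    """Serially simulate both loops on one clock.

    Returns (slow calls, fast calls, worst intent staleness in fast ticks).
    """
    ticks = int(duration_s * FAST_HZ)
    slow_every = FAST_HZ // SLOW_HZ  # 25 fast ticks per slow update
    latent = None
    slow_calls = fast_calls = max_staleness = 0
    for tick in range(ticks):
        if tick % slow_every == 0:
            latent = slow_model(tick)
            slow_calls += 1
        _, _, staleness = fast_policy(latent, tick)
        max_staleness = max(max_staleness, staleness)
        fast_calls += 1
    return slow_calls, fast_calls, max_staleness


if __name__ == "__main__":
    print(run_dual_rate())  # (8, 200, 24)
```

The fast loop never waits on the slow one; it simply tolerates an intent that is up to 24 ticks (120 ms) old.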

Comparisons on real robot data quantify the trade directly: deploying an 11-billion-parameter vision-language model at the network edge held accuracy close to its cloud baseline while shaving only modest latency, whereas a compact 2-billion-parameter model more than halved latency into sub-second territory, paying for the speed with accuracy [5]. Reviews of foundation-model robotics keep flagging the same wall: LLM planners take seconds per decision, fine for the cortex, hopeless for the spinal cord [6]. NVIDIA’s own Jetson deployment guidance reflects it too, with optimized onboard inference for perception and policy and larger models living upstream [2].

Different teams, different machines, the same triangle, the same corners. When that many independent efforts converge, you are looking at structure, not style.

Lessons from the ultimate airgap

The starkest place to watch the trilemma bite is underwater robotics. An ROV below the surface has effectively no real-time link to the cloud. The ocean is the ultimate airgap, the freedom corner taken to its absolute extreme. In hands-on underwater robotics builds, perception (detection and tracking, optimized with TensorRT) runs entirely on an onboard module, while language-level mission interaction and fleet reasoning reach a frontier model in the cloud only asynchronously, on surfaced or relayed data, and never inside a control loop. The architecture is not a preference there. The water enforces it.

Three principles follow, and they generalize far beyond the sea.

Design for the disconnected case first. If the robot is safe and useful with zero connectivity, the cloud becomes pure upside: better reasoning, fleet learning, human oversight. If the robot needs the cloud to stay safe, you have built a cortex with no spinal cord, a liability on wheels.

Treat the narrow channel as a contract, not a cable. The compressed representation crossing the edge/cloud boundary is the single most important interface in the system. Teams that treat it as an afterthought re-architect twice.

Remember the trilemma is also an economics statement. Onboard compute is paid once, at purchase. Cloud reasoning is paid forever, per token. Routing only deliberation-class queries upstream, a few per minute instead of thirty frames per second, changes fleet unit economics by orders of magnitude. Cloud-inference cost can quietly become the largest operating line on a robotics program that put the wrong loop in the wrong corner.

The corners will move. The triangle won’t.

Onboard modules get more capable every generation, and distillation keeps narrowing the gap between edge models and their cloud teachers. Early-exit inference, where confident predictions resolve locally and only hard cases escalate, is maturing fast [3][5]. The deliberation loop will migrate partly onboard over the next few years, especially for safety-relevant replanning. The corners of the triangle will keep sliding.
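The escalation idea described here can be sketched as a confidence-threshold cascade: the onboard model answers when it is confident, only low-confidence cases go upstream, and with no link everything stays local. The 0.8 threshold and both stub models are hypothetical; in the literature, "early exit" can also mean exiting at an intermediate layer of a single network, which this sketch does not model.

```python
from typing import Callable, Optional

Prediction = tuple[str, float]  # (label, confidence in [0, 1])


def cascade(x: dict,
            local_model: Callable[[dict], Prediction],
            cloud_model: Optional[Callable[[dict], Prediction]],
            threshold: float = 0.8) -> tuple[str, str]:
    """Resolve confident cases onboard; escalate the rest if a link exists."""
    label, confidence = local_model(x)
    if confidence >= threshold or cloud_model is None:
        return label, "local"
    cloud_label, _ = cloud_model(x)
    return cloud_label, "cloud"


# Hypothetical stub models: each input carries the local guess and the true label.
def stub_local(x: dict) -> Prediction:
    return x["local_label"], x["local_conf"]


def stub_cloud(x: dict) -> Prediction:
    return x["true_label"], 0.99


inputs = [
    {"local_label": "box", "local_conf": 0.95, "true_label": "box"},
    {"local_label": "box", "local_conf": 0.55, "true_label": "bag"},
    {"local_label": "person", "local_conf": 0.91, "true_label": "person"},
    {"local_label": "cone", "local_conf": 0.40, "true_label": "bollard"},
]

if __name__ == "__main__":
    print([cascade(x, stub_local, stub_cloud) for x in inputs])  # link up
    print([cascade(x, stub_local, None) for x in inputs])        # link down
```

The threshold is the knob that moves the corner: raising it sends more traffic upstream for accuracy, lowering it keeps more decisions fast and free.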

But the triangle itself does not go away, because it is anchored in physics and energy, not in any model generation. Smart, fast, and free will never coexist on a single substrate as long as frontier intelligence costs more power than a robot can carry and the speed of light caps how fast a remote answer can return. The teams that internalize this, and that consciously assign each loop the corner it can afford to lose, will ship robots that work when the network does not. The rest will keep learning, in the field and at the worst possible moment, that they accidentally wired their spinal cord through a datacenter.

Evolution settled this argument before there were spines. We are just catching up.

References

1. Figure AI. “Helix: A Vision-Language-Action Model for Generalist Humanoid Control.” figure.ai/news/helix. 2025.

2. NVIDIA Developer Blog. “Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics.” developer.nvidia.com. 2025.

3. Qu, G., Chen, Q., Wei, W., Lin, Z., Chen, X., and Huang, K. “Mobile Edge Intelligence for Large Language Models: A Contemporary Survey.” IEEE Communications Surveys and Tutorials, 2025 (arXiv:2407.18921).

4. Li, S., Wang, H., Xu, W., Zhang, R., Guo, S., Yuan, J., Zhong, X., Zhang, T., and Li, R. “Collaborative Inference and Learning between Edge SLMs and Cloud LLMs: A Survey of Algorithms, Execution, and Open Challenges.” arXiv:2507.16731, 2025.

5. Ahmad, S., Hafeez, M., and Zaidi, S.A.R. “Vision-Language Models on the Edge for Real-Time Robotic Perception.” University of Leeds, arXiv:2601.14921, 2026.

6. Khan, M.T., and Waheed, A. “Foundation Model Driven Robotics: A Comprehensive Review.” arXiv:2507.10087, 2025.

7. Kapoor, A., et al. “A Predictive Application Offloading Algorithm Using Small Datasets for Cloud Robotics.” arXiv:2108.12616, 2021.

8. Coutinho, R.W.L., and Boukerche, A. “Design of Edge Computing for 5G-Enabled Tactile Internet-Based Industrial Applications.” IEEE Communications Magazine, 60(1), 2022.

9. Urbaniak, D., et al. “5G for Robotics: Ultra-Low Latency Control of Distributed Robotic Systems.” IEEE.

10. Kandel, E.R., Schwartz, J.H., and Jessell, T.M. “Principles of Neural Science.” McGraw-Hill.
