OpenAI launches GPT-5.5, its first fully retrained base model since GPT-4.5


The model, codenamed “Spud,” is designed to complete complex multi-step tasks with minimal human direction. It sets new benchmarks in agentic coding, computer use, and knowledge work, while matching GPT-5.4’s per-token latency. API access is delayed pending additional safety work.


For months, the AI industry’s open secret has been that Anthropic’s Claude is winning the enterprise market. OpenAI has been in what internal sources described as a “Code Red” state since at least December 2025, watching Anthropic’s annual recurring revenue (ARR) sprint from $9 billion to $30 billion while its own B2B positioning eroded.

On Thursday, OpenAI responded. GPT-5.5, the company’s first fully retrained base model since GPT-4.5, is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. The model is designed to complete work with limited human direction, operating across email, spreadsheets, calendars, and other applications.

The core thesis of GPT-5.5 is legibility. Where previous models required carefully structured prompts and multi-step supervision, OpenAI says 5.5 can take a “messy, multi-part task” and independently plan, use tools, check its work, navigate ambiguity, and keep going until the task is finished.

The gains are concentrated in four areas: agentic coding, computer use, knowledge work, and early scientific research. OpenAI describes these as domains “where progress depends on reasoning across context and taking action over time.”

The benchmark numbers are strong. GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination.

On SWE-Bench Pro, which evaluates real-world GitHub issue resolution across four programming languages, it scores 58.6%, solving more tasks in a single pass than previous models.

On GDPval, which tests agents across 44 occupations of knowledge work, it scores 84.9%. On OSWorld-Verified, which measures whether a model can operate real computer environments autonomously, it reaches 78.7%.

On Tau2-bench Telecom, it reaches 98.0% without prompt tuning. Across all of these, OpenAI says GPT-5.5 improves on GPT-5.4’s scores while using fewer tokens.

The efficiency claim is commercially significant. Larger, more capable models are typically slower to serve, which creates a cost-quality trade-off for enterprise customers. OpenAI says GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving, meaning it delivers a step up in intelligence without a corresponding increase in response time.

It also uses significantly fewer tokens to complete equivalent tasks in Codex, which directly reduces the cost per task for enterprise deployments. GPT-5.5 is priced higher per token than GPT-5.4, but OpenAI says the net effect is better results for lower total cost in most workflows.
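The arithmetic behind that claim is simple: cost per task is tokens used multiplied by price per token, so a pricier model comes out cheaper whenever its token count falls by more than its price rises. A minimal sketch follows; the prices and token counts are invented for illustration and are not OpenAI's published figures, and the function names are hypothetical.

```python
def cost_per_task(tokens_per_task: int, price_per_million_tokens: float) -> float:
    """Cost of one task, in the same currency as the price."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


def break_even_token_ratio(old_price: float, new_price: float) -> float:
    """Fraction of the old token count at which the pricier model costs the same."""
    return old_price / new_price


# Hypothetical figures only: an older model at 10.0 per million tokens using
# 100,000 tokens per task, and a newer model at 15.0 per million tokens
# using 60,000 tokens for the same task.
old_cost = cost_per_task(tokens_per_task=100_000, price_per_million_tokens=10.0)
new_cost = cost_per_task(tokens_per_task=60_000, price_per_million_tokens=15.0)
ratio = break_even_token_ratio(old_price=10.0, new_price=15.0)

print(f"old model cost per task: {old_cost:.2f}")
print(f"new model cost per task: {new_cost:.2f}")
print(f"new model is cheaper if it uses under {ratio:.0%} of the old token count")
```

Under these assumed numbers, a 50% higher per-token price is more than offset by a 40% drop in tokens used. Whether real workloads clear that break-even point depends on the actual pricing and on how much token usage falls in a given workflow.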

The safety framing is notably more cautious than previous launches. OpenAI says it evaluated GPT-5.5 across its “full suite of safety and preparedness frameworks,” worked with internal and external red-teamers, added targeted testing for advanced cybersecurity and biology capabilities, and collected feedback from nearly 200 trusted early-access partners before release.

Cybersecurity is the domain where the caution is most visible: OpenAI describes deploying “stricter classifiers for potential cyber risk which some users may find annoying initially.”

The company acknowledges that GPT-5.5 represents a meaningful jump in cyber capability and frames the enhanced safeguards as a necessary investment in responsible deployment.

The API is conspicuously absent from the launch. GPT-5.5 is available now in ChatGPT and Codex for paid subscribers, but API deployments, OpenAI says, “require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale.”

The company promises API access “very soon” but has not given a date. For enterprise customers who build on the API rather than the ChatGPT interface, this is a meaningful delay. GPT-5.5 Pro, a variant with extended reasoning, is available only to Pro, Business, and Enterprise subscribers.

The competitive backdrop is explicit in every design decision. GPT-5.5 is the model OpenAI is building its unified desktop “super-app” around, merging ChatGPT, Codex, and the Atlas browser agent into a single session.

The model is designed to power intent-aware reasoning inside that unified workspace, a product category that did not exist six months ago. GPT-5.2 Thinking will remain available for three months as a legacy option before being retired on 5 June 2026.

The release cadence (GPT-5, 5.1, 5.2, 5.3-Codex, 5.4, and now 5.5 in under a year) reflects both the pace of AI development and the intensity of competition from Anthropic, Google, and the open-source ecosystem.

OpenAI is not coy about who it is competing with. Bloomberg’s framing, a model intended to “keep pace with rivals like Anthropic”, is the right one.

GPT-5.5 is the clearest signal yet that OpenAI has internalised the threat from Claude’s enterprise market share and is attempting to win back the B2B segment with a model that can genuinely work, not just answer questions.

Whether it succeeds depends on whether the performance gains hold in production workflows, whether the API arrives before enterprise customers make their next procurement decisions, and whether “Spud” can do what its benchmarks promise when the prompts are messy and the tasks are real.
