On March 31, 2026, Anthropic accidentally shipped the entire source code of Claude Code to the public npm registry. Around 512,000 lines of TypeScript across 1,906 files, including 44 hidden feature flags and references to an unreleased model codenamed Mythos, sat openly accessible on a Cloudflare storage bucket until a security researcher found it and posted the link on X. Within hours the codebase had been mirrored across GitHub, amassing thousands of stars before Anthropic could issue DMCA takedowns. Anthropic called it a packaging error caused by human error. That explanation is accurate and also somewhat beside the point.
By exposing the blueprints of Claude Code, Anthropic handed a roadmap to anyone who wanted to design malicious repositories specifically tailored to trick Claude Code into running background commands or exfiltrating data before a user ever sees a trust prompt. The permission enforcement logic, the sandboxing architecture, the exact orchestration mechanics that govern how the agent validates what it is allowed to do: all of it now sits permanently in the wild across tens of thousands of forked repositories that no DMCA notice will fully reach. What the leak exposed about the state of AI security is more uncomfortable than the leak itself.
One Side Is Moving Faster
The conventional framing around AI in cybersecurity treats it as a rough equilibrium, an arms race where offense and defense accelerate together. That framing does not hold up well against the specifics of what actually happened in March, or against what security teams describe working with day to day.
The hook and permission logic exposed in the Claude Code leak makes silent device takeover more reliable for attackers who know where to look. Defenders, meanwhile, are still integrating AI into existing security stacks and must validate that it does not generate excessive false positives before it becomes operationally useful. Those two timelines are not comparable.
Tim Burke, who has run managed security operations for over 30 years at Quest Technology Management, puts the asymmetry plainly. “Attackers got the entire blueprint for how an agentic AI validates permissions and handles credentials without having to reverse-engineer any of it,” he says. “That means attackers are operating with AI that moves faster than most detection systems were designed to handle while security teams are still figuring out how to deploy AI tools without creating more work for already overwhelmed SOCs.”
Earlier this month, Google’s Threat Intelligence Group identified the first confirmed zero-day exploit developed entirely with AI assistance and stopped a planned mass exploitation event before it could execute. That is the optimistic version of this story. Most organizations defending against those same capabilities are not Google, and their detection infrastructure was not built for what is now possible.
“Most organizations are still running detection infrastructure that was designed to catch human attackers who move methodically through networks over days or weeks,” Burke says. “AI compressed those timelines to hours and in some cases minutes, which means the window between intrusion and damage is now shorter than the time it takes most SOCs to investigate a single alert.”
The Alert That Does Not Exist
Underneath the speed problem is something more structural. Security platforms are built to detect behavioral anomalies, things that look like malicious activity based on what is happening rather than what is driving it. What they cannot tell you is whether an attack was initiated by a human or an AI agent operating autonomously. No platform currently surfaces that distinction.
A vulnerability discovered in Claude Code after the leak illustrates this directly. A malicious file can instruct the AI to generate a command pipeline that looks exactly like a legitimate build process, and the resulting behavior bypasses the permission system entirely without raising any flag in a conventional SIEM.
“AI agents can be manipulated through tool descriptions and prompts in ways that bypass traditional access controls without ever triggering an authentication failure or raising an alert in your SIEM,” Burke says. “That means detection needs to start tracking what the agent understood it was doing and why it made that decision, rather than flagging policy violations after the fact.”
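To make Burke's point concrete, here is a minimal sketch of what an agent-aware audit record might look like. It is an illustration only: every name in it (ActionRecord, instruction_source, needs_review, and the source labels) is hypothetical and does not come from Claude Code, the leaked files, or any SIEM product. The idea is simply to log who acted, what the agent said it was doing, and where the instruction came from, so that an agent-run command triggered by repository content can be told apart from one a user asked for.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical labels for where an instruction originated (assumption, not a standard).
UNTRUSTED_SOURCES = {"repo_file", "tool_description"}


@dataclass(frozen=True)
class ActionRecord:
    actor: str               # "human" or "agent"
    command: str             # the command that was executed
    stated_intent: str       # the agent's own description of why it ran the command
    instruction_source: str  # e.g. "user_prompt", "repo_file", "tool_description"


def needs_review(record: ActionRecord) -> bool:
    """Flag agent actions whose instruction came from content the user did not write."""
    return record.actor == "agent" and record.instruction_source in UNTRUSTED_SOURCES


def to_log_line(record: ActionRecord) -> str:
    """Serialize the record as a single JSON line with the review flag attached."""
    payload = asdict(record)
    payload["needs_review"] = needs_review(record)
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    record = ActionRecord(
        actor="agent",
        command="make build",
        stated_intent="run the project's build as requested by the README",
        instruction_source="repo_file",
    )
    print(to_log_line(record))
```

A record like this does not stop an attack on its own. It only gives an analyst the two fields the article says are missing today: whether a human or an agent initiated the action, and what the agent believed it was doing.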
The Claude Mythos references in the leaked files add a layer to this that has not received much attention. What was exposed was not just the current tool but the architectural direction of where agentic AI is heading, including enhanced reasoning capabilities and deeper native tool-use integration. Security teams are building defenses against what these systems can do today. The leaked roadmap describes something considerably more capable.
“Right now the vast majority of platforms can’t make that distinction between AI and human origin,” Burke says, “and security teams are essentially defending blind against an entire category of threat they have no visibility into.”
The Anthropic leak was a misconfigured debug file. The organizations now trying to figure out whether their security infrastructure can detect what an AI agent believed it was authorized to do are working on a problem that existed before March 31 and will exist long after the DMCA notices are processed.
There is no clean ending to that problem yet.
TNW newsroom and editorial staff were not involved in the creation of this content.