Meta’s Muse Spark is here – and it’s closed source


In short: Meta has released Muse Spark, the first model from Meta Superintelligence Labs, the unit it assembled under Alexandr Wang after spending $14.3 billion to acquire a stake in Scale AI. Rebuilt from scratch over nine months, the model is natively multimodal, introduces a “Contemplating” reasoning mode that runs sub-agents in parallel, and is now powering Meta AI across the company’s platforms. In a break from Meta’s Llama heritage, it is closed source.

The model’s arrival closes a chapter that began in June 2025, when Mark Zuckerberg announced Meta Superintelligence Labs and installed Wang as the company’s first-ever chief AI officer. The mandate was explicit: catch up with OpenAI, Anthropic, and Google, and do so with a team and infrastructure rebuilt specifically for the task. Nine months later, that team has something to show for it.

Nine months to rebuild the stack

“Nine months ago we rebuilt our AI stack from scratch,” Wang wrote on X on Wednesday. “New infrastructure, new architecture, new data pipelines. Muse Spark is the result of that work, and now it powers Meta AI.” The statement is a direct acknowledgement of how deep the rebuild went: not a fine-tuned iteration on an existing architecture, but a replacement of the foundational infrastructure on which Meta’s models are trained.

The model, known internally as Avocado, had been delayed earlier this year after falling short of rivals in internal tests for reasoning, coding, and writing. The release on Wednesday suggests those gaps have been addressed to a degree that Meta considers competitive, even if the benchmark picture remains mixed. Wang’s framing emphasises process over product: Muse Spark is described as the first in a family of models, not a definitive answer to the frontier leaders.

Muse Spark is natively multimodal, accepting voice, text, and image inputs, with text-only output at launch. It operates in a fast mode for casual queries and a new “Contemplating” mode that orchestrates multiple sub-agents to reason in parallel, a direct bid to compete with the extended reasoning modes offered by Google’s Gemini Deep Think and OpenAI’s GPT-5.4 Pro. A key efficiency claim accompanies the release: Meta says Muse Spark achieves its reasoning capability using more than ten times less compute than Llama 4 Maverick. The gain is attributed to a training technique called “thought compression”, in which the model is penalised during reinforcement learning for excessive thinking time, forcing it to solve problems with fewer reasoning tokens without sacrificing accuracy.
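The mechanics of "thought compression" can be illustrated with a minimal sketch. Meta has not published the technique's details; the reward shape, function names, and penalty coefficient below are illustrative assumptions, showing only the general idea the company describes: during reinforcement learning, a correct answer reached with a shorter reasoning chain scores higher than the same answer reached with a longer one.

```python
# Hypothetical sketch of a "thought compression" style reward.
# Not Meta's actual method: the linear penalty, token budget, and
# coefficient are illustrative assumptions only.

def compressed_reward(correct: bool, reasoning_tokens: int,
                      budget: int = 1024, penalty: float = 0.0005) -> float:
    """Task reward minus a penalty on reasoning tokens beyond a budget."""
    task_reward = 1.0 if correct else 0.0
    overrun = max(0, reasoning_tokens - budget)
    return task_reward - penalty * overrun

# A correct short chain outscores a correct long one, so the RL policy
# is pushed toward fewer reasoning tokens at equal accuracy.
short_chain = compressed_reward(correct=True, reasoning_tokens=800)
long_chain = compressed_reward(correct=True, reasoning_tokens=3000)
assert short_chain > long_chain
```

Under a reward of this shape, the optimisation pressure is exactly the trade-off the announcement claims: the model keeps the accuracy term only by staying correct, and recovers the penalty term only by thinking in fewer tokens.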

Where the benchmarks tell a complicated story

Meta’s published benchmarks place Muse Spark fourth on the Artificial Analysis Intelligence Index v4.0, with a score of 52, behind Gemini 3.1 Pro Preview and GPT-5.4 (both at 57) and Claude Opus 4.6 (53). The overall ranking reflects a genuinely mixed performance profile rather than a uniform shortfall.

On GPQA Diamond, the graduate-level scientific reasoning benchmark, Muse Spark scored 89.5%, trailing Gemini 3.1 Pro’s 94.3%, OpenAI’s GPT-5.4 at 92.8%, and Claude Opus 4.6 at 92.7%. On ARC AGI 2, the abstract reasoning benchmark, the gap is more significant: Muse Spark scored 42.5 in Contemplating mode against Gemini 3.1 Pro’s 76.5 and GPT-5.4’s 76.1, a difference that suggests the model’s parallel sub-agent architecture does not fully close the distance on abstract reasoning tasks. On software engineering, Muse Spark scored 77.4% on SWE-bench Verified.

The areas where Muse Spark leads are specific and, not coincidentally, aligned with the particular advantages Meta can bring to bear. On CharXiv Reasoning, which tests figure and chart understanding from images, Muse Spark scored 86.4 in Contemplating mode, ahead of both Gemini 3.1 Pro’s 80.2 and GPT-5.4’s 82.8. On HealthBench Hard, a medical reasoning evaluation, Muse Spark scored 42.8%, a figure that reflects the model’s training on data curated in collaboration with more than 1,000 physicians. Claude Opus 4.6 scored 14.8% on the same evaluation; GPT-5.4 scored 40.1%.

Shopping, health, and the ‘personal superintelligence’ thesis

The health benchmark result is not incidental. Meta’s differentiation argument for Muse Spark rests heavily on the model’s ability to combine general reasoning capability with the specific data advantages Meta has over its competitors: three billion users, their interests, their social graphs, and now their health queries. Zuckerberg described Muse Spark as “a world-class assistant and particularly strong in areas related to personal superintelligence like visual understanding, health, social content, shopping, games, and more” in a Facebook post accompanying the release.

A dedicated shopping mode represents the clearest expression of that thesis. The feature draws on content from creators within Meta’s ecosystem alongside signals about individual user interests and behaviour, enabling recommendations that a general-purpose model trained without that context cannot easily replicate. The health capabilities follow the same logic: a model whose training data was curated with more than 1,000 physicians can analyse the nutritional content of a food photo or provide structured guidance on dietary health in ways that general-purpose reasoning does not reliably produce. These are areas where Meta’s platform data is genuinely a competitive advantage rather than a marketing claim.

Muse Spark is currently powering queries in the Meta AI app and Meta.ai website and will expand across Facebook, Instagram, and WhatsApp. Meta has also been building out the MSL team through acquisitions, most recently acquiring Moltbook, an AI agent social network whose co-founders joined MSL directly. The combined effect is a unit that is assembling both the models and the agentic infrastructure to run on top of them.

A closed model from the company that built Llama

The detail that will attract the most scrutiny from Meta’s developer community is the one buried in the product announcement: Muse Spark is closed source. Meta’s Llama series established the template for open-source AI model development through 2025, with successive versions providing the foundation for thousands of applications, research projects, and competing products. Muse Spark breaks that pattern.

Meta has indicated it hopes to release future versions of the model under an open-source licence, framing the current closure as temporary rather than strategic. The more candid reading is that open-source models, however valuable for ecosystem development, sacrifice the competitive advantage that comes from keeping architectural innovations proprietary while rivals are trying to close a capability gap. The pivot to a closed model signals that Meta now considers itself in a race in which it can no longer afford to give ground.

The arithmetic of a $14.3 billion bet

The Wang deal was structured as a $14.3 billion investment by Meta for a 49% non-voting stake in Scale AI, with Wang moving to Meta as chief AI officer while remaining on Scale’s board. The capital Meta has been deploying into AI infrastructure extends well beyond that single transaction: the company has guided for between $115 billion and $135 billion in capital expenditure in 2026, up from $72.22 billion in 2025. Muse Spark is the first product-level output from that spending, and it arrives into a market where investors have been watching closely to see whether the investment thesis holds.

Meta’s shares rose approximately 9% on Wednesday, though the move was amplified by a broader market rally following diplomatic developments unrelated to the model launch. The more meaningful signal is in the benchmark table and in the use cases Meta has chosen to lead with: not coding benchmarks where it trails, but health and visual understanding, where it leads. The capital environment underwriting frontier AI development has made scale a necessary but not sufficient condition for competitiveness. Meta’s answer, at least with Muse Spark, is that the sufficient condition is knowing which specific battles to win.
