Built from the same research as Gemini 3, the new family spans a 2B edge model that runs on a Raspberry Pi to a 31B dense model currently ranked third on the Arena AI open-model leaderboard. The Apache 2.0 licence is a significant shift from previous Gemma releases.
Google has released Gemma 4, the latest generation of its open-weight model family, in four sizes designed to cover everything from on-device inference on smartphones to workstation-class deployments.
The models are built from the same research and technology that underpins Gemini 3, Google’s proprietary frontier model, and are released under an Apache 2.0 licence, with more permissive terms than previous Gemma generations, a change that Hugging Face co-founder Clément Delangue described as “a huge milestone.”
Demis Hassabis, CEO of Google DeepMind, called the new models “the best open models in the world for their respective sizes.”
The four variants are the Effective 2B (E2B) and Effective 4B (E4B) edge models, developed in collaboration with the Pixel team, Qualcomm, and MediaTek and designed to run on-device on phones, Raspberry Pi boards, and Jetson Nano hardware; and the 26B Mixture-of-Experts (MoE) and 31B Dense models, aimed at offline use on developer hardware and consumer GPUs.
The 31B Dense model currently ranks third among all open models on the Arena AI text leaderboard; the 26B MoE sits sixth. Google claims both larger models outperform models up to 20 times their size on that benchmark.
The 31B’s unquantised weights fit on a single 80GB Nvidia H100 GPU; quantised versions run on consumer hardware.
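The arithmetic behind that claim is straightforward. A rough sketch (ignoring KV cache and activation memory, and assuming 16-bit unquantised weights and 4-bit quantisation, neither of which Google has specified):

```python
# Back-of-the-envelope weight footprint for a 31B-parameter dense model.
# Bytes per parameter: 2.0 for bf16/fp16, 0.5 for 4-bit quantisation (assumed).
def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the raw weight storage in gigabytes."""
    return n_params * bytes_per_param / 1e9

unquantised = weight_footprint_gb(31e9, 2.0)  # 16-bit weights
quantised = weight_footprint_gb(31e9, 0.5)    # 4-bit weights

print(f"bf16: {unquantised:.0f} GB, 4-bit: {quantised:.1f} GB")
# bf16 comes to 62 GB, inside an H100's 80GB; 4-bit comes to about 15.5 GB,
# within reach of a 24GB consumer GPU.
```

Actual runtime memory is higher once context and activations are included, which is why the headroom on the H100 matters.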
All four models are multimodal, natively processing video and images, and are trained across more than 140 languages. The E2B and E4B models additionally support native audio input for speech recognition. Context windows are 128K tokens for the edge models and 256K for the two larger variants.
On capability, Google highlights multi-step reasoning improvements, native function-calling and structured JSON output for agentic workflows, and offline code generation. On performance, the Android Developers Blog notes the E2B model runs three times faster than the E4B, while the edge family overall is up to four times faster than previous Gemma versions and uses up to 60% less battery.
The E2B and E4B models are also the foundation for Gemini Nano 4, Google’s next-generation on-device model for Android, which will arrive on consumer devices later this year.
Gemma has accumulated more than 400 million downloads and over 100,000 community-created variants since its first release, a figure Google points to as evidence of developer adoption at scale.
Gemma 4 is available immediately on Hugging Face, Kaggle, and Ollama, with the 31B and 26B models accessible via Google AI Studio and the edge models via AI Edge Gallery.
The Apache 2.0 licensing decision is the most consequential commercial signal in the launch: it removes restrictions that prevented some enterprise and commercial deployments under the previous Gemma terms, opening the ecosystem to a broader range of production use cases.