# MiniMax M2.5: Frontier AI at 20x Less Than Claude Opus

**Plutonous** | February 12, 2026 | 14 min read


Tags: MiniMax, M2.5, Open-Weight AI, Hailuo AI, Chinese AI, AI Pricing, SWE-bench, Agentic AI

---

**TL;DR:** MiniMax released M2.5 on February 12, 2026, an open-weight coding and agentic model that scores 80.2% on SWE-bench Verified (within 0.6 points of Claude Opus 4.6) while charging just $0.15 per million input tokens, roughly 33x less than Opus's input price<sup><a href="#source-1">[1]</a></sup>. The company went public in Hong Kong a month ago at an $11.5B valuation, its shares have since quadrupled, and 30% of all tasks at MiniMax HQ are now completed by its own model<sup><a href="#source-2">[2]</a></sup>. This is the first open-weight model to genuinely match Claude Sonnet-tier performance, and it rewrites the economics of AI development.

The Chinese AI labs have been releasing models at a pace that makes Western product cycles look leisurely, but MiniMax M2.5 is different. It's not incrementally better. It represents a structural break in what open-weight models can achieve, particularly for the agentic coding workflows that are driving the largest share of enterprise AI spend in 2026.

While ByteDance was grabbing headlines with Seedance 2.0 video clips and DeepSeek was teasing V4, MiniMax quietly published benchmark results that made the entire open-source community stop and recalibrate. An open-weight model matching the coding performance of the most expensive frontier models at one-twentieth the cost isn't a minor optimization. It's the kind of shift that forces enterprise procurement teams to rewrite their AI budgets.

> **Why This Matters Now**
>
> MiniMax M2.5 dropped during the Chinese Spring Festival AI blitz of February 2026, exactly one year after the DeepSeek R1 shock. But while DeepSeek proved Chinese labs could match Western reasoning capabilities, M2.5 proves they can match Western *agentic coding* capabilities, the single highest-value commercial AI use case, at a fraction of the price<sup><a href="#source-1">[1]</a></sup>. The OpenHands evaluation team ranked it the #4 model overall, the first open-weight model to ever exceed Claude Sonnet on their composite benchmark<sup><a href="#source-3">[3]</a></sup>.


## The Numbers That Matter: M2.5 By the Benchmarks

Let's be clear about what MiniMax achieved. This isn't a model that trades well on cherry-picked evaluations. The SWE-bench Verified score of 80.2% puts M2.5 within striking distance of Claude Opus 4.6 (80.8%), a model that costs $5.00 per million input tokens versus M2.5's $0.15<sup><a href="#source-1">[1]</a></sup><sup><a href="#source-4">[4]</a></sup>.
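The headline ratios can be reproduced from the four numbers quoted above. The sketch below is only a sanity check on input pricing and SWE-bench Verified scores; the function name `compare` is a hypothetical helper, and the article's separate "20x per task" figure presumably folds in output pricing and token usage that are not given here.

```python
# Figures quoted in the article.
OPUS_INPUT_USD_PER_M = 5.00   # Claude Opus 4.6, USD per million input tokens
M25_INPUT_USD_PER_M = 0.15    # MiniMax M2.5, USD per million input tokens
OPUS_SWE_VERIFIED = 80.8      # percent
M25_SWE_VERIFIED = 80.2       # percent


def compare(cheap_price, pricey_price, cheap_score, pricey_score):
    """Return price and score comparisons between two models (hypothetical helper)."""
    return {
        "price_ratio": pricey_price / cheap_price,
        "cost_fraction": cheap_price / pricey_price,
        "score_gap_points": pricey_score - cheap_score,
        "relative_score": cheap_score / pricey_score,
    }


result = compare(
    M25_INPUT_USD_PER_M, OPUS_INPUT_USD_PER_M,
    M25_SWE_VERIFIED, OPUS_SWE_VERIFIED,
)

print(f"Input price ratio:  {result['price_ratio']:.1f}x")
print(f"Cost fraction:      {result['cost_fraction']:.1%}")
print(f"Score gap:          {result['score_gap_points']:.1f} points")
print(f"Relative score:     {result['relative_score']:.1%}")
```

On these inputs the price ratio comes out near 33x, and M2.5 reaches about 99.3% of Opus's score at about 3% of the input price.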

- **80.2%**: SWE-bench Verified
- **51.3%**: Multi-SWE-Bench
- **76.3%**: BrowseComp
- **76.8%**: BFCL Tool Calling
- **$0.15/M**: Input Price
- **10B**: Active Parameters


What's often overlooked is the Multi-SWE-Bench result. At 51.3%, M2.5 holds the #1 position globally on the multi-language coding benchmark, not just among open-weight models but among all models, period<sup><a href="#source-1">[1]</a></sup>. The tool-calling score of 76.8% on BFCL outperforms Claude Opus 4.6, Claude Sonnet 4.5, and Gemini 3 Pro. For agentic workflows that depend on reliable function calling, this isn't a marginal difference.

- **20x**: Cost reduction vs. Claude Opus 4.6 per task


## Architecture: 230 Billion Parameters, 10 Billion Active

The reason M2.5 is so cheap is also the reason it's so good. MiniMax built on a Mixture-of-Experts architecture with 230 billion total parameters but only 10 billion active per forward pass<sup><a href="#source-5">[5]</a></sup>. This sparse activation means you get the knowledge capacity of a massive model with the compute costs of a much smaller one.
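To make sparse activation concrete, here is a toy top-k router in plain Python. It is a generic illustration of Mixture-of-Experts routing, not MiniMax's implementation: the eight scalar "experts", the gate logits, and `k=2` are all hypothetical, since the article gives only the 230B total and 10B active parameter counts.

```python
import math

TOTAL_PARAMS_B = 230   # from the article
ACTIVE_PARAMS_B = 10   # from the article


def active_fraction(active_b, total_b):
    """Share of parameters that participate in a single forward pass."""
    return active_b / total_b


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    routed = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Hypothetical toy setup: expert n simply multiplies its input by n + 1.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

output, used = moe_forward(3.0, experts, gate_logits, k=2)
print(f"Experts used: {used} of {len(experts)}")
print(f"M2.5 active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
```

Only two of the eight toy experts run for this input; the other six cost nothing. The same idea at scale is why roughly 4% of M2.5's parameters do the work on any given pass.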


The model was trained using MiniMax's proprietary CISPO algorithm (Clipping Importance Sampling Policy Optimization), first introduced in their M1 paper<sup><a href="#source-5">[5]</a></sup>. What makes M2.5's training unique is the Forge Reinforcement Learning framework: rather than training on synthetic benchmarks, MiniMax trained across 200,000+ real-world environments, including actual codebases, web browsers, and office applications<sup><a href="#source-1">[1]</a></sup>.

Here's the genius of this approach. Traditional benchmark training optimizes for benchmark performance. Forge RL optimizes for the messy, unpredictable environments where AI agents actually need to work. That's why M2.5's BrowseComp score (76.3%) is so strong: the model was literally trained to navigate real websites, not simulated ones.

## The Pricing Bloodbath: Intelligence Too Cheap to Meter

MiniMax isn't being subtle about the economic argument. They're calling it "intelligence too cheap to meter," a deliberate echo of the early nuclear energy promise<sup><a href="#source-6">[6]</a></sup>.


The math is devastating for Western labs' pricing models. Running four M2.5 agents continuously for an entire year costs approximately $10,000<sup><a href="#source-1">[1]</a></sup>, or about $0.29 per agent-hour. One hour of continuous M2.5-Lightning operation costs roughly $1, which would come to about $35,000 a year for four agents, so the $10,000 figure presumably refers to a cheaper tier than Lightning. For startups and enterprises building agentic AI products, this isn't a price difference. It's the difference between "we can build this" and "we can't afford to build this."
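The annual and hourly figures can be checked against each other with a few lines of arithmetic. This is a rough sketch that assumes 24/7 operation over a 365-day year; `annual_cost` and `implied_hourly` are hypothetical helper names, and the only inputs are the two prices quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming continuous operation


def annual_cost(hourly_usd, agents):
    """Yearly cost of running `agents` instances around the clock."""
    return hourly_usd * HOURS_PER_YEAR * agents


def implied_hourly(annual_usd, agents):
    """Per-agent hourly rate implied by a yearly total."""
    return annual_usd / (HOURS_PER_YEAR * agents)


# Quoted figures: ~$1/hour for M2.5-Lightning, ~$10,000/year for four agents.
lightning_year = annual_cost(1.00, agents=4)
implied = implied_hourly(10_000, agents=4)

print(f"Four Lightning agents for a year: ${lightning_year:,.0f}")
print(f"Hourly rate implied by $10,000/year: ${implied:.2f}")
```

Four agents at $1 per hour would cost about $35,000 a year, so the $10,000 figure implies a rate near $0.29 per agent-hour, well below the quoted Lightning price.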


The result is a two-horse race: Claude Opus on the most capable but pricey side, and M2.5 on the very inexpensive but still highly capable side.


## The Hallucination Problem: What the Self-Reported Benchmarks Don't Tell You

Here's the contrarian take that most coverage of M2.5 is ignoring. Artificial Analysis, the independent AI evaluation firm, ran M2.5 through their AA-Omniscience benchmark and found an 88% hallucination rate, up from M2.1's already concerning 67%<sup><a href="#source-7">[7]</a></sup>.

Their Intelligence Index places M2.5 at a score of 42, tied with GLM-4.7 and DeepSeek V3.2 for the #3-5 spots among open-weight models. That's behind Zhipu's GLM-5 (50) and Moonshot's Kimi K2.5 (47)<sup><a href="#source-7">[7]</a></sup>.

What this means: M2.5 is genuinely excellent at structured tasks like coding, tool calling, and agentic workflows. But for open-ended knowledge tasks requiring factual accuracy, the model hallucinates significantly more than competitors. This is the classic RL-for-coding tradeoff: heavy reinforcement learning on coding tasks can degrade general knowledge reliability.
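How to read an 88% hallucination rate depends on how the metric is defined, which the article does not spell out. The sketch below assumes a simplified definition: wrong answers divided by all responses that were not correct (wrong plus abstained). That definition and the counts used are assumptions for illustration, not Artificial Analysis's published data.

```python
def hallucination_rate(incorrect, abstained):
    """Assumed definition: share of not-correct responses that were wrong
    answers rather than abstentions."""
    not_correct = incorrect + abstained
    if not_correct == 0:
        return 0.0
    return incorrect / not_correct


def overall_error_share(correct, incorrect, abstained):
    """Share of all questions that received a wrong answer."""
    return incorrect / (correct + incorrect + abstained)


# Hypothetical counts for 100 questions, chosen to produce an 88% rate.
correct, incorrect, abstained = 50, 44, 6

rate = hallucination_rate(incorrect, abstained)
errors = overall_error_share(correct, incorrect, abstained)

print(f"Hallucination rate (assumed definition): {rate:.0%}")
print(f"Share of all answers that are wrong:      {errors:.0%}")
```

Under this assumed definition, a high rate says the model rarely declines to answer when it does not know, which is a different claim from saying most of its answers are wrong.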

> **The Hallucination Caveat**
>
> MiniMax's self-reported benchmarks emphasize coding and agentic capabilities where M2.5 genuinely excels. But independent evaluation by Artificial Analysis shows an 88% hallucination rate on their omniscience benchmark<sup><a href="#source-7">[7]</a></sup>. For enterprise deployments requiring factual accuracy (legal analysis, medical information, financial reporting), this gap matters enormously. M2.5 is a coding and agentic powerhouse. It is not a general-purpose knowledge oracle.


## MiniMax: From SenseTime Veterans to $11.5 Billion IPO

The company behind M2.5 has one of the most remarkable trajectories in Chinese tech. Founded in December 2021 by Yan Junjie, former VP of SenseTime, MiniMax has moved from stealth to public company in just four years<sup><a href="#source-8">[8]</a></sup>.


The investor list reads like a who's who of Chinese tech: Alibaba, Tencent, Hillhouse Investment, HongShan (formerly Sequoia China), and IDG Capital<sup><a href="#source-8">[8]</a></sup>. Revenue hit $53.4 million in the nine months ending September 2025, up 174% year-over-year, though the company posted a $512 million net loss over the same period<sup><a href="#source-9">[9]</a></sup>.

- **$11.5B**: IPO Valuation
- **174%**: Revenue Growth
- **200M+**: Global Users
- **$512M**: Net Loss


## The Consumer Empire: Hailuo AI and Talkie

What separates MiniMax from many Chinese AI startups is the consumer distribution. With 200+ million cumulative users across 200+ countries and regions, MiniMax has built a consumer flywheel that most AI labs can only dream of<sup><a href="#source-10">[10]</a></sup>.


The video generation side of the business is what initially put MiniMax on the global radar, but it also brought legal trouble. In September 2025, Disney, Universal, and Warner Bros. (plus Marvel, Lucasfilm, DC Comics, and others) filed a copyright lawsuit in U.S. federal court alleging that Hailuo AI generates copyrighted characters on demand<sup><a href="#source-11">[11]</a></sup>. The lawsuit is ongoing.

## The Spring Festival AI War: M2.5 in Context

M2.5 didn't launch in a vacuum. It dropped during what Chinese tech media is calling the "Spring Festival AI War," a concentrated burst of model releases that coincided with Lunar New Year 2026<sup><a href="#source-12">[12]</a></sup>.


What makes MiniMax's position unique in this crowded field is the cost-performance niche. GLM-5 leads on raw general intelligence. Kimi K2.5 leads on math. But M2.5 leads on the metric that matters most to enterprise buyers: coding and agentic performance per dollar spent<sup><a href="#source-3">[3]</a></sup>.

## What This Means: The Open-Weight Tipping Point

The uncomfortable truth for Western AI labs is that M2.5 represents a tipping point for open-weight models. When an open-weight model can match 99.3% of the top proprietary model's coding performance at 3% of the cost, the value proposition of closed-source APIs becomes much harder to justify for pure coding and agentic workloads.


While competitors were chasing GPT-5 on general intelligence, MiniMax spent two months training M2.5 in 200,000+ real-world environments to be the model that actually does the work, at a price point that makes every other frontier model look like a luxury purchase.

> **The Bottom Line**
>
> MiniMax M2.5 isn't trying to be the smartest model. It's trying to be the most useful model at the lowest price. And for the agentic coding workflows that dominate enterprise AI spending in 2026, it's succeeding. The question isn't whether open-weight models can match proprietary ones on coding tasks. M2.5 just proved they can. The question is how Western labs respond when their pricing moat evaporates overnight.


*Last updated: February 12, 2026*

---

*Source: [LLM Rumors](https://www.llmrumors.com/news/minimax-m25-cheapest-frontier-model)*