TL;DR: MiniMax released M2.5 on February 12, 2026, an open-weight coding and agentic model that scores 80.2% on SWE-bench Verified (within 0.6 points of Claude Opus 4.6) while charging just $0.15 per million input tokens, 33x cheaper than Opus[1]. The company IPO'd in Hong Kong a month ago, surging to an $11.5B valuation on debut; shares have since quadrupled, and 30% of all tasks at MiniMax HQ are now completed by the company's own model[2]. This is the first open-weight model to genuinely match Claude Sonnet-tier performance, and it rewrites the economics of AI development.
The Chinese AI labs have been releasing models at a pace that makes Western product cycles look leisurely, but MiniMax M2.5 is different. It's not incrementally better. It represents a structural break in what open-weight models can achieve, particularly for the agentic coding workflows that are driving the largest share of enterprise AI spend in 2026.
While ByteDance was grabbing headlines with Seedance 2.0 video clips and DeepSeek was teasing V4, MiniMax quietly published benchmark results that made the entire open-source community stop and recalibrate. An open-weight model matching the coding performance of the most expensive frontier models at one-twentieth the cost isn't a minor optimization. It's the kind of shift that forces enterprise procurement teams to rewrite their AI budgets.
Why This Matters Now
MiniMax M2.5 dropped during the Chinese Spring Festival AI blitz of February 2026, exactly one year after the DeepSeek R1 shock. But while DeepSeek proved Chinese labs could match Western reasoning capabilities, M2.5 proves they can match Western agentic coding capabilities, the single highest-value commercial AI use case, at a fraction of the price[1]. The OpenHands evaluation team ranked it the #4 model overall, the first open-weight model to ever exceed Claude Sonnet on their composite benchmark[3].
The Numbers That Matter: M2.5 By the Benchmarks
Let's be clear about what MiniMax achieved. This isn't a model that merely performs well on cherry-picked evaluations. Its SWE-bench Verified score of 80.2% puts M2.5 within striking distance of Claude Opus 4.6 (80.8%), a model that costs $5.00 per million input tokens versus M2.5's $0.15[1][4].
By The Numbers
- 80.2% SWE-bench Verified — within 0.6 pts of Claude Opus 4.6
- 51.3% Multi-SWE-Bench — first place globally (SOTA)
- 76.3% BrowseComp — industry-leading web search
- 76.8% BFCL tool calling — beats Claude 4.6 and Gemini 3 Pro
- $0.15 per 1M input tokens — 33x cheaper than Opus 4.6
- 10B active parameters — of 230B total (MoE)
What's often overlooked is the Multi-SWE-Bench result. At 51.3%, M2.5 holds the #1 position globally on the multi-language coding benchmark, not just among open-weight models but among all models period[1]. The tool-calling score of 76.8% on BFCL outperforms Claude Opus 4.6, Claude Sonnet 4.5, and Gemini 3 Pro. For agentic workflows that depend on reliable function calling, this isn't a marginal difference.
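For readers unfamiliar with what BFCL measures: tool calling means the model emits structured function invocations rather than prose. Here is a minimal sketch of the pattern using the standard OpenAI-compatible request shape; the endpoint URL and model ID are placeholders, not confirmed MiniMax values:

```python
# Sketch of a tool-calling request against an OpenAI-compatible API.
# Assumptions: endpoint URL and model ID are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://api.minimax.io/v1",  # placeholder URL
                api_key="YOUR_KEY")

# Declare one callable tool the model may invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the repo's test suite and return failures",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax-m2.5",  # hypothetical model ID
    messages=[{"role": "user", "content": "Run the tests in ./src"}],
    tools=tools,
)

# A reliable tool-caller returns a structured call here, not prose.
# (In a real client, check that tool_calls is not None first.)
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```

A benchmark like BFCL is essentially grading how often that `tool_calls` field comes back well-formed, with the right function and arguments, across thousands of such prompts.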
Architecture: 230 Billion Parameters, 10 Billion Active
The reason M2.5 is so cheap is the same reason it's so good. MiniMax built on a Mixture-of-Experts architecture with 230 billion total parameters but only 10 billion active per inference pass[5]. This sparse activation means you get the knowledge capacity of a massive model with the compute costs of a much smaller one (a back-of-the-envelope sketch follows the list below).
M2.5 Technical Architecture
- **Sparse MoE** — 230B total parameters with only 10B active per token, enabling frontier performance at commodity hardware costs
- **Forge RL Training** — trained with reinforcement learning across 200,000+ real-world environments including codebases, browsers, and office apps
- **M2.5-Lightning** — speed-optimized variant that doubles throughput at double the price, still 16x cheaper than Opus
- **200K Context Window** — native 200K token context, practical for most coding and agentic workflows
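To make the sparse-activation economics concrete, here is a back-of-the-envelope sketch. It is not MiniMax's serving stack; it just applies the standard ~2-FLOPs-per-active-parameter-per-token approximation for decoder inference:

```python
# Back-of-the-envelope: per-token inference compute for a sparse MoE
# vs. a hypothetical dense model of the same total size, using the
# standard ~2 FLOPs per active parameter per generated token rule.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2 * active_params

dense_230b = flops_per_token(230e9)  # dense 230B model (hypothetical)
moe_m25 = flops_per_token(10e9)      # M2.5: 10B active of 230B total

print(f"Dense 230B:       {dense_230b:.1e} FLOPs/token")
print(f"MoE (10B active): {moe_m25:.1e} FLOPs/token")
print(f"Compute ratio:    {dense_230b / moe_m25:.0f}x cheaper per token")
```

The caveat: all 230B parameters still have to sit in memory, because the router can select any expert for any token. Per-token compute is ~23x lower than a dense 230B model, but self-hosting still demands substantial VRAM.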
The model was trained using MiniMax's proprietary CISPO algorithm (Clipping Importance Sampling Policy Optimization), first introduced in their M1 paper[5]. What makes M2.5's training unique is the Forge Reinforcement Learning framework. Rather than training on synthetic benchmarks, MiniMax trained across 200,000+ real-world environments: actual codebases, web browsers, and office applications[1].
Here's the genius of this approach. Traditional benchmark training optimizes for benchmark performance. Forge RL optimizes for the messy, unpredictable environments where AI agents actually need to work. That's why M2.5's BrowseComp score (76.3%) is so strong: the model was literally trained to navigate real websites, not simulated ones.
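Forge itself is proprietary, but the general shape of environment-grounded RL for agents is well understood. A minimal sketch of the rollout-and-update loop follows; every name here is illustrative, not MiniMax's API, and the update step is only gestured at:

```python
# Generic environment-grounded RL loop for an agentic model.
# Illustrative only: Env, policy, and the update step are stand-ins,
# not MiniMax's Forge framework or the exact CISPO objective.
import random

class Env:
    """Stand-in for a real environment (repo, browser, office app)."""
    def reset(self) -> str:
        return "task: fix the failing test in repo X"

    def step(self, action: str) -> tuple[str, float, bool]:
        done = random.random() < 0.1  # episode eventually terminates
        reward = 1.0 if done and "patch" in action else 0.0
        return "new observation", reward, done

def rollout(env: Env, policy) -> list[tuple[str, str, float]]:
    """Collect one trajectory of (observation, action, reward)."""
    obs, traj, done = env.reset(), [], False
    while not done:
        action = policy(obs)  # the model picks a tool call / edit
        obs, reward, done = env.step(action)
        traj.append((obs, action, reward))
    return traj

# Sample trajectories from many real environments, then push the
# policy toward actions that led to verified task success.
envs = [Env() for _ in range(8)]        # Forge reportedly uses 200,000+
policy = lambda obs: "patch: ..."       # placeholder for the LLM
batch = [rollout(env, policy) for env in envs]
# update(policy, batch)  <- policy-gradient step (CISPO-style, per M1 paper)
```

The key design choice is the reward source: task success verified inside a real environment (tests pass, page reached, document produced) rather than a learned reward model scoring synthetic transcripts.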
The Pricing Bloodbath: Intelligence Too Cheap to Meter
MiniMax isn't being subtle about the economic argument. They're calling it "intelligence too cheap to meter," a deliberate echo of the early nuclear energy promise[6].
| Model | Input $/1M | Output $/1M | SWE-bench Verified |
|---|---|---|---|
| MiniMax M2.5 | $0.15 | $1.20 | 80.2% |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | 80.2% |
| DeepSeek V3.2 | $0.28 | $0.42 | ~75% |
| Claude Sonnet 4.5 | $3.00 | $15.00 | ~72% |
| Claude Opus 4.6 | $5.00 | $25.00 | 80.8% |
| GPT-5.2 | $1.75 | $14.00 | ~79% |
| GPT-5.2 Pro | $21.00 | $168.00 | ~82% |
The math is devastating for Western labs' pricing models. Running four M2.5 agents continuously for an entire year costs approximately $10,000[1]. One hour of continuous M2.5-Lightning operation costs roughly $1. For startups and enterprises building agentic AI products, this isn't a price difference. It's the difference between "we can build this" and "we can't afford to build this."
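The headline figures are easy to sanity-check. Here is a rough cost model; the tokens-per-hour rates are our illustrative assumptions, not MiniMax numbers, and real agent workloads vary widely:

```python
# Rough annual cost for a fleet of always-on coding agents at M2.5
# list pricing. Token-throughput figures below are assumptions.
INPUT_PER_1M, OUTPUT_PER_1M = 0.15, 1.20  # M2.5 pricing (USD)

# Assumption: one agent consumes ~1.2M input tokens/hour (context,
# files, tool results) and emits ~150K output tokens/hour.
in_tok_hr, out_tok_hr = 1.2e6, 0.15e6

hourly = (in_tok_hr / 1e6) * INPUT_PER_1M + (out_tok_hr / 1e6) * OUTPUT_PER_1M
annual_4_agents = hourly * 24 * 365 * 4

print(f"Per agent-hour:        ${hourly:.2f}")          # ~$0.36
print(f"Four agents, one year: ${annual_4_agents:,.0f}")  # ~$12,600
```

Under these assumed token rates the result lands in the same ballpark as MiniMax's ~$10,000 figure; at Opus 4.6 pricing the same workload would run into the hundreds of thousands of dollars.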
The Hallucination Problem: What the Self-Reported Benchmarks Don't Tell You
Here's the contrarian take that most coverage of M2.5 is ignoring. Artificial Analysis, the independent AI evaluation firm, ran M2.5 through their AA-Omniscience benchmark and found an 88% hallucination rate, up from M2.1's already concerning 67%[7].
Their Intelligence Index places M2.5 at a score of 42, tied with GLM-4.7 and DeepSeek V3.2 for the #3-5 spots among open-weight models. That's behind Zhipu's GLM-5 (50) and Moonshot's Kimi K2.5 (47)[7].
What this means: M2.5 is genuinely excellent at structured tasks like coding, tool calling, and agentic workflows. But for open-ended knowledge tasks requiring factual accuracy, the model hallucinates significantly more than competitors. This is the classic RL-for-coding tradeoff: heavy reinforcement learning on coding tasks can degrade general knowledge reliability.
The Hallucination Caveat
MiniMax's self-reported benchmarks emphasize coding and agentic capabilities where M2.5 genuinely excels. But independent evaluation by Artificial Analysis shows an 88% hallucination rate on their omniscience benchmark[7]. For enterprise deployments requiring factual accuracy (legal analysis, medical information, financial reporting), this gap matters enormously. M2.5 is a coding and agentic powerhouse. It is not a general-purpose knowledge oracle.
MiniMax: From SenseTime Veterans to $11.5 Billion IPO
The company behind M2.5 has one of the most remarkable trajectories in Chinese tech. Founded in December 2021 by Yan Junjie, former VP of SenseTime, MiniMax has moved from stealth to public company in just four years[8].
MiniMax Company Timeline
From SenseTime veterans to $11.5B public company in four years
| Date | Milestone | Significance |
|---|---|---|
| Dec 2021 | Company Founded | Yan Junjie (ex-SenseTime VP) launches MiniMax in Shanghai |
| 2022-23 | MiHoYo Backing | Early investment from the Genshin Impact developer |
| Mar 2024 | $600M Round | Led by Alibaba at $2.5B valuation. Tencent, Hillhouse also invest |
| Sep 2024 | Hailuo AI Launch | Video generation goes viral globally, puts MiniMax on the map |
| Sep 2025 | Hollywood Lawsuit | Disney, Universal, and Warner Bros. file copyright suit in U.S. federal court |
| Oct 2025 | M2 Model Launch | 230B total / 10B active parameters. MoE architecture at commodity pricing |
| Jan 2026 | Hong Kong IPO | Raises HK$4.8B ($620M), shares surge 110% on debut to $11.5B valuation |
| Feb 2026 | M2.5 Released | 80.2% SWE-bench, $0.15/M tokens. Shares jump 15.7% to HK$680 |
The investor list reads like a who's who of Chinese tech: Alibaba, Tencent, Hillhouse Investment, HongShan (formerly Sequoia China), and IDG Capital[8]. Revenue hit $53.4 million in the nine months ending September 2025, up 174% year-over-year, though the company posted a $512 million net loss over the same period[9].
By The Numbers
- $11.5B valuation — Hong Kong Stock Exchange, Jan 2026
- 174% revenue growth — YoY, nine months ending Sep 2025
- 200M+ cumulative users — across Hailuo AI, Talkie, and platform
- $512M net loss — nine months ending Sep 2025
The Consumer Empire: Hailuo AI and Talkie
What separates MiniMax from many Chinese AI startups is the consumer distribution. With 200+ million cumulative users across 200+ countries, MiniMax has built a consumer flywheel that most AI labs can only dream of[10].
MiniMax Product Portfolio
- **Hailuo AI** — consumer multimodal platform with video generation (Hailuo 02/2.3), text, and music creation
- **Talkie** — AI companion and chatbot app with character personalities and an entertainment focus
- **Speech-02** — text-to-speech model supporting 30+ languages with exceptionally long input processing
- **MiniMax Open Platform** — enterprise and developer API at platform.minimax.io with pay-as-you-go pricing
The video generation side of the business is what initially put MiniMax on the global radar, but it also brought legal trouble. In September 2025, Disney, Universal, and Warner Bros. (plus Marvel, Lucasfilm, DC Comics, and others) filed a copyright lawsuit in U.S. federal court alleging that Hailuo AI generates copyrighted characters on demand[11]. The lawsuit is ongoing.
The Spring Festival AI War: M2.5 in Context
M2.5 didn't launch in a vacuum. It dropped during what Chinese tech media is calling the "Spring Festival AI War," a concentrated burst of model releases that coincided with Lunar New Year 2026[12].
Chinese AI Lab Landscape (February 2026)
- **DeepSeek** — V4 imminent. Started the price war with R1 in January 2025; now at $0.28/$0.42 per 1M tokens. Expanded to 1M-token context.
- **Zhipu (Z.ai) / GLM-5** — GLM-5 leads the Artificial Analysis Intelligence Index at a score of 50. IPO'd alongside MiniMax. Strongest on general-intelligence benchmarks.
- **Alibaba / Qwen** — Qwen 3.5 in preparation. Spending CNY 3B ($434M) on user acquisition. Qwen overtook Meta's Llama in cumulative downloads.
- **ByteDance / Seed 2.0** — released Seedance 2.0 for video; full-stack AI ecosystem across LLMs, vision, and video at aggressive pricing.
- **Moonshot / Kimi K2.5** — K2.5 ranks #2 among open weights on the Intelligence Index (score 47). Strong math, with 96.1% on AIME 2025.
What makes MiniMax's position unique in this crowded field is the cost-performance niche. GLM-5 leads on raw general intelligence. Kimi K2.5 leads on math. But M2.5 leads on the metric that matters most to enterprise buyers: coding and agentic performance per dollar spent[3].
What This Means: The Open-Weight Tipping Point
The uncomfortable truth for Western AI labs is that M2.5 represents a tipping point for open-weight models. When an open-weight model can match 99.3% of the top proprietary model's coding performance at 3% of the cost, the value proposition of closed-source APIs becomes much harder to justify for pure coding and agentic workloads.
Key Takeaways for AI Teams
- **M2.5 is a coding and agentic specialist, not a general-purpose model.** The 88% hallucination rate on Artificial Analysis benchmarks means M2.5 should not replace general-purpose models for knowledge-intensive tasks. Use it for what it's best at: code generation, tool calling, and agentic workflows.
- **The price point enables entirely new architectures.** At $0.15/M input tokens, you can run multi-agent systems with dozens of M2.5 instances for less than the cost of a single Claude Opus call. This changes what's architecturally possible.
- **Self-hosting is genuinely viable.** With only 10B active parameters on a 230B MoE architecture and a modified MIT license, organizations can run M2.5 on-premises at a fraction of the cost of API access to Western frontier models.
- **Watch the copyright litigation.** The Disney/Universal/Warner lawsuit against MiniMax could set precedent for all AI-generated content. Enterprise users should monitor it closely before building production workflows on MiniMax products.
- **Validate independently before deploying.** MiniMax's self-reported benchmarks diverge from independent evaluations. Run your own evaluation suite on your specific use case before committing to M2.5 in production (a minimal harness is sketched after this list).
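As a starting point for that last item, here is a minimal evaluation-harness sketch against any OpenAI-compatible endpoint, whether self-hosted or MiniMax's platform. The base URL, model ID, and test cases are placeholders to be replaced with your own:

```python
# Minimal eval harness against an OpenAI-compatible endpoint.
# Placeholders: BASE_URL, MODEL, and CASES are illustrative.
from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"  # e.g. a local inference server
MODEL = "minimax-m2.5"                 # hypothetical model ID

client = OpenAI(base_url=BASE_URL, api_key="not-needed-for-local")

# Replace with cases drawn from your real workload, not public benchmarks.
CASES = [
    {"prompt": "Write a Python function is_prime(n).",
     "must_contain": "def is_prime"},
    {"prompt": 'Return only the JSON {"ok": true}.',
     "must_contain": '"ok"'},
]

passed = 0
for case in CASES:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": case["prompt"]}],
        temperature=0,  # keep grading as deterministic as possible
    )
    text = resp.choices[0].message.content or ""
    passed += case["must_contain"] in text

print(f"{passed}/{len(CASES)} checks passed")
```

String-containment checks are crude; for coding workloads, execute the generated code against real tests instead. The point is to grade the model on your distribution of tasks, not on the benchmarks the vendor chose.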
While competitors were chasing GPT-5 on general intelligence, MiniMax spent two months training M2.5 in 200,000+ real-world environments to be the model that actually does the work, at a price point that makes every other frontier model look like a luxury purchase.
The Bottom Line
MiniMax M2.5 isn't trying to be the smartest model. It's trying to be the most useful model at the lowest price. And for the agentic coding workflows that dominate enterprise AI spending in 2026, it's succeeding. The question isn't whether open-weight models can match proprietary ones on coding tasks. M2.5 just proved they can. The question is how Western labs respond when their pricing moat evaporates overnight.
Sources & References
1. MiniMax M2.5 Official Announcement
2. VentureBeat: MiniMax M2.5 near state-of-the-art at 1/20th the cost
3. OpenHands: Open-weight models catch up to Claude Sonnet
4. Anthropic: Claude Opus 4.6 Announcement
5. MiniMax-M1 Paper on arXiv
6. The Decoder: Intelligence too cheap to meter
7. Artificial Analysis: M2.5 Everything You Need to Know
8. CNBC: MiniMax doubles in Hong Kong debut
9. TechNode: MiniMax IPO surges 110%
10. MiniMax Wikipedia
11. Variety: Disney/Warner/NBCU sue MiniMax
12. CNBC: China AI Lunar New Year war
13. MiniMax Platform Pricing
14. MIT Technology Review: What's next for Chinese open-source AI
Last updated: February 12, 2026




