
MiniMax M2.5: The $0.15 Open-Weight Model That's Making Claude Opus Look Like a Luxury Purchase

LLM Rumors · 15 min read

Tags: MiniMax · M2.5 · Open-Weight AI · Hailuo AI · Chinese AI · AI Pricing · SWE-bench · Agentic AI

TL;DR: MiniMax released M2.5 on February 12, 2026, an open-weight coding and agentic model that scores 80.2% on SWE-bench Verified (within 0.6 points of Claude Opus 4.6) while charging just $0.15 per million input tokens, 33x cheaper than Opus[1]. The company IPO'd in Hong Kong a month ago at an $11.5B valuation, shares have since quadrupled, and 30% of all tasks at MiniMax HQ are now completed by their own model[2]. This is the first open-weight model to genuinely match Claude Sonnet-tier performance, and it rewrites the economics of AI development.

The Chinese AI labs have been releasing models at a pace that makes Western product cycles look leisurely, but MiniMax M2.5 is different. It's not incrementally better. It represents a structural break in what open-weight models can achieve, particularly for the agentic coding workflows that are driving the largest share of enterprise AI spend in 2026.

While ByteDance was grabbing headlines with Seedance 2.0 video clips and DeepSeek was teasing V4, MiniMax quietly published benchmark results that made the entire open-source community stop and recalibrate. An open-weight model matching the coding performance of the most expensive frontier models at one-twentieth the cost isn't a minor optimization. It's the kind of shift that forces enterprise procurement teams to rewrite their AI budgets.

BREAKING

Why This Matters Now

MiniMax M2.5 dropped during the Chinese Spring Festival AI blitz of February 2026, exactly one year after the DeepSeek R1 shock. But while DeepSeek proved Chinese labs could match Western reasoning capabilities, M2.5 proves they can match Western agentic coding capabilities, the single highest-value commercial AI use case, at a fraction of the price[1]. The OpenHands evaluation team ranked it the #4 model overall, the first open-weight model to ever exceed Claude Sonnet on their composite benchmark[3].

Developing story

The Numbers That Matter: M2.5 By the Benchmarks

Let's be clear about what MiniMax achieved. This isn't a model that shines only on cherry-picked evaluations. The SWE-bench Verified score of 80.2% puts M2.5 within striking distance of Claude Opus 4.6 (80.8%), a model that costs $5.00 per million input tokens versus M2.5's $0.15[1][4].

By The Numbers

80.2%
SWE-bench Verified

Within 0.6 pts of Claude Opus 4.6

51.3%
Multi-SWE-Bench

First place globally (SOTA)

76.3%
BrowseComp

Industry-leading web search

76.8%
BFCL Tool Calling

Beats Claude 4.6 and Gemini 3 Pro

$0.15/M
Input Price

33x cheaper than Opus 4.6

10B
Active Parameters

Of 230B total (MoE)

LLMRumors.com

What's often overlooked is the Multi-SWE-Bench result. At 51.3%, M2.5 holds the #1 position globally on the multi-language coding benchmark, not just among open-weight models but among all models period[1]. The tool-calling score of 76.8% on BFCL outperforms Claude Opus 4.6, Claude Sonnet 4.5, and Gemini 3 Pro. For agentic workflows that depend on reliable function calling, this isn't a marginal difference.
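
BFCL-class benchmarks score how reliably a model emits well-formed function calls against a declared schema. As an illustration of the pattern being measured (the schema below uses the widely adopted OpenAI-style JSON tool format; the `get_weather` function and its fields are hypothetical examples, not from MiniMax's documentation):

```python
import json

# OpenAI-style tool schema: the declaration format BFCL-class
# benchmarks test against. The get_weather function is hypothetical.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# A reliable tool-calling model must emit arguments that parse as JSON
# and satisfy the schema. This simulated reply stands in for model output.
model_output = '{"city": "Shanghai", "unit": "celsius"}'
args = json.loads(model_output)
assert "city" in args and set(args) <= {"city", "unit"}
print("valid call:", args)
```

A model that drops required fields, hallucinates parameter names, or emits unparseable JSON fails calls like this one, which is why a few points of BFCL difference compound quickly in multi-step agent loops.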

20x
Cost reduction vs. Claude Opus 4.6 per task

Architecture: 230 Billion Parameters, 10 Billion Active

The reason M2.5 is so cheap is also the reason it's so good. MiniMax built on a Mixture-of-Experts architecture with 230 billion total parameters but only 10 billion active per inference pass[5]. This sparse activation means you get the knowledge capacity of a massive model with the compute costs of a much smaller one.
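
A back-of-the-envelope sketch of why sparse activation matters: per-token compute in a transformer scales roughly with the parameters that actually run, so an MoE pays only for its active experts (this uses the common ~2 FLOPs per parameter per token estimate; real inference cost also depends on attention, routing overhead, and memory bandwidth):

```python
# Rough per-token compute: sparse MoE vs. an equally sized dense model.
# Assumes ~2 FLOPs per parameter per token, a standard rough estimate.
TOTAL_PARAMS = 230e9    # M2.5 total parameters (MoE)
ACTIVE_PARAMS = 10e9    # parameters active per forward pass

sparsity_ratio = TOTAL_PARAMS / ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS   # if all 230B ran densely
moe_flops_per_token = 2 * ACTIVE_PARAMS    # only active experts run

print(f"Sparsity ratio: {sparsity_ratio:.0f}:1")
print(f"Compute reduction vs. dense 230B: "
      f"{dense_flops_per_token / moe_flops_per_token:.0f}x")
```

That 23:1 ratio is where the card below's "frontier performance at commodity hardware costs" claim comes from: the full 230B parameters still have to fit in memory, but each token only pays 10B parameters' worth of compute.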

M2.5 Technical Architecture

Sparse MoE

230B total parameters with only 10B active per token, enabling frontier performance at commodity hardware costs

23:1 sparsity ratio · Efficient self-hosting · Modified MIT License

Forge RL Training

Trained with reinforcement learning across 200,000+ real-world environments including codebases, browsers, and office apps

CISPO algorithm · Two-month training period · Real-world task optimization

M2.5-Lightning

Speed-optimized variant that doubles throughput at double the price, still 16x cheaper than Opus

~100 tokens/second · $0.30 input / $2.40 output · Optimized for latency-sensitive tasks

200K Context Window

Native 200K token context, practical for most coding and agentic workflows

Full codebase analysis · Multi-file refactoring · Long document processing

The model was trained using MiniMax's proprietary CISPO algorithm (Clipped IS-weight Policy Optimization), first introduced in their M1 paper[5]. What makes M2.5's training unique is the Forge Reinforcement Learning framework: rather than training on synthetic benchmarks, MiniMax trained across 200,000+ real-world environments, including actual codebases, web browsers, and office applications[1].

Here's the genius of this approach. Traditional benchmark training optimizes for benchmark performance. Forge RL optimizes for the messy, unpredictable environments where AI agents actually need to work. That's why M2.5's BrowseComp score (76.3%) is so strong: the model was literally trained to navigate real websites, not simulated ones.

The Pricing Bloodbath: Intelligence Too Cheap to Meter

MiniMax isn't being subtle about the economic argument. They're calling it "intelligence too cheap to meter," a deliberate echo of the early nuclear energy promise[6].

Model                  | Input $/1M | Output $/1M | SWE-bench
MiniMax M2.5           | $0.15      | $1.20       | 80.2%
MiniMax M2.5-Lightning | $0.30      | $2.40       | 80.2%
DeepSeek V3.2          | $0.28      | $0.42       | ~75%
Claude Sonnet 4.5      | $3.00      | $15.00      | ~72%
Claude Opus 4.6        | $5.00      | $25.00      | 80.8%
GPT-5.2                | $1.75      | $14.00      | ~79%
GPT-5.2 Pro            | $21.00     | $168.00     | ~82%

The math is devastating for Western labs' pricing models. Running four M2.5 agents continuously for an entire year costs approximately $10,000[1]. One hour of continuous M2.5-Lightning operation costs roughly $1. For startups and enterprises building agentic AI products, this isn't a price difference. It's the difference between "we can build this" and "we can't afford to build this."
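
To make the per-task gap concrete, here is a minimal cost calculator using the prices from the table above. The token counts are illustrative assumptions for an agentic coding task, not measured workloads:

```python
# Per-task API cost from the pricing table above.
# Token counts below are illustrative assumptions, not measurements.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "MiniMax M2.5":    (0.15, 1.20),
    "Claude Opus 4.6": (5.00, 25.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task on a given model."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Hypothetical agentic coding task: 200K tokens in, 20K tokens out.
m25 = task_cost("MiniMax M2.5", 200_000, 20_000)
opus = task_cost("Claude Opus 4.6", 200_000, 20_000)
print(f"M2.5:  ${m25:.3f}/task")   # $0.054
print(f"Opus:  ${opus:.3f}/task")  # $1.500
print(f"Ratio: {opus / m25:.0f}x")
```

Because output tokens are priced higher than input on both models, the blended per-task ratio lands below the headline 33x input-price gap, which is consistent with the ~20x per-task figure cited above.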

Source: OpenHands Independent Evaluation, February 2026

The Hallucination Problem: What the Self-Reported Benchmarks Don't Tell You

Here's the contrarian take that most coverage of M2.5 is ignoring. Artificial Analysis, the independent AI evaluation firm, ran M2.5 through their AA-Omniscience benchmark and found an 88% hallucination rate, up from M2.1's already concerning 67%[7].

Their Intelligence Index places M2.5 at a score of 42, tied with GLM-4.7 and DeepSeek V3.2 for the #3-5 spots among open-weight models. That's behind Zhipu's GLM-5 (50) and Moonshot's Kimi K2.5 (47)[7].

What this means: M2.5 is genuinely excellent at structured tasks like coding, tool calling, and agentic workflows. But for open-ended knowledge tasks requiring factual accuracy, the model hallucinates significantly more than competitors. This is the classic RL-for-coding tradeoff: heavy reinforcement learning on coding tasks can degrade general knowledge reliability.

WARNING

The Hallucination Caveat

MiniMax's self-reported benchmarks emphasize coding and agentic capabilities where M2.5 genuinely excels. But independent evaluation by Artificial Analysis shows an 88% hallucination rate on their omniscience benchmark[7]. For enterprise deployments requiring factual accuracy (legal analysis, medical information, financial reporting), this gap matters enormously. M2.5 is a coding and agentic powerhouse. It is not a general-purpose knowledge oracle.

MiniMax: From SenseTime Veterans to $11.5 Billion IPO

The company behind M2.5 has one of the most remarkable trajectories in Chinese tech. Founded in December 2021 by Yan Junjie, former VP of SenseTime, MiniMax has moved from stealth to public company in just four years[8].

MiniMax Company Timeline

From SenseTime veterans to $11.5B public company in four years

Date     | Milestone         | Significance
Dec 2021 | Company Founded   | Yan Junjie (ex-SenseTime VP) launches MiniMax in Shanghai
2022-23  | MiHoYo Backing    | Early investment from the Genshin Impact developer
Mar 2024 | $600M Round       | Led by Alibaba at $2.5B valuation; Tencent and Hillhouse also invest
Sep 2024 | Hailuo AI Launch  | Video generation goes viral globally, puts MiniMax on the map
Sep 2025 | Hollywood Lawsuit | Disney, Universal, and Warner Bros. file copyright suit in U.S. federal court
Oct 2025 | M2 Model Launch   | 230B total / 10B active parameters; MoE architecture at commodity pricing
Jan 2026 | Hong Kong IPO     | Raises HK$4.8B ($620M); shares surge 110% on debut to $11.5B valuation
Feb 2026 | M2.5 Released     | 80.2% SWE-bench, $0.15/M tokens; shares jump 15.7% to HK$680

The investor list reads like a who's who of Chinese tech: Alibaba, Tencent, Hillhouse Investment, HongShan (formerly Sequoia China), and IDG Capital[8]. Revenue hit $53.4 million in the nine months ending September 2025, up 174% year-over-year, though the company posted a $512 million net loss over the same period[9].

By The Numbers

$11.5B
IPO Valuation

Hong Kong Stock Exchange, Jan 2026

174%
Revenue Growth

YoY, nine months ending Sep 2025

200M+
Global Users

Across Hailuo AI, Talkie, and platform

$512M
Net Loss

Nine months ending Sep 2025


The Consumer Empire: Hailuo AI and Talkie

What separates MiniMax from many Chinese AI startups is the consumer distribution. With 200+ million cumulative users across 200+ countries, MiniMax has built a consumer flywheel that most AI labs can only dream of[10].

MiniMax Product Portfolio

Hailuo AI

Consumer multimodal platform with video generation (Hailuo 02/2.3), text, and music creation

Text-to-video generation · Viral social media content · Global availability

Talkie

AI companion and chatbot app with character personalities and entertainment focus

AI character conversations · Celebrity personalities · 200M+ users across products

Speech-02

Text-to-speech model supporting 30+ languages with exceptionally long input processing

Multilingual voice synthesis · Long-form narration · Character voice acting

MiniMax Open Platform

Enterprise and developer API at platform.minimax.io with pay-as-you-go pricing

M2.5 and M2.5-Lightning APIs · Video and speech APIs · Modified MIT License

The video generation side of the business is what initially put MiniMax on the global radar, but it also brought legal trouble. In September 2025, Disney, Universal, and Warner Bros. (plus Marvel, Lucasfilm, DC Comics, and others) filed a copyright lawsuit in U.S. federal court alleging that Hailuo AI generates copyrighted characters on demand[11]. The lawsuit is ongoing.

The Spring Festival AI War: M2.5 in Context

M2.5 didn't launch in a vacuum. It dropped during what Chinese tech media is calling the "Spring Festival AI War," a concentrated burst of model releases that coincided with Lunar New Year 2026[12].

Chinese AI Lab Landscape (February 2026)

DeepSeek

V4 imminent. Started the price war with R1 in January 2025. Now at $0.28/$0.42 per 1M tokens. Expanded to 1M token context.

Price war originator · $0.28/$0.42 per 1M tokens · 1M token context window

Zhipu (Z.ai) / GLM-5

GLM-5 leads Artificial Analysis Intelligence Index at score 50. IPO'd alongside MiniMax. Strongest on general intelligence benchmarks.

Intelligence Index #1 (score 50) · Hong Kong IPO · General intelligence leader

Alibaba / Qwen

Qwen 3.5 in preparation. Spending CNY 3B ($434M) on user acquisition. Qwen overtook Meta's Llama in cumulative downloads.

$434M user acquisition spend · Overtook Llama in downloads · Qwen 3.5 in development

ByteDance / Seed2.0

Released Seedance 2.0 for video, full-stack AI ecosystem across LLMs, vision, and video at aggressive pricing.

Full-stack AI ecosystem · $0.47/M input tokens for Pro · Cinema-grade video generation

Moonshot / Kimi K2.5

K2.5 ranks #2 among open weights on Intelligence Index (score 47). Strong math with 96.1% on AIME 2025.

Intelligence Index #2 (score 47) · 96.1% on AIME 2025 · Math reasoning specialist

What makes MiniMax's position unique in this crowded field is the cost-performance niche. GLM-5 leads on raw general intelligence. Kimi K2.5 leads on math. But M2.5 leads on the metric that matters most to enterprise buyers: coding and agentic performance per dollar spent[3].

What This Means: The Open-Weight Tipping Point

The uncomfortable truth for Western AI labs is that M2.5 represents a tipping point for open-weight models. When an open-weight model can match 99.3% of the top proprietary model's coding performance at 3% of the cost, the value proposition of closed-source APIs becomes much harder to justify for pure coding and agentic workloads.
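
The arithmetic behind that claim, taken directly from the benchmark and pricing figures above:

```python
# Performance retained vs. cost paid: M2.5 relative to Claude Opus 4.6,
# using the SWE-bench Verified scores and input prices cited above.
m25_swe, opus_swe = 80.2, 80.8   # SWE-bench Verified (%)
m25_in, opus_in = 0.15, 5.00     # input $/1M tokens

performance_ratio = m25_swe / opus_swe
cost_ratio = m25_in / opus_in

print(f"Coding performance retained: {performance_ratio:.1%}")  # 99.3%
print(f"Input cost paid:             {cost_ratio:.0%}")         # 3%
```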

Key Takeaways for AI Teams

1. M2.5 is a coding and agentic specialist, not a general-purpose model

The 88% hallucination rate on Artificial Analysis benchmarks means M2.5 should not replace general-purpose models for knowledge-intensive tasks. Use it for what it's best at: code generation, tool calling, and agentic workflows.

Tip: Run M2.5 for coding and tool-calling tasks; keep a general-purpose model for knowledge-intensive queries.

2. The price point enables entirely new architectures

At $0.15/M input tokens, you can run multi-agent systems with dozens of M2.5 instances for less than the cost of a single Claude Opus call. This changes what's architecturally possible.

Tip: Prototype multi-agent systems on M2.5 first, then evaluate whether you need a more expensive model.

3. Self-hosting is genuinely viable

With only 10B active parameters on a 230B MoE architecture and a modified MIT license, organizations can run M2.5 on-premises at a fraction of the cost of API access to Western frontier models.

Tip: Check the modified MIT license requirements: commercial use requires 'MiniMax M2.5' attribution.

4. Watch the copyright litigation

The Disney/Universal/Warner lawsuit against MiniMax could set precedent for all AI-generated content. Enterprise users should monitor this closely before building production workflows on MiniMax products.

Tip: Consult legal counsel before deploying MiniMax models for content generation in regulated industries.

5. Validate independently before deploying

MiniMax's self-reported benchmarks diverge from independent evaluations. Run your own evaluation suite on your specific use case before committing to M2.5 in production.

Tip: Use Artificial Analysis and OpenHands independent benchmarks as your baseline, not MiniMax's self-reported numbers.
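
The specialist/generalist split in takeaway 1 can be sketched as a simple task router. The model identifiers here are placeholders for illustration, not real API names:

```python
# Minimal task-router sketch: send structured work to a cheap coding
# specialist and knowledge-heavy queries to a general-purpose model.
# Both model identifiers below are hypothetical placeholders.
CODING_MODEL = "minimax-m2.5"          # hypothetical identifier
GENERAL_MODEL = "general-purpose-llm"  # hypothetical identifier

STRUCTURED_TASKS = {"code_generation", "tool_calling", "agentic_workflow"}

def route(task_type: str) -> str:
    """Pick a model by task type, per the specialist/generalist split."""
    return CODING_MODEL if task_type in STRUCTURED_TASKS else GENERAL_MODEL

print(route("code_generation"))  # minimax-m2.5
print(route("medical_qa"))       # general-purpose-llm
```

In production the routing signal would come from a classifier or explicit task metadata rather than a hard-coded set, but the economics are the same: the cheap specialist handles the high-volume structured work, and only knowledge-intensive queries pay general-purpose prices.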

While competitors were chasing GPT-5 on general intelligence, MiniMax spent two months training M2.5 in 200,000+ real-world environments to be the model that actually does the work, at a price point that makes every other frontier model look like a luxury purchase.


The Bottom Line

MiniMax M2.5 isn't trying to be the smartest model. It's trying to be the most useful model at the lowest price. And for the agentic coding workflows that dominate enterprise AI spending in 2026, it's succeeding. The question isn't whether open-weight models can match proprietary ones on coding tasks. M2.5 just proved they can. The question is how Western labs respond when their pricing moat evaporates overnight.

Sources & References

Key sources and references used in this article

[1] MiniMax M2.5 Official Announcement
[2] VentureBeat: MiniMax M2.5 near state-of-the-art at 1/20th the cost
[3] OpenHands: Open-weight models catch up to Claude Sonnet
[4] Anthropic: Claude Opus 4.6 Announcement
[5] MiniMax-M1 Paper on arXiv
[6] The Decoder: Intelligence too cheap to meter
[7] Artificial Analysis: M2.5 Everything You Need to Know
[8] CNBC: MiniMax doubles in Hong Kong debut
[9] TechNode: MiniMax IPO surges 110%
[10] MiniMax Wikipedia
[11] Variety: Disney/Warner/NBCU sue MiniMax
[12] CNBC: China AI Lunar New Year war
[13] MiniMax Platform Pricing
[14] MIT Technology Review: What's next for Chinese open-source AI

Last updated: February 12, 2026