TL;DR: On February 16, 2026, Alibaba released Qwen3.5-397B-A17B: a 397B-parameter MoE model that activates only 17B per forward pass, scores 88.4 on GPQA Diamond, and runs at $0.60 per million input tokens, roughly 8x cheaper than Claude Opus 4.6.[1] Seven days later, Anthropic published a landmark distillation attack report naming three Chinese AI labs for industrial-scale theft.[2] Alibaba was not among them. That absence is not a coincidence. It is the entire story.
On February 23, 2026, Anthropic named three Chinese AI labs (DeepSeek, Moonshot AI, and MiniMax) for running coordinated campaigns to extract Claude's capabilities through 16 million fraudulent API exchanges.[2] The industry parsed the names that were included. Almost nobody asked about the names that were left out.
Alibaba was not named. ByteDance was not named. Baidu was not named. Tencent was not named.
These are not small actors. Together they control more users, more compute, more revenue, and more AI deployment than the three companies Anthropic did name. And they are conspicuously, structurally, predictably absent from a report about labs that needed to steal training data because they could not generate it themselves.
Qwen3.5 is the proof of concept. Frontier-tier benchmarks. Open-source Apache 2.0 licensing. Eight times cheaper than Claude at the API level. Built by a company that processes more commercial transactions annually than Amazon, eBay, and Etsy combined. They did not need to distill from anyone. They could not afford the reputational risk even if they had wanted to. And most importantly: they had better options.
The Absent Name
Anthropic's distillation report named every major pure-play Chinese AI startup. It named none of the Chinese Big Tech AI divisions. Alibaba, ByteDance, Baidu, Tencent, and Xiaomi, all running frontier AI programs, are absent. The companies that were caught are all data-poor relative to their frontier ambitions. The companies that were not caught are data-rich by construction. This is not a coincidence. It is the operating logic of the distillation problem.
What Qwen3.5 Actually Is
Before the strategic analysis, the technical reality deserves attention, because Qwen3.5 is genuinely impressive in ways that get buried under the geopolitical noise.
Released on February 16, 2026, Qwen3.5-397B-A17B is the first open-weight model in the new Qwen3.5 series.[1] It is a native vision-language model built on a hybrid architecture that fuses linear attention via Gated Delta Networks with a sparse mixture-of-experts design. The architecture matters because it achieves something that was considered difficult eighteen months ago: 397 billion total parameters with only 17 billion activated per forward pass. That is a 95% reduction in active compute relative to total capacity without proportional capability loss.
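The arithmetic behind that efficiency claim is worth making concrete. A back-of-envelope sketch using the published parameter counts (the helper name is illustrative, not part of any official API):

```python
# Back-of-envelope check on Qwen3.5's sparse-MoE compute savings,
# using the 397B-total / 17B-active figures quoted above.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of total parameters activated per forward pass."""
    return active_params_b / total_params_b

frac = active_fraction(397.0, 17.0)
print(f"Active per token: {frac:.1%} of total capacity")
print(f"Reduction in active compute: {(1 - frac) * 100:.1f}%")
# Active per token: 4.3% of total capacity
# Reduction in active compute: 95.7%
```

The point of the sparse-MoE design is that per-token inference cost scales with the 17B active parameters, not the 397B total, which is what makes the $0.60/M pricing possible.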
This is not a capability demo. Qwen3.5 is already deployed across Alibaba's product suite. The model supports 201 languages and dialects, up from 119 in the previous generation, reflecting Alibaba's global commercial footprint. The hosted Qwen3.5-Plus version includes a default 1 million token context window and built-in tool use with adaptive agent capabilities.[3]
The architecture is genuinely novel. Most frontier models bolt on vision as a second stage. Qwen3.5 processes text, images up to 1344×1344 resolution, and 60-second video clips from the first pretraining stage. The multimodal capability is architectural, not cosmetic.
Qwen3.5 by the Numbers
| Stat | Detail |
|---|---|
| 397B total / 17B active | Sparse MoE; only 17B parameters activate per forward pass (95% reduction in active compute) |
| 88.4 GPQA Diamond | Top open-source score on the GPQA leaderboard as of February 2026. Graduate-level knowledge benchmark; ahead of GPT-5.2 Pro on this metric |
| 80%+ coding | Real-world software engineering tasks; approaches Claude Opus 4.6 |
| $0.60/M input | Per million input tokens. Claude Opus 4.6 is $5.00/M, GPT-5.2 is $1.75/M |
| 201 languages | Up from 119 in Qwen3, reflecting Alibaba's global commercial footprint |
The Benchmarks: Where Qwen3.5 Actually Sits
Self-reported benchmarks from Chinese AI labs require caveats. Alibaba claims Qwen3.5 outperforms GPT-5.2, Claude Opus 4.6, and Gemini 3 Pro on roughly 80% of evaluated benchmark categories.[3] CNBC noted that it could not independently verify those claims.[3] That is the standard honest caveat.
Here is what independent leaderboards show, which is less dramatic but still telling.
Frontier vs Efficiency: Qwen3.5 Across the Stack
| Model | Vendor | GPQA Diamond | API input price |
|---|---|---|---|
| Qwen3.5-397B-A17B | Alibaba / Qwen | 88.4 | $0.60/M |
| Qwen3.5-35B-A3B | Alibaba / Qwen | 84.2 | $0.12/M |
| Claude Sonnet 4.5 | Anthropic | 83.4 | $3.00/M |
| Claude Opus 4.6 | Anthropic | 91.3 | $5.00/M |

GPQA Diamond from the llm-stats.com leaderboard. Coding and reasoning figures from Alibaba and Anthropic published evaluations. Sonnet 4.5 data from the Anthropic model card. Scores may vary by methodology and version.
On the GPQA leaderboard (the most reliably independently verified benchmark), Qwen3.5-397B-A17B at 88.4% sits at rank 9 globally, tied with Grok-4 Heavy, below Claude Opus 4.6 (91.3%) and GPT-5.2 (92.4%) but ahead of GPT-5.1, ChatGPT-4o, DeepSeek V3.2, and every other open-weight model.[4] That is solidly frontier-tier, not frontier-adjacent.
The 35B-A3B comparison is the more interesting efficiency story. At only 3B active parameters, Qwen3.5-35B-A3B scores 84.2 on GPQA Diamond versus Claude Sonnet 4.5's 83.4. It scores 85.3 on MMLU-Pro versus Sonnet 4.5's 85.0. Those are effectively equal on general reasoning at roughly 25x lower active compute, and at a fraction of the API cost ($0.12/M input vs $3.00/M for Sonnet 4.5).[14] The 35B trails on coding: Sonnet 4.5 scores 77.2 on SWE-bench Verified versus the 35B's 69.2. But for reasoning-heavy workloads, the efficiency gap is striking.
The pricing context makes the flagship benchmark positioning sharper too. At $0.60/M input, Qwen3.5-397B-A17B is 8.3x cheaper than Claude Opus 4.6 for roughly 3 percentage points less GPQA performance. For most enterprise deployments, that trade-off is not a trade-off at all. It is an obvious operational decision. The open-weight version under Apache 2.0 makes the self-hosted case even more compelling: no API billing at all.
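The procurement math above is simple enough to sketch. A hedged example (the rates are the per-million-input-token prices quoted in this article; real bills also depend on output-token rates, caching, and volume discounts):

```python
# Rough monthly input-token cost comparison at the published API rates.
# Prices are USD per 1M input tokens, as cited in this article.

PRICES_PER_M_INPUT = {
    "Qwen3.5-397B-A17B": 0.60,
    "GPT-5.2": 1.75,
    "Claude Opus 4.6": 5.00,
}

def monthly_input_cost(model: str, tokens_per_month: int) -> float:
    """Input-token spend for a month, ignoring output tokens and discounts."""
    return PRICES_PER_M_INPUT[model] * tokens_per_month / 1_000_000

tokens = 10_000_000_000  # a mid-size agentic workload: 10B input tokens/month
for model in PRICES_PER_M_INPUT:
    print(f"{model}: ${monthly_input_cost(model, tokens):,.0f}/month")

ratio = PRICES_PER_M_INPUT["Claude Opus 4.6"] / PRICES_PER_M_INPUT["Qwen3.5-397B-A17B"]
print(f"Opus-to-Qwen input price ratio: {ratio:.1f}x")
```

At that volume the input-side gap alone is $6,000 versus $50,000 a month, which is why the 8.3x figure reads as a procurement argument rather than a benchmark footnote.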
The Missing Names in Anthropic's Report
Let's be clear about what Anthropic's February 23 distillation report actually tells us about the structure of the Chinese AI ecosystem.
The three named labs, DeepSeek, Moonshot AI (Kimi), and MiniMax, share a defining characteristic: they are pure-play AI research companies.[2] They were founded with the explicit goal of building frontier AI models. They do not operate e-commerce platforms, social media networks, enterprise software ecosystems, or payment systems. They generate revenue through AI products and API access. Their training data comes from the open web, public datasets, licensed sources, and, as Anthropic documented, from fraudulently extracted competitor outputs.
The companies not named, including Alibaba, ByteDance, Baidu, and Tencent, share a different defining characteristic. They are some of the largest tech companies on Earth, and they generate proprietary training data as a byproduct of their core businesses at scales that are simply unavailable to a pure-play AI startup.
This is not speculation. It is structural.
Why Some Labs Needed to Distill and Others Didn't
| Lab | Named in Distillation Report | Proprietary Data Ecosystem |
|---|---|---|
| DeepSeek | Yes: 150K extractions, chain-of-thought targeting | Pure-play AI research lab. Web data only. No proprietary product data. |
| Moonshot AI (Kimi) | Yes: 3.4M exchanges, senior staff directly involved | Pure-play AI startup. Consumer chatbot. No diversified data moat. |
| MiniMax | Yes: 13M+ exchanges, largest single campaign | Pure-play AI startup. Consumer products. No enterprise data ecosystem. |
| Alibaba / Qwen | Not named | Taobao/Tmall (world's largest e-commerce), Alibaba Cloud, Ant Financial, DingTalk enterprise comms, Youku video. Billions of proprietary transactions. |
| ByteDance / Seed2.0 | Not named | TikTok/Douyin (1B+ users, world's largest behavioral ranking dataset), Toutiao news, Lark enterprise. Unmatched behavioral signal. |
| Baidu / ERNIE | Not named | China's dominant search engine. 25+ years of Chinese-language web index. Maps, cloud, Apollo autonomous driving data. |
The pattern is striking enough to name directly. Every lab that was caught is a pure-play AI company. Every major Chinese Big Tech AI division, labs that collectively deploy more compute than the three named companies combined, is absent. This is not coincidence and it is not necessarily evidence of virtue. It is evidence of data sufficiency.
The Alibaba Data Moat: What Qwen Actually Trained On
Alibaba's data position is one of the most underappreciated structural advantages in global AI development. This is not about having "more data." It is about having the right kinds of data that frontier models demonstrably need.
Taobao and Tmall process over $1.7 trillion in gross merchandise volume annually. Every product listing, every customer review, every seller description, every search query, every purchasing decision generates structured, high-quality natural language data in commercial contexts. This is fundamentally different from scraped web text. It is purposeful, transactional, real-world language with ground-truth behavioral outcomes attached.
Alibaba Cloud serves millions of enterprise customers across China and Southeast Asia. The workloads flowing through those systems (document processing, code generation, data analysis, customer service automation) are exactly the agentic use cases that Moonshot and MiniMax felt they had to distill from Claude to acquire.[2] Alibaba generates the same data organically through production deployments.
Ant Financial's payment and credit infrastructure handles more transactions than Visa and Mastercard combined in many quarters. The financial reasoning, risk assessment, and document understanding capabilities that represent frontier AI's most valuable enterprise applications are generated organically by Alibaba every day.
DingTalk, Alibaba's enterprise communication platform, has over 500 million registered users. Workplace documents, meeting transcripts, project management workflows, technical specifications: structured professional language at industrial scale, owned by Alibaba.
The labs Anthropic caught were trying to acquire in weeks what Alibaba built organically over two decades. The distillation attack is a symptom of data poverty. Qwen3.5 is evidence of what data wealth produces.
None of this data required stealing from a competitor. None of it required fraudulent API accounts or proxy networks. It required building a diversified tech company over twenty years and being smart enough to route that data asset into an AI training pipeline.
The ByteDance Parallel: Seedance and the Same Story
This is not an Alibaba-specific phenomenon. We covered ByteDance's Seed2.0 release in detail: a full-stack AI ecosystem spanning frontier LLMs, vision, coding, and the Seedance video model that went viral.[5]
Seed2.0 was also absent from Anthropic's distillation report. The reason is structurally identical to the Alibaba case.
ByteDance operates TikTok and Douyin, collectively the most sophisticated content recommendation systems ever built and the platform that first demonstrated that large-scale behavioral AI could outperform human editorial judgment. The training signal from a billion users selecting and rejecting content in real time is categorically different from anything available to a pure-play AI startup. It is rich, diverse, multilingual, multimodal, and continuously updated.
Toutiao, ByteDance's news aggregation platform, has been training content understanding models for a decade. Lark, their enterprise product, runs on the same infrastructure. ByteDance's game division, its e-commerce expansion, its music platform: all of it generates proprietary behavioral data that flows into Seed2.0 training.
ByteDance didn't need to distill from Claude. They had better data than Claude's trainers had when they built Claude.
The pattern is now two data points, which is enough to name it: Chinese Big Tech has built data moats that make distillation structurally unnecessary. Chinese AI pure-play startups have not. Anthropic's report named exactly the companies without data moats.
What The Industry Actually Looks Like From This Angle
The distillation report framed the problem as Chinese labs stealing from American labs. That framing is accurate at the level of the specific ToS violations. MiniMax, Moonshot, and DeepSeek ran fraudulent operations that violated Anthropic's terms and circumvented regional access controls.[2]
But the framing obscures a more uncomfortable structural reality: the frontier AI race has two different competitive dynamics running simultaneously.
The first is the well-understood competition between American frontier labs. OpenAI, Anthropic, and Google are competing on research talent, compute, and proprietary training data. Their advantage is years of accumulated RLHF, Constitutional AI, and preference data from hundreds of millions of human users.
The second is less discussed: Chinese Big Tech is winning the proprietary data competition on a different axis entirely. Alibaba and ByteDance have access to commercial and behavioral data at scales and qualities that American AI labs do not. Neither Google nor Anthropic has anything like Taobao's transactional database. Neither has TikTok's behavioral ranking signal. The data moat runs in both directions.
Why Qwen Trains Natively in 201 Languages
Qwen3.5 supports 201 languages and dialects.[6] This is not a technical achievement in isolation. It reflects the geographic distribution of Alibaba's actual commercial operations. Lazada (Southeast Asia), Daraz (South Asia), AliExpress (global), Alibaba Cloud (Asia-Pacific). Alibaba processes real commercial transactions in dozens of languages daily. The multilingual capability is a byproduct of having genuine multilingual users, not a training objective pursued artificially.
Qwen3.5's native multimodal capabilities follow the same logic. Alibaba processes product images, merchant video, user-generated visual content, and document scans at industrial scale. Native vision-language training from pretraining rather than multimodal fine-tuning is what happens when your pretraining data is inherently multimodal because your business is.
Distillation as a Structural Response to Data Poverty
Here's the angle nobody in the distillation coverage has fully articulated: the labs Anthropic caught were not primarily trying to save money on training. They were trying to solve a structural problem that money alone cannot solve.
Training data diversity, quality, and domain coverage are the binding constraints on frontier model capability for any lab that is not Google, Meta, Alibaba, or ByteDance. Pure-play AI startups, including, to a significant degree, Anthropic itself, cannot generate the breadth of real-world behavioral data that comes from operating diversified technology products at scale.
Anthropic's own data situation is instructive. Their training pipeline relies heavily on human feedback from contractors, licensed datasets, and web-crawled text. Reddit sued Anthropic in 2025 for scraping over 100,000 posts and comments without permission to fine-tune Claude.[7] The Stanford Alpaca project trained on 52,000 outputs from OpenAI's text-davinci-003 for under $500 in 2023 and was celebrated across the research community, including by researchers who later joined Anthropic and built on that work.[8] The knowledge transfer that Anthropic is now calling an "attack" is the same process that built much of the open-source AI ecosystem those researchers came from.
This is not to excuse the fraudulent accounts or the ToS violations. Those are real legal and ethical problems. It is to say that the technique of acquiring knowledge about a model's outputs to improve your own model is industry standard. The question of where the knowledge ultimately comes from is more complicated when the companies that have it either acquired it through their own advantages (Alibaba's data moat) or built on a research commons that everybody contributed to and everybody benefited from.
The Data Moat Thesis: A Timeline
Key milestones in development
| Date | Milestone | Significance |
|---|---|---|
| 2003–2010 | Alibaba builds China's e-commerce infrastructure | Taobao launches (2003), Alipay spins out (2004), Tmall launches (2008). Billions of commercial transactions in Chinese begin accumulating. |
| 2012 | ByteDance founded | Toutiao's algorithm-first content model demonstrates behavioral data's value. TikTok/Douyin follows, building the world's largest behavioral ranking dataset. |
| 2023 | Stanford Alpaca: distillation celebrated | 52K text-davinci-003 outputs, under $500, permissively released. The research community celebrates distillation as AI democratization. Future frontier lab researchers build on this work. |
| Apr 2025 | Qwen3 launches | Alibaba releases Qwen3 family trained on 36 trillion tokens across 119 languages. Trained on a decade of commercial data. No distillation campaign needed. |
| Feb 16, 2026 | Qwen3.5 releases | 88.4 GPQA Diamond. $0.60/M input. Apache 2.0 open-weight. Built on Alibaba's proprietary data moat. One week before the distillation report. |
| Feb 23, 2026 | Anthropic's distillation report | Three pure-play Chinese AI startups named. Zero Chinese Big Tech companies named. The data moat thesis is validated by absence. |
The Open-Source Play: What Alibaba Is Actually Doing
Qwen3.5 is not a flex. It is a strategy.
Alibaba has released over 100 open-weight models under Apache 2.0 and similar licenses, which have been downloaded more than 40 million times globally.[9] This is not altruism. It is the same playbook ByteDance used with Seed2.0's open-weight releases: flood the ecosystem with capable, permissively licensed models that developers adopt, build on, and extend. Make Qwen-based deployments the default for cost-conscious enterprise development. Build a community that generates feedback, fine-tunes, and extends your models. All of this flows back into your training pipeline.
The open-source strategy also creates a powerful competitive moat in the Asian enterprise market. Local deployments of Qwen3.5 inside Chinese corporations are not subject to Anthropic's regional access restrictions. They do not generate API data that flows to American companies. They run on Alibaba Cloud infrastructure. The open-weight release is a Trojan horse for cloud infrastructure adoption.
Who Qwen3.5 Actually Disrupts
Anthropic's API Business
An 8.3x price gap on comparable general capability is not a rounding error. Enterprise customers doing cost sensitivity analysis will run the numbers and some will switch. The gap is particularly acute for high-volume agentic workloads.
Pure-Play Chinese AI Startups
MiniMax, Moonshot, and DeepSeek are now under reputational and regulatory pressure from the distillation report while competing against Alibaba, a company that can undercut them on price, outcompete them on data, and absorb losses indefinitely.
American AI Policy Debate
The distillation narrative frames China's AI progress as dependent on stolen American IP. Qwen3.5 complicates that narrative. Alibaba's progress is not theft. It is decades of proprietary data accumulation that American labs simply do not have.
Open-Source AI Community
Qwen3.5 is the strongest argument yet that open-weight frontier AI is viable and strategic. Apache 2.0 licensing at 88.4 GPQA Diamond resets expectations for what 'open-source AI' means in practice.
The Uncomfortable Symmetry
Let's state the uncomfortable truth plainly, because it doesn't fit neatly into either the American or Chinese narrative.
Anthropic named the distillation attackers correctly. MiniMax, Moonshot, and DeepSeek ran fraudulent operations. The evidence is credible, the scale was industrial, and the censorship use case DeepSeek ran against Claude crosses a qualitative line that pure capability extraction does not.[2]
And Alibaba built a frontier AI model without doing any of that. Not because they are more ethical. Because they had better tools. The data moat is the ethical shortcut that was never framed as a shortcut because it was built through legitimate business operations.
The AI labs most vocally opposed to distillation are, without exception, labs that lack proprietary data moats and therefore depend on restricted model outputs and licensed data as their competitive training resource. Anthropic's position is strategically coherent: they are protecting the thing they need that they cannot generate at Alibaba's scale. That does not make them wrong about the specific violations. It does mean the policy position should be understood as strategic, not purely principled.
Qwen3.5 is a benchmark-setting frontier model built by a company that processes more commercial data in a day than Anthropic has in its entire history. That is the context Anthropic's distillation report did not provide. This article is providing it.
What the Next Eighteen Months Will Show
Qwen3.5 is the first model in the Qwen3.5 series, released as a single open-weight flagship. The next models in the series (likely a Qwen3.5-Coder, Qwen3.5-Math, and Qwen3.5-VL) will extend the capability coverage. Each one will be built on the same proprietary data moat. Each one will be open-weight. And none of them will be distilling from anyone. Alibaba's competitive advantage does not require it.
What to Actually Take Away From Qwen3.5
Qwen3.5-397B-A17B is a legitimate frontier model. 88.4 GPQA Diamond puts it at rank 9 globally, tied with Grok-4 Heavy, ahead of every other open-source model on the leaderboard.
The 8.3x price gap vs Claude Opus 4.6 for comparable general capability is an enterprise procurement argument, not a benchmark footnote.
Alibaba was absent from Anthropic's distillation report because it has a data moat that makes distillation structurally unnecessary, not because it was undetected.
The same pattern holds for ByteDance (Seed2.0), Baidu (ERNIE), and Tencent. Every Chinese Big Tech AI division with a proprietary data moat was absent from the report.
The companies Anthropic caught are all pure-play AI startups without diversified data advantages, structurally motivated to distill because they have no better option.
Distillation is a symptom of data poverty. Qwen3.5 is evidence of what two decades of commercial data accumulation produces when you finally point it at a transformer.
Last updated: February 26, 2026
Sources & References
Key sources and references used in this article
| # | Source | Date |
|---|---|---|
| 1 | Qwen3.5: Towards Native Multimodal Agents | Feb 16, 2026 |
| 2 | Detecting and preventing distillation attacks | Feb 23, 2026 |
| 3 | Alibaba unveils Qwen3.5 as China's chatbot race shifts to AI agents | Feb 17, 2026 |
| 4 | GPQA Diamond Leaderboard | Feb 2026 |
| 5 | ByteDance Seed2.0: The Full-Stack AI Empire Behind Seedance | Feb 14, 2026 |
| 6 | Qwen3.5: Features, Access, and Benchmarks | Feb 16, 2026 |
| 7 | Reddit Files Lawsuit Against Anthropic Over Alleged Unauthorized Data Scraping | Jun 2025 |
| 8 | Alpaca: A Strong, Replicable Instruction-Following Model | Mar 13, 2023 |
| 9 | Qwen | Feb 2026 |
| 10 | Anthropic Catches Three Chinese AI Labs Stealing Claude | Feb 25, 2026 |
| 11 | MiniMax M2.5: Frontier AI at 20x Less Than Claude Opus | Feb 12, 2026 |
| 12 | Qwen3.5: 397B MoE Benchmarks, Pricing and Complete Guide | Feb 16, 2026 |
| 13 | Open Source LLM Leaderboard 2026: Rankings, Benchmarks | Feb 24, 2026 |
| 14 | Claude Sonnet 4.5 Model Card | Sep 29, 2025 |




