Anthropic Catches Three Chinese AI Labs Stealing Claude

LLM Rumors · 15 min read

TL;DR: On February 23, 2026, Anthropic publicly named DeepSeek, Moonshot AI, and MiniMax for running coordinated industrial-scale campaigns to extract Claude's capabilities through fraudulent API accounts — over 16 million exchanges across 24,000 fake accounts.[1] The evidence is real, the ToS violations are clear, and the censorship angle is genuinely alarming. But Anthropic's framing of distillation as an "attack", when the entire industry, including Anthropic itself, was built on the same technique, is a strategic positioning move dressed up as a moral argument. Both things can be true at once.


The DeepSeek R1 moment hit like a thunderclap. January 2025: a Chinese lab releases a model that matches GPT-4 at a fraction of the training cost. The narrative wrote itself. Scrappy Chinese engineers had out-innovated Silicon Valley. Export controls were useless. American AI supremacy was already over.

That story was always too convenient. On February 23, 2026, Anthropic published evidence that punctures it.[1] What looked like independent innovation was, at least in part, systematic capability extraction from the very models it supposedly surpassed.

But here's what nobody wants to say out loud: distillation is how the AI industry built itself. Stanford used ChatGPT outputs to train Alpaca. The open-source AI movement runs on it. DeepSeek openly releases its own distilled models with MIT licenses and encourages others to distill them further. The Chinese labs broke Anthropic's Terms of Service, used fraudulent accounts, and circumvented regional access restrictions. That part is clearly wrong. But Anthropic calling the technique itself an "attack" is a company protecting its moat, not protecting the field.

The real story isn't that distillation happened. It's what specifically was extracted, how it was done, and why one use case, generating censorship infrastructure for an authoritarian government, crosses a line that scale and ToS violations alone don't capture.

BREAKING

What Anthropic Actually Found

Anthropic directly named DeepSeek, Moonshot AI (Kimi), and MiniMax for coordinated distillation campaigns. MiniMax drove the most traffic with over 13 million exchanges. Moonshot accounted for 3.4 million. DeepSeek ran 150,000 targeted extractions focused on chain-of-thought reasoning data. Combined: over 16 million exchanges across 24,000 fraudulent accounts, all in violation of Anthropic's Terms of Service and regional access restrictions that bar Claude's use in China.[2]

Developing story

The Mechanics: What Distillation Actually Is and Why Everyone Does It

Before calling anything an attack, understand what distillation actually is, because every major AI lab in existence has done it or benefits from research built on it.

The concept is simple. Train a smaller model on the outputs of a larger one. The student learns to mimic the teacher's behavior. Knowledge transfers through examples. Anthropic does this constantly with their own models. It's how Haiku exists. It's how you build specialized versions without spending $100 million on compute.[1]
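The classic student–teacher objective behind all of this, from Hinton and colleagues' knowledge distillation work, trains the student to match the teacher's temperature-softened output distribution rather than hard labels. A minimal sketch in plain Python; the logits and temperature here are illustrative toy values, not any lab's actual training setup:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax. Higher temperatures expose the
    teacher's 'dark knowledge': the relative odds it assigns to the
    classes it did NOT pick."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft targets and the
    student's predictions: the core of Hinton-style distillation."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student whose logits track the teacher's full distribution incurs
# a lower loss than one that merely gets the argmax right.
teacher = [4.0, 1.5, 0.2]
close_student = [3.8, 1.6, 0.3]
argmax_only_student = [9.0, -3.0, -3.0]
assert distillation_loss(close_student, teacher) < distillation_loss(argmax_only_student, teacher)
```

In practice the soft-target loss is scaled by T² and mixed with a standard hard-label term; the sketch keeps only the soft-target core.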

But the same technique has been the foundation of the open-source AI movement for years. In March 2023, Stanford researchers published Alpaca: a 7B model trained on 52,000 instructions generated from OpenAI's text-davinci-003.[3] Cost: less than $500 in API calls. It was celebrated as a landmark achievement in democratizing AI. Nobody called it an attack. The AI community threw a party for it.
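Mechanically, an Alpaca-style run is little more than a loop that harvests teacher completions into an instruction-tuning file. A sketch of that pipeline, where `query_teacher` is a hypothetical stub standing in for the real API call (Alpaca used OpenAI's text-davinci-003) and the file name is illustrative:

```python
import json

def query_teacher(instruction):
    """Hypothetical stand-in for the teacher-model API call.
    In Alpaca's case this was a request to text-davinci-003."""
    return f"Teacher completion for: {instruction}"

def build_distillation_set(instructions, path="alpaca_style.jsonl"):
    """Collect (instruction, output) pairs into a JSONL file: the
    entire dataset an Alpaca-style run fine-tunes the student on."""
    records = []
    for instruction in instructions:
        records.append({"instruction": instruction,
                        "output": query_teacher(instruction)})
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records

dataset = build_distillation_set(["Explain photosynthesis in one sentence.",
                                  "Write a haiku about rain."])
```

The fine-tuning step then treats these pairs as ordinary supervised data; Alpaca's sub-$500 budget was essentially the API calls inside this loop.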

You trained on the open internet and then call it 'distillation attacks' when others learn from you. Labs that like to preach 'open research' suddenly crying about open access.

Tory Green, co-founder of IO.Net, responding to Anthropic's announcement on X

Vicuna, WizardLM, Orca, dozens of other open models that shaped the field — all built on distillation from GPT outputs. The technique is not just normalized; it is the mechanism through which AI capability diffused beyond the walls of the big labs and into research communities, startups, and universities worldwide.[4]

And then there's DeepSeek itself. When DeepSeek released R1, they didn't just make the weights available. They released six distilled versions with names like "DeepSeek-R1-Distill-Qwen-7B" and "DeepSeek-R1-Distill-Llama-70B." Their MIT license explicitly states the models allow "any modifications and derivative works, including, but not limited to, distillation for training other LLMs."[5] DeepSeek put "distill" in the model names. They built their entire release strategy around enabling exactly what Anthropic is now calling an attack when turned back on them.

$500
Cost to train Stanford Alpaca via GPT distillation

The uncomfortable truth about the AI industry is that its entire trajectory runs through distillation. The frontier labs used the internet's collective human output, scraped without consent, to train their base models. Reddit sued Anthropic in June 2025 for scraping over 100,000 posts and comments to fine-tune Claude.[6] Open-source labs distilled the frontier labs. The frontier labs distilled each other through benchmark contamination, shared research, and employees who carry knowledge between companies. The industry that built itself by absorbing everyone else's work does not have clean hands when it comes to knowledge extraction.

None of this makes what DeepSeek, Moonshot, and MiniMax did acceptable. The ToS violations were deliberate. The fraudulent accounts were industrial in scale. And the censorship use case is genuinely alarming in a way that pure competitive copying is not. But the framing matters. "Distillation attack" implies the technique is the crime. The crime is what was done with the technique, at what scale, through what means, and for what purpose.

The Three Campaigns: What Was Actually Taken

MiniMax: 13 Million Exchanges, Caught in the Act

MiniMax ran the largest operation by a substantial margin: over 13 million exchanges, targeting agentic coding, tool use, and orchestration.[2] The scale alone separates this from research-scale distillation. Nobody distills 13 million times to learn how a technique works.

The detail that makes MiniMax's campaign particularly notable: Anthropic caught it while it was still active, before MiniMax had released the model it was training.[7] This gave Anthropic an unusual window into the full lifecycle of an industrial distillation campaign. When Anthropic released a new Claude model during the active campaign, MiniMax pivoted within 24 hours, redirecting nearly half its traffic to capture capabilities from the latest system. The responsiveness implies an active engineering team monitoring the campaign in real time, not an automated script running unattended.

Moonshot AI: Senior Staff Caught on Metadata

Moonshot's campaign touched 3.4 million exchanges across agentic reasoning, tool use, coding, data analysis, computer-use agents, and computer vision.[1] The breadth suggests systematic capability mapping rather than targeted capability extraction. They weren't going deep on one thing. They were surveying Claude's full practical capability surface.

The attribution was particularly direct for Moonshot. Anthropic identified the campaign through request metadata that matched the public profiles of senior Moonshot staff.[1] This was not anonymous activity at the margins of the organization; the metadata pointed to senior technical leadership. In a later phase, Moonshot attempted to extract and reconstruct Claude's reasoning traces specifically, moving from output collection toward architectural reverse engineering.

DeepSeek: Chain-of-Thought and Censorship Infrastructure

DeepSeek's 150,000 exchanges were the smallest in volume and the most targeted in design. Two specific techniques set DeepSeek's campaign apart from the others.

The first: explicit chain-of-thought extraction. Prompts asked Claude to articulate the internal reasoning behind completed responses, step by step, at scale.[1] This is precisely the training data that powers reasoning models. You can't train a model to reason by showing it answers. You train it by showing it thinking. DeepSeek extracted the thinking.

Their prompts asked Claude to imagine and articulate the internal reasoning behind a completed response and write it out step by step — effectively generating chain-of-thought training data at scale.

Anthropic's distillation attack report, February 23, 2026

The second: using Claude to generate censorship-safe alternatives to politically sensitive queries. Questions about dissidents, party leaders, authoritarianism. Claude would answer; DeepSeek would collect the response; their models would be trained to steer away from those topics.[1] This is not competitive copying. This is using an American safety-focused AI model as infrastructure for building a Chinese censorship system. The scale was small relative to MiniMax. The intent was qualitatively different from anything that fits under "distillation as normal industry practice."

By The Numbers

13M+ MiniMax exchanges: the largest campaign, targeting agentic coding, tool use, and orchestration

3.4M+ Moonshot exchanges: agentic reasoning, coding, and computer vision; senior staff traced via metadata

150K+ DeepSeek exchanges: chain-of-thought extraction and censorship data generation

16M+ total exchanges across all three campaigns combined

~24,000 fraudulent accounts across the API and third-party cloud platforms

20,000+ fraudulent accounts managed simultaneously by a single proxy network

The Infrastructure: Professional-Grade Evasion

The method of access tells you this is not opportunistic. These labs used commercial proxy services running what Anthropic calls "hydra cluster" architectures: sprawling networks of fraudulent accounts distributed across Anthropic's API and third-party cloud platforms, mixing distillation traffic with unrelated customer requests to camouflage the operation.[1]

When one account is banned, a replacement appears automatically. In one documented case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously. The infrastructure implies dedicated engineering resources, institutional budget, and deliberate organizational commitment. This is not a few researchers running experiments. This is a program.

NOTE

Why Agentic Capabilities Were the Target

All three campaigns specifically targeted Claude's most differentiated capabilities: agentic reasoning, tool use, and coding. These aren't benchmark scores. They're the capabilities that make AI useful for real enterprise work. Training a model to answer questions is tractable. Training it to orchestrate multi-step workflows, use external tools, and maintain coherence across long autonomous tasks is genuinely hard. That's what was being extracted.[1]

The Distillation Double Standard the Industry Won't Acknowledge

Here's where the honest accounting gets uncomfortable, and where Anthropic's framing starts to serve their interests more than the truth.

Anthropic's announcement post on X received 53,844 likes and 31 million views in 48 hours.[13] The majority of the engagement was not celebration. The CNBC article covering the announcement noted that analysts said "nuance is needed to distinguish between the different narratives, as the boundary between illicit and legitimate practice is often blurry."[8] Tory Green, co-founder of AI infrastructure firm IO.Net, put it more directly: "You trained on the open internet and then call it 'distillation attacks' when others learn from you. Labs that like to preach 'open research' suddenly crying about open access."[12] Another prominent AI researcher wrote: "Ohhh nooo not my private IP, how dare someone use that to train an AI model, only Anthropic has the right to use everyone else's IP."[12]

The argument that Anthropic and other US labs make is that their case is different because it violates Terms of Service, circumvents regional restrictions, occurs at industrial scale, and strips safety guardrails. These are real distinctions. But the first three describe the execution, not the technique itself. The Terms of Service point is legally relevant. The "strips safeguards" argument is the only genuinely principled objection that doesn't apply to distillation generally.

Distillation: When It Was Celebrated vs. When It's an Attack

Celebrated:

Stanford Alpaca (2023): 52K GPT outputs, openly published, celebrated as AI democratization. $500 API spend. No ToS analysis done.

Vicuna, WizardLM, Orca: built on GPT/Claude outputs, formed the backbone of open-source AI. Industry praised them.

DeepSeek distilled models: R1-Distill-Qwen, R1-Distill-Llama. MIT licensed, explicitly encourages further distillation; 'distill' is in the model name.

Called an attack:

MiniMax (13M exchanges): ToS violation, fraudulent accounts, circumvented regional restrictions, institutional scale. Caught mid-operation.

Moonshot (3.4M exchanges): hundreds of fraudulent accounts, senior staff directly involved, reasoning trace reconstruction attempt.

DeepSeek (150K + censorship): chain-of-thought extraction plus using Claude to build censorship-safe alternatives to queries about dissidents and authoritarianism.

The double standard matters because it shapes how policy responds. If the industry frames "distillation from competitors" as inherently an attack, it sets a precedent that would criminalize the research practices that made open-source AI possible. If it instead names the actual violations, ToS fraud at industrial scale plus a censorship use case that is qualitatively different from anything in the celebration column, the policy response can be proportionate.

What's often overlooked is that DeepSeek's openly released distilled models have been downloaded and used by researchers everywhere, American labs included. The MIT license permits it. Anthropic's researchers, like everyone else in the field, have used DeepSeek outputs in their work. Distillation has never flowed in only one direction.

The Safety Problem That Isn't Getting Enough Attention

The competitive angle gets the coverage. The safety argument deserves more of it, with one distinction worth drawing.

Anthropic's safety claim has two parts. The first: distilled models don't inherit safety safeguards. The second: this creates national security risks when those models are fed into military, surveillance, and intelligence systems by authoritarian governments.[1]

The first part is the principled objection that applies regardless of who's doing the distilling. This is also Anthropic's strongest argument and the one that isn't contaminated by the double standard problem. Safety alignment is not a property of outputs. It's embedded in training, in the RLHF process, in the Constitutional AI framework, in years of work that goes into shaping how a model responds to sensitive requests. A model trained on Claude's outputs, at any scale, by anyone, does not inherit that work. It gets the answers without the values.

WARNING

The Proliferation Risk

When distilled models are open-sourced, Anthropic writes, the risk "multiplies as these capabilities spread freely beyond any single government's control."[1] A model with frontier coding or reasoning capabilities and no meaningful resistance to requests for offensive cyberattack assistance or bioweapon development guidance, released globally, is precisely the uncontrolled proliferation that makes AI safety researchers lose sleep. The safety work does not transfer. The capabilities do. This part of Anthropic's argument is correct regardless of how one feels about the competitive framing.

The second part, the national security angle, is where the context changes. China's military-civil fusion doctrine means civilian AI development and defense applications are not cleanly separated.[9] A model with distilled Claude capabilities integrated into military intelligence systems is a national security concern that doesn't exist when Stanford trains Alpaca. The destination matters. And DeepSeek's specific use of Claude to build censorship infrastructure demonstrates that the line between competitive distillation and politically weaponized distillation is not theoretical.

The Export Controls Argument Gets Stronger

Before Anthropic's report, the "export controls don't work" argument had real momentum. DeepSeek R1 matched GPT-4 at what appeared to be a fraction of the cost. The narrative: restricting chip exports is pointless if Chinese engineers can innovate around the hardware constraints.

Anthropic's evidence complicates that picture in a specific way.[1] Running 24,000 fraudulent accounts and processing 16 million exchanges, then training on the resulting dataset, requires serious compute. Distillation at industrial scale is not a low-compute operation. And if the capability improvements that made those models look competitive with US frontier models depended "in significant part on capabilities extracted from American models," then those improvements are not evidence that export controls failed. They are evidence of a workaround that itself requires the compute that export controls restrict.

Without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective. In reality, these advancements depend in significant part on capabilities extracted from American models.

Anthropic, February 23, 2026

Reuters reported on the same day as Anthropic's announcement that the US had found evidence DeepSeek had trained its AI model on Nvidia's Blackwell chip, apparently flouting export controls, according to anonymous senior officials.[10] That story and the distillation report together form a picture: the Chinese labs were not just creatively engineering around hardware limitations. They were circumventing chip controls and extracting from American models simultaneously. Both paths.

How Anthropic Caught Them

The detection methodology is sophisticated enough to be both reassuring and concerning. Reassuring because they caught it. Concerning because it took this long and required cross-platform corroboration with industry partners.

The attribution relied on converging signals: IP address correlation, request metadata analysis, infrastructure indicators, and confirmation from other AI labs that observed the same actors on their platforms.[1] The cross-platform corroboration is significant. These campaigns weren't exclusively targeting Claude. Multiple frontier model providers were being hit simultaneously.

The behavioral signatures of a distillation campaign are distinct from legitimate use: massive volume concentrated in narrow capability areas, highly repetitive prompt structures, and content that maps directly onto what is most valuable for training data. A single prompt asking for expert data analysis with complete transparent reasoning is unremarkable. The same prompt arriving tens of thousands of times across hundreds of coordinated accounts is not.
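That volume-times-repetitiveness signal can be approximated with a crude heuristic. Everything in this sketch is an assumption for illustration: the thresholds, the token-overlap (Jaccard) similarity measure, and the account data structure are invented here, not Anthropic's actual classifier:

```python
from itertools import combinations

def jaccard(a, b):
    """Token-overlap similarity between two prompt strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def flag_distillation_like(accounts, min_volume=1000,
                           min_mean_similarity=0.8, sample_size=50):
    """accounts: {account_id: [prompt, ...]}.
    Flags accounts that combine high request volume with highly
    repetitive prompt structure; benign users rarely show both."""
    flagged = []
    for account_id, prompts in accounts.items():
        if len(prompts) < min_volume:
            continue  # volume alone is the cheap first filter
        sample = prompts[:sample_size]
        sims = [jaccard(a, b) for a, b in combinations(sample, 2)]
        if sims and sum(sims) / len(sims) >= min_mean_similarity:
            flagged.append(account_id)
    return flagged
```

A production classifier would use embedding similarity, timing patterns, and cross-account infrastructure signals rather than token overlap, but the core logic, volume crossed with repetitiveness, is the signal the report describes.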

Timeline: From Campaigns to Public Disclosure

Key milestones in development

2024–2025: Campaigns begin. DeepSeek, Moonshot, and MiniMax begin distillation campaigns using commercial proxy services and hydra cluster architectures.

Jan 2025: DeepSeek R1 launches. DeepSeek releases R1, matching GPT-4 performance at apparently low cost. The 'export controls don't work' narrative gains traction globally.

Early 2026: MiniMax caught mid-operation. Anthropic detects MiniMax's campaign while still active, observing MiniMax pivot within 24 hours when a new Claude model is released.

Feb 12, 2026: OpenAI's memo to Congress. OpenAI submits a memo to the House Select Committee on China accusing DeepSeek of ongoing distillation from US frontier models using obfuscated methods.

Feb 23, 2026: Anthropic goes public. Anthropic publishes its full report, directly naming DeepSeek, Moonshot, and MiniMax. First major public attribution of AI model theft at scale.

Feb 24, 2026: Industry and critics respond. Reuters reports DeepSeek trained on banned Nvidia chips. Critics on X call out the distillation double standard. CNBC analysts note the blurry boundary between legitimate and illicit practice.

Anthropic is investing in four response categories: detection classifiers and behavioral fingerprinting, intelligence sharing with other labs and authorities, strengthened access controls for exploited onboarding pathways, and countermeasures at the product, API, and model level designed to degrade distillation value without affecting legitimate users.[1]

The Decision to Go Public: What Anthropic Actually Wants

The most consequential thing about this report is not the findings. It's the decision to name names publicly, and what that decision is designed to produce.

Catching attackers and quietly banning accounts is the easy path. It avoids diplomatic friction and legal exposure. Most companies take it. Anthropic made a different calculation, and it's worth being clear-eyed about why.

Anthropic has been one of the most consistent voices in support of AI export controls.[11] The distillation report serves their policy agenda. Naming specific Chinese labs, framing distillation as a national security threat, and calling for coordinated industry and regulatory response is how you get the policy outcomes you want. The evidence supports the narrative they want to tell. That doesn't mean the narrative is wrong. It means it's also useful.

Who Gets Affected by This Report

DeepSeek, Moonshot, MiniMax

Direct reputational damage and potential legal exposure. All three now face platform scrutiny, possible sanctions, and intensified monitoring of their research practices globally.

+Named publicly in a credible technical report
+Potential sanctions and platform bans
+Intensified scrutiny from US agencies and partners

US Policymakers

Fresh evidence for export control advocates. The distillation-as-export-control-circumvention argument will appear in Congressional hearings and agency rulemakings within weeks.

+Strengthens case for chip export restrictions
+Provides named attribution previously unavailable
+Aligns with OpenAI's concurrent memo to Congress

Open-Source AI Community

Collateral risk. If 'distillation from competitors' is normalized as 'attack' framing, it creates precedent that could criminalize research practices underlying the entire open-source movement.

+Stanford Alpaca, Vicuna, Orca built on same technique
+MIT-licensed distilled models suddenly under scrutiny
+Research norms under pressure from commercial interests

Commercial Proxy Services

Explicitly named as attack infrastructure. Services reselling frontier model API access at scale now face existential legal and platform risk.

+Directly implicated in enabling fraud at scale
+Legal exposure under computer fraud statutes
+Platform termination risk from all major AI providers

All AI Labs (including US)

Pressure to join intelligence sharing, strengthen account verification, and implement distillation detection. The standard for acceptable API access controls just increased industry-wide.

+Intelligence sharing now expected, not optional
+Account verification bar raised significantly
+Distillation detection classifiers now table stakes

The argument that safety-focused AI development is strategically important, not just morally important, has been Anthropic's core pitch since 2021. This report makes that argument with evidence. Whether the policymakers who matter are paying attention is a different question.

What This Actually Means for the DeepSeek Narrative

The question is not whether DeepSeek distilled American models. They did, and the evidence is credible. The question is what fraction of their capability improvements came from extraction versus independent engineering.

DeepSeek's chain-of-thought training data extraction targeted exactly the data that powers reasoning models like R1. If that data was partially sourced from Claude's reasoning traces, then R1's reasoning capabilities have a real dependency on Anthropic's research investment. That's not nothing. It's also not the whole story.

DeepSeek's R1 technical paper demonstrates genuine research contributions. Their reinforcement learning approach, applying RL directly to the base model without supervised fine-tuning as a preliminary step, produced unexpected emergent reasoning behaviors that were not simply copied from anyone.[5] The 150,000 Claude extractions are a meaningful input. They are not a sufficient explanation for a model that matched GPT-4 on reasoning benchmarks.

What's often overlooked is that "genuine innovation" and "also extracted from competitors" are not mutually exclusive. The question is degree and attribution, not binary guilt. And the "they did it cheaper because they copied" narrative, which American industry and media have been eager to adopt, overstates what the distillation evidence shows.

What to Actually Take Away From This

1

The ToS violations, fraudulent accounts, and regional circumvention are real and wrong. That's the actual crime, not the technique.

2

DeepSeek's censorship data extraction crosses a different line — using Claude to build an authoritarian censorship system is categorically different from competitive copying.

3

MiniMax pivoted within 24 hours of a new Claude release, redirecting 50% of traffic. That's not research. That's a standing operation.

4

The industry was built on distillation. Stanford Alpaca, Vicuna, DeepSeek's own open-source models — celebrating those and calling this an 'attack' is strategic positioning, not principled objection.

5

Safety alignment does not transfer through distillation. This is Anthropic's strongest argument and the one that deserves serious engagement regardless of the competitive framing.

6

The export controls argument just got stronger. Industrial-scale distillation requires serious compute, reinforcing the case that chip access restrictions matter.


The Internet's Verdict: Two Completely Different Stories

The X discourse around Anthropic's announcement fractured almost immediately into parallel narratives that never engaged with each other.

In English-language tech circles, the dominant reaction was the hypocrisy argument. The 7,767 quote-tweets of Anthropic's announcement were heavily critical.[13] The pattern was consistent: acknowledge the ToS violation, reject the framing. The Stanford Alpaca comparison appeared in hundreds of threads. Researchers pointed out that Vicuna and WizardLM, two of the most influential open-source models of 2023, were built entirely on distilled GPT outputs and everyone celebrated them.

In China-aligned and Chinese-language circles, a different narrative took hold. @JundeMorsenWu, an Oxford PhD and former Baidu/SenseTime researcher, posted a thread with 2,866 likes and 91,000 views arguing that Anthropic was "obviously bullying" DeepSeek over chain-of-thought reasoning, claiming the accusation was a geopolitical move dressed as a principled stance.[14]

The most widely shared meme, reaching 84,000 views in the first 24 hours,[15] was a fake screenshot of Claude Sonnet responding to a prompt with "我是 DeepSeek" (I am DeepSeek). The joke lands on multiple levels: DeepSeek distilled Claude's thinking, but Claude might also think like DeepSeek now given how much DeepSeek's open-weight models have been absorbed into the broader research ecosystem. The directional accusation (DeepSeek stole from Claude) invites the obvious inverse question.

Here's the part nobody has fully reckoned with: there is zero evidence that Anthropic distilled DeepSeek specifically, and the traffic and fraud run clearly in one direction in Anthropic's documented case.[1] But the meme's logic isn't wrong about the broader ecosystem. DeepSeek's openly released models, with their explicit MIT distillation licenses, have been downloaded and used by researchers at American labs as part of normal research practice. The directional accusations are real. The clean moral lines are not.

What's most telling is the @HealthRanger thread, which asked sincerely whether Claude might have been trained on DeepSeek.[15] The answer is: not in the way DeepSeek distilled Claude. But the fact that the question reads as plausible to a large audience tells you something about how normalized cross-lab knowledge transfer has become. In a field where everyone uses everyone else's open weights, public benchmarks, and shared research, the idea that one direction of transfer is legitimate and the other is an industrial-scale attack requires more than a Terms of Service argument to land with the credibility Anthropic wants.

The Bigger Picture

The AI industry just had its first major public reckoning with model theft. The evidence is real. The violations were clear. And there is a genuine national security dimension that goes beyond competitive positioning.

WARNING

The Threat Is Ongoing

Anthropic was explicit: these campaigns are "growing in intensity and sophistication."[1] The three labs named are not the only actors. The report establishes that industrial-scale distillation with institutional backing is happening now. The response infrastructure, both technical and policy, is being built in real time. The window to set the right precedent, one that distinguishes between legitimate knowledge transfer and fraudulent industrial extraction, is open now and will not stay open.

Last updated: February 25, 2026

Sources & References

Key sources and references used in this article

# · Source · Date
1
Detecting and preventing distillation attacks
Feb 23, 2026
2
Anthropic joins OpenAI in flagging 'industrial-scale' distillation campaigns by Chinese AI firms
Feb 24, 2026
3
Alpaca: A Strong, Replicable Instruction-Following Model
Mar 13, 2023
4
Survey on Knowledge Distillation for Large Language Models
Nov 2025
5
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Jan 2025
6
Reddit Files Lawsuit Against Anthropic Over Alleged Unauthorized Data Scraping
Jun 2025
7
Anthropic accuses DeepSeek, other Chinese AI developers of 'industrial-scale' copying
Feb 23, 2026
8
Anthropic's distilling charges against Chinese firms expose AI training grey area
Feb 24, 2026
9
Military-Civil Fusion and the People's Republic of China
2024
10
OpenAI says China's DeepSeek trained its AI by distilling US models
Feb 13, 2026
11
Securing America's Compute Advantage: Anthropic's Position on the Diffusion Rule
2025
12
Critics Mock Anthropic's Claims Chinese AI Labs Are Stealing Its Data
Feb 23, 2026
13
Anthropic X announcement — 31M views, 53,844 likes
Feb 23, 2026
14
@JundeMorsenWu — 'Anthropic is bullying DeepSeek' thread
Feb 24, 2026
15
Viral meme: Claude responding '我是 DeepSeek' — 84K views
Feb 24, 2026