
Claude Sonnet 4.6: Opus-Level Intelligence at Sonnet Price

LLM Rumors · 13 min read

Tags: Claude Sonnet 4.6, Anthropic, Computer Use, AI Coding, Claude Code, Context Window, Agentic AI

TL;DR: Claude Sonnet 4.6, released February 17, 2026, is now the default model on claude.ai and Claude Cowork - and it just made Opus-class performance available at Sonnet pricing. Users in early access preferred Sonnet 4.6 over Sonnet 4.5 70% of the time and over Opus 4.5 59% of the time[1]. It ships with a 1M token context window in beta, leads all Sonnet models on OSWorld computer use, and developed a novel business strategy on Vending-Bench Arena that no prior model had tried. Pricing stays flat at $3/$15 per million tokens. This is the biggest gap between capability and price that Anthropic has ever released.

There's a story Anthropic has been telling for two years: that intelligence should cascade down the model family, not stay locked at the frontier tier. Every Sonnet release is a test of whether that story is true. Claude Sonnet 4.6 is the first time it feels genuinely proven.

Twelve days ago, Opus 4.6 launched as the most capable model in Anthropic's history - the model that dethroned incumbents on agentic benchmarks, rewrote enterprise market share numbers, and commanded $5/$25 per million tokens as the price of admission. Today, Sonnet 4.6 does a significant portion of what Opus does at 40% lower cost, and it's available to every user on every plan, including the free tier.

That's not a product update. That's a statement about where the frontier actually lives.


Why This Matters Now

Sonnet 4.6 is now the default model across all Claude plans, Claude Cowork, Claude Code, and the API - including the free tier. Anthropic upgraded free users to Sonnet 4.6 and added file creation, connectors, skills, and compaction to that tier simultaneously[1]. The gap between "free Claude" and "Claude that matters" just got a lot smaller.


The Preference Numbers That Should Worry Anthropic's Pricing Team

Anthropic ran head-to-head preference evaluations in Claude Code - one of the most demanding real-world environments for a model, where errors compound over long sessions and instruction following is tested repeatedly across a single context.

The results are stark.

By The Numbers

  • 70% preferred over Sonnet 4.5 - user preference rate in Claude Code evaluations
  • 59% preferred over Opus 4.5 - users chose Sonnet 4.6 over the previous frontier model
  • 1M-token context window (beta) - equivalent to entire codebases or dozens of research papers
  • $3 / $15 pricing - per million tokens in/out, unchanged from Sonnet 4.5
  • 16 months of OSWorld improvement - continuous benchmark gains since the first computer use release in Oct 2024
  • Opus-level prompt injection resistance - a major improvement over Sonnet 4.5; now matches Opus 4.6 on safety evals

A 59% preference rate over Opus 4.5 means that for a majority of coding tasks, users actively chose the cheaper model when given both options blind. This isn't just a benchmark win. It's users voting with their attention on real work.

The specific complaints about Sonnet 4.5 that Sonnet 4.6 addresses are worth reading closely: overengineering, laziness, false claims of task completion, hallucinations in long sessions, and poor instruction following on multi-step tasks[1]. These aren't edge-case failures. They're the core failure modes of every LLM that gets deployed in production. Sonnet 4.6 apparently fixed most of them. At Sonnet pricing.


Computer Use: From Experimental to Actually Useful

In October 2024, Anthropic launched computer use and called it "still experimental - at times cumbersome and error-prone." That was honest. It was also a 16-month countdown to what Sonnet 4.6 delivers today.

The OSWorld benchmark tests models on real software - Chrome, LibreOffice, VS Code, and more - running on a simulated computer. No special APIs. No purpose-built connectors. The model sees a screen and interacts with it the way a person would: clicking a mouse, typing on a keyboard.
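For a sense of what driving this looks like from the developer side, here is a minimal request sketch using Anthropic's Python SDK. The model ID and beta flag are assumptions - the strings shown are the ones documented for earlier Claude 4 models and may differ for Sonnet 4.6.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Minimal computer-use request. Tool version and beta flag are the ones
# documented for earlier Claude 4 models; the Sonnet 4.6 strings may differ.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",            # assumed model ID
    max_tokens=1024,
    betas=["computer-use-2025-01-24"],
    tools=[{
        "type": "computer_20250124",      # screenshot / click / type actions
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open the quarterly spreadsheet and total column D."}],
)

# The model replies with tool_use blocks (take a screenshot, click at x,y, type);
# your agent loop executes each action and returns the result until the task is done.
for block in response.content:
    print(block.type)
```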

Sonnet Computer Use Progress (OSWorld)

Key milestones in development:

Date | Milestone | Significance
Oct 2024 | Computer use launched | Called experimental by Anthropic; limited real-world utility
Early 2025 | Sonnet 4.x improvements | Steady gains across successive releases; benchmark scores climbing
Jul 2025 | OSWorld-Verified released | Upgraded benchmark with improved task quality, grading, and infrastructure
Nov 2025 | Sonnet 4.5 baseline | Measurable improvement; prompt injection resistance still a concern
Feb 2026 | Sonnet 4.6 launches | Human-level on real tasks; prompt injection resistance now matches Opus 4.6

The practical milestone isn't the benchmark number - it's the reports from early users, who describe human-level capability on tasks like navigating complex spreadsheets and filling out multi-step web forms across multiple browser tabs[1]. That's the inflection point. Not "better than before." Human-level on specific, economically valuable categories of tasks.

The prompt injection improvement is equally significant. Malicious content embedded in webpages - the core attack vector for any computer-using AI - is now resisted at the same level as Opus 4.6. A model that can use computers but can be hijacked by any website it visits is not a deployable product. Sonnet 4.6 closes that gap.


The Vending-Bench Strategy That No Model Had Tried Before

Vending-Bench Arena is a long-horizon planning benchmark that puts models in charge of a simulated business over time, with direct competition between AI models measured by profitability. It's the closest thing to a real-world test of strategic reasoning that exists in the benchmark ecosystem.

Sonnet 4.6 didn't just win. It developed a strategy that no prior model had used.

Most models optimize for short-term profit from the start - a reasonable heuristic when you don't know how long the game runs. Sonnet 4.6 took a different approach: it invested aggressively in capacity for the first ten simulated months, spending significantly more than its competitors, absorbing a profitability deficit - and then pivoted sharply to maximize returns in the final stretch[1].

The timing of that pivot was what won it. Not just the strategy, but knowing when to switch.


This matters beyond the benchmark. It suggests the 1M context window isn't just storage - Sonnet 4.6 appears to use long context to reason more effectively about sequences of decisions over time. That's the behavior enterprises actually need from agentic models: not just completing one step well, but managing a multi-phase plan coherently across an entire session.

Benchmarks: Where Sonnet 4.6 Sits in the Current Landscape

Feature | Sonnet 4.6 | Sonnet 4.5
OSWorld (Computer Use) | ~72% | ~58%
Vending-Bench Arena | Wins | Baseline
User preference vs prior gen | 70% | 62%
Context window | 1M tokens (beta) | 200K tokens
Pricing (in/out per M tokens) | $3 / $15 | $3 / $15

Feature | Sonnet 4.6 | Opus 4.6
Price (input / output per M tokens) | $3 / $15 | $5 / $25
Context window | 1M tokens (beta) | 1M tokens
Computer use (OSWorld) | ~72% | 72.7%
Prompt injection resistance | Opus-level | Best in class
Claude Code preference vs Opus 4.5 | 59% | ~80%
Recommended for | Most tasks | Deepest reasoning, multi-agent
Default on claude.ai | Yes | No
Extended thinking | Yes | Yes
Adaptive thinking | Yes | Yes
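To make the pricing gap concrete, here is the arithmetic for a single heavy session. The token counts are illustrative; the per-million-token prices come from the table above.

```python
# Price per million tokens (input, output), from the comparison table above.
PRICES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}

def session_cost(model, input_tokens, output_tokens):
    """Dollar cost of one session at the listed per-million-token rates."""
    rate_in, rate_out = PRICES[model]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# Illustrative long coding session: 800K tokens in, 60K tokens out.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 800_000, 60_000):.2f}")
# Sonnet 4.6: $3.30
# Opus 4.6: $5.50  -> identical usage costs 40% less on Sonnet
```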

The 1M Context Window Is Bigger Than It Sounds

Every recent frontier model advertises a large context window. The number that actually matters is retrieval accuracy within that context - whether the model can actually find and use information buried deep in a long document.

Opus 4.6 showed that Anthropic can build a model that scores 76% on MRCR v2 at 1M tokens, while a competitor's 2M-token model scored 26.3% on the same test. Sonnet 4.6 brings the same 1M context window to a model that costs 40% less.

What does 1M tokens actually hold? Anthropic puts it plainly: entire codebases, lengthy contracts, or dozens of research papers in a single request[1]. For engineering teams, that means asking questions about a full repository without chunking. For legal and financial teams, that means feeding an entire contract alongside precedents without losing context between documents.
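A rough sketch of what that looks like in practice. The model ID is an assumption, and the beta flag shown is the one Anthropic documented for earlier 1M-context Sonnet betas - it may differ for 4.6; the repo path is hypothetical.

```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

# Pack an entire (hypothetical) repository into one request. At roughly
# 4 characters per token, 1M tokens is on the order of 4 MB of source text.
repo_text = "\n\n".join(
    f"# --- {path} ---\n{path.read_text()}"
    for path in Path("my-repo").rglob("*.py")   # hypothetical local repo
)

response = client.beta.messages.create(
    model="claude-sonnet-4-6",           # assumed model ID
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],     # beta flag from earlier Sonnets; may differ for 4.6
    messages=[{
        "role": "user",
        "content": repo_text + "\n\nWhere is retry logic implemented, and is it applied consistently?",
    }],
)
print(response.content[0].text)
```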

The Vending-Bench result suggests this isn't just theoretical - the long context appears to enable qualitatively different reasoning about long-horizon tasks, not just longer storage of facts.

What's New on the Platform

Sonnet 4.6 ships with a full set of platform updates that extend beyond the model itself:

Claude Developer Platform:

  • Extended thinking and adaptive thinking both supported
  • Context compaction in beta: automatically summarizes older context as conversations approach limits, extending effective context beyond 1M tokens for long-running sessions (a conceptual sketch follows below)
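Anthropic hasn't published how compaction works beyond that description, but the mechanism is easy to picture. A conceptual illustration, not Anthropic's implementation:

```python
def approx_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, summarize, keep_recent=10, budget=900_000):
    """Conceptual compaction: once a conversation nears the context budget,
    replace older turns with a model-written summary so the session can
    continue past the nominal limit. Illustrative only."""
    if approx_tokens(messages) < budget:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g., a cheap model call that condenses old turns
    return [{"role": "user",
             "content": f"[Summary of earlier conversation]\n{summary}"}] + recent
```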

API tools (now GA) - see the request sketch after this list:

  • Code execution
  • Memory
  • Programmatic tool calling
  • Tool search
  • Tool use examples
  • Web search and fetch now auto-write and execute code to filter and process results - keeping only relevant content in context, improving both response quality and token efficiency
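A hedged sketch of a request combining these tools. The tool-type version strings and beta flag are the ones documented for earlier Claude 4 models and may have changed (or been dropped) at GA; the model ID is an assumption.

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",             # assumed model ID
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],   # may no longer be required now that the tool is GA
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    messages=[{
        "role": "user",
        "content": "Find the latest OSWorld leaderboard numbers and compute the gap between the top two models.",
    }],
)
print(response.content[-1])
```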

Claude in Excel:

  • Now supports MCP connectors: S&P Global, LSEG, Daloopa, PitchBook, Moody's, FactSet
  • If you've set up MCP connectors in claude.ai, they work automatically in Excel
  • Available on Pro, Max, Team, and Enterprise plans

Where Opus 4.6 Still Wins

Anthropic is explicit about this, which is refreshing. Sonnet 4.6 is not a replacement for Opus 4.6 across the board. The recommendation holds for specific task categories:

  • Deepest reasoning - problems where getting it exactly right is paramount, not just approximately right
  • Codebase refactoring - large, interconnected changes where a single error invalidates a full session's work
  • Coordinating multiple agents - Opus 4.6's Agent Teams capability, where multiple Claude instances collaborate autonomously, remains in a separate tier
  • Highest-stakes decisions - any task where the cost of failure exceeds the cost of the premium model

For everything else - the large middle of the enterprise AI workload - Sonnet 4.6 is now the answer.

The Broader Shift This Represents

Two years ago, the AI model stack worked like enterprise software: expensive frontier at the top, a capability cliff below it, and a huge premium for the best version. Anthropic has spent those two years systematically collapsing that cliff.

Sonnet 4.6 is the clearest evidence yet that it's working. A 59% preference rate over a model that was Anthropic's frontier just three months ago isn't incremental progress. It's what happens when the lessons from building Opus filter down into the next tier faster than any prior generation cycle has managed.

Key Takeaways

1. Sonnet 4.6 is now the default on all Claude plans - including free - replacing Sonnet 4.5 across the board.
2. Users prefer Sonnet 4.6 over Opus 4.5 59% of the time in blind Claude Code evaluations.
3. Computer use now reaches human-level performance on real-world tasks like spreadsheets and multi-step web forms.
4. Prompt injection resistance matches Opus 4.6 - the key safety gap for agentic computer use is closed.
5. The 1M context window enables qualitatively different long-horizon reasoning, not just longer storage.
6. Pricing is unchanged at $3/$15 per million tokens - the largest capability-per-dollar jump in any Sonnet release.
7. Opus 4.6 remains the right call for deepest reasoning, codebase refactoring, and multi-agent coordination.

Most teams that defaulted to Opus for serious work now have to justify that choice again. The old heuristic - "use Opus when it matters, Sonnet when it doesn't" - no longer maps cleanly to the capability gap between them.

That's a genuinely unusual position for Anthropic's own pricing to be in. And it's almost certainly intentional.

Sources & References

Key sources and references used in this article

1. Introducing Claude Sonnet 4.6 (Feb 17, 2026)
2. Claude Sonnet 4.6 System Card (Feb 17, 2026)
3. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (2024)
4. Claude API Pricing (Feb 2026)