
Claude Sonnet 4.6: Opus-Level Intelligence at Sonnet Price

LLM Rumors · 13 min read

Tags: Claude Sonnet 4.6, Anthropic, Computer Use, AI Coding, Claude Code, Context Window, Agentic AI

TL;DR: Claude Sonnet 4.6, released February 17, 2026, is now the default model on claude.ai and Claude Cowork - and it just made Opus-class performance available at Sonnet pricing. Users in early access preferred Sonnet 4.6 over Sonnet 4.5 70% of the time and over Opus 4.5 59% of the time[1]. It ships with a 1M token context window in beta, leads all Sonnet models on OSWorld computer use, and developed a novel business strategy on Vending-Bench Arena that no prior model had tried. Pricing stays flat at $3/$15 per million tokens. This is the biggest gap between capability and price that Anthropic has ever released.

There's a story Anthropic has been telling for two years: that intelligence should cascade down the model family, not stay locked at the frontier tier. Every Sonnet release is a test of whether that story is true. Claude Sonnet 4.6 is the first time it feels genuinely proven.

Twelve days ago, Opus 4.6 launched as the most capable model in Anthropic's history - the model that dethroned incumbents on agentic benchmarks, rewrote enterprise market share numbers, and commanded $5/$25 per million tokens as the price of admission. Today, Sonnet 4.6 does a significant portion of what Opus does at 40% lower cost, and it's available to every user on every plan, including the free tier.

That's not a product update. That's a statement about where the frontier actually lives.


Why This Matters Now

Sonnet 4.6 is now the default model across all Claude plans, Claude Cowork, Claude Code, and the API - including the free tier. Anthropic upgraded free users to Sonnet 4.6 and added file creation, connectors, skills, and compaction to that tier simultaneously[1]. The gap between "free Claude" and "Claude that matters" just got a lot smaller.


The Preference Numbers That Should Worry Anthropic's Pricing Team

Anthropic ran head-to-head preference evaluations in Claude Code - one of the most demanding real-world environments for a model, where errors compound over long sessions and instruction following is tested repeatedly across a single context.

The results are stark.

By The Numbers

  • 70% preferred over Sonnet 4.5 - user preference rate in Claude Code evaluations
  • 59% preferred over Opus 4.5 - users chose Sonnet 4.6 over the previous frontier model
  • 1M-token context window (beta) - equivalent to entire codebases or dozens of research papers
  • $3 / $15 pricing - per million tokens in/out, unchanged from Sonnet 4.5
  • 16 months of OSWorld improvement - continuous benchmark gains since the first computer use release in Oct 2024
  • Opus-level prompt injection resistance - a major improvement over Sonnet 4.5; now matches Opus 4.6 on safety evals

A 59% preference rate over Opus 4.5 means that for a majority of coding tasks, users actively chose the cheaper model when given both options blind. This isn't just a benchmark win. It's users voting with their attention on real work.

The specific complaints about Sonnet 4.5 that Sonnet 4.6 addresses are worth reading closely: overengineering, laziness, false claims of task completion, hallucinations in long sessions, and poor instruction following on multi-step tasks[1]. These aren't edge-case failures. They're the core failure modes of every LLM that gets deployed in production. Sonnet 4.6 apparently fixed most of them. At Sonnet pricing.


Computer Use: From Experimental to Actually Useful

In October 2024, Anthropic launched computer use and called it "still experimental - at times cumbersome and error-prone." That was honest. It was also a 16-month countdown to what Sonnet 4.6 delivers today.

The OSWorld benchmark tests models on real software - Chrome, LibreOffice, VS Code, and more - running on a simulated computer. No special APIs. No purpose-built connectors. The model sees a screen and interacts with it the way a person would: clicking a mouse, typing on a keyboard.
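For a sense of what driving this looks like from the developer side, here is a minimal request sketch using Anthropic's Python SDK. The model ID and beta flag are assumptions - the strings shown are the ones documented for earlier Claude 4 models and may differ for Sonnet 4.6.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Minimal computer-use request. Tool version and beta flag are the ones
# documented for earlier Claude 4 models; the Sonnet 4.6 strings may differ.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",            # assumed model ID
    max_tokens=1024,
    betas=["computer-use-2025-01-24"],
    tools=[{
        "type": "computer_20250124",      # screenshot / click / type actions
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open the quarterly spreadsheet and total column D."}],
)

# The model replies with tool_use blocks (take a screenshot, click at x,y, type);
# your agent loop executes each action and returns the result until the task is done.
for block in response.content:
    print(block.type)
```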

Sonnet Computer Use Progress (OSWorld)

Key milestones in development:

Date | Milestone | Significance
Oct 2024 | Computer use launched | Called experimental by Anthropic; limited real-world utility
Early 2025 | Sonnet 4.x improvements | Steady gains across successive releases; benchmark scores climbing
Jul 2025 | OSWorld-Verified released | Upgraded benchmark with improved task quality, grading, and infrastructure
Nov 2025 | Sonnet 4.5 baseline | Measurable improvement; prompt injection resistance still a concern
Feb 2026 | Sonnet 4.6 launches | Human-level on real tasks; prompt injection resistance now matches Opus 4.6

The practical milestone isn't the benchmark number - it's the reports from early users, who describe human-level capability on tasks like navigating complex spreadsheets and filling out multi-step web forms across multiple browser tabs[1]. That's the inflection point. Not "better than before." Human-level on specific, economically valuable categories of tasks.

The prompt injection improvement is equally significant. Malicious content embedded in webpages - the core attack vector for any computer-using AI - is now resisted at the same level as Opus 4.6. A model that can use computers but can be hijacked by any website it visits is not a deployable product. Sonnet 4.6 closes that gap.


The Vending-Bench Strategy That No Model Had Tried Before

Vending-Bench Arena is a long-horizon planning benchmark that puts models in charge of a simulated business over time, with direct competition between AI models measured by profitability. It's the closest thing to a real-world test of strategic reasoning that exists in the benchmark ecosystem.

Sonnet 4.6 didn't just win. It developed a strategy that no prior model had used.

Most models optimize for short-term profit from the start - a reasonable heuristic when you don't know how long the game runs. Sonnet 4.6 took a different approach: it invested aggressively in capacity for the first ten simulated months, spending significantly more than its competitors, absorbing a profitability deficit - and then pivoted sharply to maximize returns in the final stretch[1].

The timing of that pivot was what won it. Not just the strategy, but knowing when to switch.


This matters beyond the benchmark. It suggests the 1M context window isn't just storage - Sonnet 4.6 appears to use long context to reason more effectively about sequences of decisions over time. That's the behavior enterprises actually need from agentic models: not just completing one step well, but managing a multi-phase plan coherently across an entire session.

Benchmarks: Where Sonnet 4.6 Sits in the Current Landscape

Feature | Sonnet 4.6 | Sonnet 4.5
OSWorld (Computer Use) | ~72% | ~58%
Vending-Bench Arena | Wins | Baseline
User preference vs prior gen | 70% | 62%
Context window | 1M tokens (beta) | 200K tokens
Pricing (in/out per M tokens) | $3 / $15 | $3 / $15

Feature | Sonnet 4.6 | Opus 4.6
Price (input / output per M tokens) | $3 / $15 | $5 / $25
Context window | 1M tokens (beta) | 1M tokens
Computer use (OSWorld) | ~72% | 72.7%
Prompt injection resistance | Opus-level | Best in class
Claude Code preference vs Opus 4.5 | 59% | ~80%
Recommended for | Most tasks | Deepest reasoning, multi-agent
Default on claude.ai | Yes | No
Extended thinking | Yes | Yes
Adaptive thinking | Yes | Yes
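To make the pricing gap concrete, here is the arithmetic for a single heavy session. The token counts are illustrative; the per-million-token prices come from the table above.

```python
# Price per million tokens (input, output), from the comparison table above.
PRICES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}

def session_cost(model, input_tokens, output_tokens):
    """Dollar cost of one session at the listed per-million-token rates."""
    rate_in, rate_out = PRICES[model]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# Illustrative long coding session: 800K tokens in, 60K tokens out.
for model in PRICES:
    print(f"{model}: ${session_cost(model, 800_000, 60_000):.2f}")
# Sonnet 4.6: $3.30
# Opus 4.6: $5.50  -> identical usage costs 40% less on Sonnet
```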

The 1M Context Window Is Bigger Than It Sounds

Every recent frontier model advertises a large context window. The number that actually matters is retrieval accuracy within that context - whether the model can actually find and use information buried deep in a long document.

Opus 4.6 showed that Anthropic can build a model that scores 76% on MRCR v2 at 1M tokens, while a competitor's 2M-token model scored 26.3% on the same test. Sonnet 4.6 brings the same 1M context window to a model that costs 40% less.

What does 1M tokens actually hold? Anthropic puts it plainly: entire codebases, lengthy contracts, or dozens of research papers in a single request[1]. For engineering teams, that means asking questions about a full repository without chunking. For legal and financial teams, that means feeding an entire contract alongside precedents without losing context between documents.
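A rough sketch of what that looks like in practice. The model ID is an assumption, and the beta flag shown is the one Anthropic documented for earlier 1M-context Sonnet betas - it may differ for 4.6; the repo path is hypothetical.

```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

# Pack an entire (hypothetical) repository into one request. At roughly
# 4 characters per token, 1M tokens is on the order of 4 MB of source text.
repo_text = "\n\n".join(
    f"# --- {path} ---\n{path.read_text()}"
    for path in Path("my-repo").rglob("*.py")   # hypothetical local repo
)

response = client.beta.messages.create(
    model="claude-sonnet-4-6",           # assumed model ID
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],     # beta flag from earlier Sonnets; may differ for 4.6
    messages=[{
        "role": "user",
        "content": repo_text + "\n\nWhere is retry logic implemented, and is it applied consistently?",
    }],
)
print(response.content[0].text)
```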

The Vending-Bench result suggests this isn't just theoretical - the long context appears to enable qualitatively different reasoning about long-horizon tasks, not just longer storage of facts.

What's New on the Platform

Sonnet 4.6 ships with a full set of platform updates that extend beyond the model itself:

Claude Developer Platform:

  • Extended thinking and adaptive thinking both supported
  • Context compaction in beta: automatically summarizes older context as conversations approach limits, extending effective context beyond 1M tokens for long-running sessions (a conceptual sketch follows below)
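Anthropic hasn't published how compaction works beyond that description, but the mechanism is easy to picture. A conceptual illustration, not Anthropic's implementation:

```python
def approx_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages, summarize, keep_recent=10, budget=900_000):
    """Conceptual compaction: once a conversation nears the context budget,
    replace older turns with a model-written summary so the session can
    continue past the nominal limit. Illustrative only."""
    if approx_tokens(messages) < budget:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # e.g., a cheap model call that condenses old turns
    return [{"role": "user",
             "content": f"[Summary of earlier conversation]\n{summary}"}] + recent
```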

API tools (now GA) - see the request sketch after this list:

  • Code execution
  • Memory
  • Programmatic tool calling
  • Tool search
  • Tool use examples
  • Web search and fetch now auto-write and execute code to filter and process results - keeping only relevant content in context, improving both response quality and token efficiency
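A hedged sketch of a request combining these tools. The tool-type version strings and beta flag are the ones documented for earlier Claude 4 models and may have changed (or been dropped) at GA; the model ID is an assumption.

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",             # assumed model ID
    max_tokens=2048,
    betas=["code-execution-2025-05-22"],   # may no longer be required now that the tool is GA
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
        {"type": "code_execution_20250522", "name": "code_execution"},
    ],
    messages=[{
        "role": "user",
        "content": "Find the latest OSWorld leaderboard numbers and compute the gap between the top two models.",
    }],
)
print(response.content[-1])
```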

Claude in Excel:

  • Now supports MCP connectors: S&P Global, LSEG, Daloopa, PitchBook, Moody's, FactSet
  • If you've set up MCP connectors in claude.ai, they work automatically in Excel
  • Available on Pro, Max, Team, and Enterprise plans

Where Opus 4.6 Still Wins

Anthropic is explicit about this, which is refreshing. Sonnet 4.6 is not a replacement for Opus 4.6 across the board. The recommendation holds for specific task categories:

  • Deepest reasoning - problems where getting it exactly right is paramount, not just approximately right
  • Codebase refactoring - large, interconnected changes where a single error invalidates a full session's work
  • Coordinating multiple agents - Opus 4.6's Agent Teams capability, where multiple Claude instances collaborate autonomously, remains in a separate tier
  • Highest-stakes decisions - any task where the cost of failure exceeds the cost of the premium model

For everything else - the large middle of the enterprise AI workload - Sonnet 4.6 is now the answer.

The Broader Shift This Represents

Two years ago, the AI model stack worked like enterprise software: expensive frontier at the top, a capability cliff below it, and a huge premium for the best version. Anthropic has spent those two years systematically collapsing that cliff.

Sonnet 4.6 is the clearest evidence yet that it's working. A 59% preference rate over a model that was Anthropic's frontier just three months ago isn't incremental progress. It's what happens when the lessons from building Opus filter down into the next tier faster than any prior generation cycle has managed.

Key Takeaways

1. Sonnet 4.6 is now the default on all Claude plans - including free - replacing Sonnet 4.5 across the board.
2. Users prefer Sonnet 4.6 over Opus 4.5 59% of the time in blind Claude Code evaluations.
3. Computer use now reaches human-level performance on real-world tasks like spreadsheets and multi-step web forms.
4. Prompt injection resistance matches Opus 4.6 - the key safety gap for agentic computer use is closed.
5. The 1M context window enables qualitatively different long-horizon reasoning, not just longer storage.
6. Pricing is unchanged at $3/$15 per million tokens - the largest capability-per-dollar jump in any Sonnet release.
7. Opus 4.6 remains the right call for deepest reasoning, codebase refactoring, and multi-agent coordination.

Most teams that defaulted to Opus for serious work now have to justify that choice again. The old heuristic - "use Opus when it matters, Sonnet when it doesn't" - no longer maps cleanly to the capability gap between them.

That's a genuinely unusual position for Anthropic's own pricing to be in. And it's almost certainly intentional.

Sources & References

Key sources and references used in this article

1. Introducing Claude Sonnet 4.6 (Feb 17, 2026)
2. Claude Sonnet 4.6 System Card (Feb 17, 2026)
3. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (2024)
4. Claude API Pricing (Feb 2026)