TL;DR: Claude Sonnet 4.6, released February 17, 2026, is now the default model on claude.ai and Claude Cowork - and it just made Opus-class performance available at Sonnet pricing. Users in early access preferred Sonnet 4.6 over Sonnet 4.5 70% of the time and over Opus 4.5 59% of the time[1]. It ships with a 1M token context window in beta, leads all Sonnet models on OSWorld computer use, and developed a novel business strategy on Vending-Bench Arena that no prior model had tried. Pricing stays flat at $3/$15 per million tokens. It's the widest gap between capability and price Anthropic has ever shipped.
There's a story Anthropic has been telling for two years: that intelligence should cascade down the model family, not stay locked at the frontier tier. Every Sonnet release is a test of whether that story is true. Claude Sonnet 4.6 is the first time it feels genuinely proven.
Twelve days ago, Opus 4.6 launched as the most capable model in Anthropic's history - the model that dethroned incumbents on agentic benchmarks, rewrote enterprise market share numbers, and commanded $5/$25 per million tokens as the price of admission. Today, Sonnet 4.6 does a significant portion of what Opus does, at 60% of the cost, and it's now free for every user on every plan.
That's not a product update. That's a statement about where the frontier actually lives.
Why This Matters Now
Sonnet 4.6 is now the default model across all Claude plans, Claude Cowork, Claude Code, and the API - including the free tier. Anthropic upgraded free users to Sonnet 4.6 and added file creation, connectors, skills, and compaction to that tier simultaneously[1]. The gap between "free Claude" and "Claude that matters" just got a lot smaller.
The Preference Numbers That Should Worry Anthropic's Pricing Team
Anthropic ran head-to-head preference evaluations in Claude Code - one of the most demanding real-world environments for a model, where errors compound over long sessions and instruction following is tested repeatedly across a single context.
The results are stark.
By The Numbers
| Metric | Figure | Context |
|---|---|---|
| Preference vs Sonnet 4.5 | 70% | User preference rate in Claude Code evaluations |
| Preference vs Opus 4.5 | 59% | Users chose Sonnet 4.6 over the previous frontier model |
| Context window | 1M tokens | Beta; equivalent to entire codebases or dozens of research papers |
| Pricing | $3 / $15 | Per million tokens in/out - unchanged from Sonnet 4.5 |
| OSWorld computer use | ~72% | Continuous benchmark gains since first computer use release in Oct 2024 |
| Prompt injection resistance | Opus 4.6-level | Major improvement vs Sonnet 4.5; now matches Opus 4.6 on safety evals |
A 59% preference rate over Opus 4.5 means that for a majority of coding tasks, users actively chose the cheaper model when given both options blind. This isn't just a benchmark win. It's users voting with their attention on real work.
The specific complaints about Sonnet 4.5 that Sonnet 4.6 addresses are worth reading closely: overengineering, laziness, false claims of task completion, hallucinations in long sessions, poor instruction following on multi-step tasks[1]. These aren't edge case failures. They're the core failure modes of every LLM that gets deployed in production. Sonnet 4.6 apparently fixed most of them. At Sonnet pricing.
Computer Use: From Experimental to Actually Useful
In October 2024, Anthropic launched computer use and called it "still experimental - at times cumbersome and error-prone." That was honest. It was also a 16-month countdown to what Sonnet 4.6 delivers today.
The OSWorld benchmark tests models on real software - Chrome, LibreOffice, VS Code, and more - running on a simulated computer. No special APIs. No purpose-built connectors. The model sees a screen and interacts with it the way a person would: clicking a mouse, typing on a keyboard.
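To make the setup concrete, here is a minimal sketch of pointing Claude at a virtual screen through the Anthropic API. The tool type and beta flag below are the identifiers from earlier computer-use releases, and the model ID is an assumption - check the current docs for Sonnet 4.6's exact values:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Declare the "computer" tool so the model knows the screen geometry. The model
# replies with actions (screenshot, click, type); your agent loop executes each
# one against a real or virtual display and feeds the result back.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",          # assumed model ID
    max_tokens=1024,
    tools=[{
        "type": "computer_20250124",    # tool version from prior releases
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open the budget spreadsheet and total column B."}],
    betas=["computer-use-2025-01-24"],  # beta flag from prior releases
)
print(response.content)  # typically begins with a screenshot request
```

The loop is the product: the API only proposes actions, and the quality Sonnet 4.6 adds is in how reliably those proposals survive dozens of iterations.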
Sonnet Computer Use Progress (OSWorld)
Key milestones in development
| Date | Milestone | Significance |
|---|---|---|
| Oct 2024 | Computer use launched | Called experimental by Anthropic; limited real-world utility |
| Early 2025 | Sonnet 4.x improvements | Steady gains across successive releases; benchmark scores climbing |
| Jul 2025 | OSWorld-Verified released | Upgraded benchmark with improved task quality, grading, and infrastructure |
| Nov 2025 | Sonnet 4.5 baseline | Measurable improvement; prompt injection resistance still a concern |
| Feb 2026 | Sonnet 4.6 launches | Human-level on real tasks; prompt injection resistance now matches Opus 4.6 |
The practical milestone isn't the benchmark number - it's the reports from early users, who are seeing human-level capability on tasks like navigating complex spreadsheets and filling out multi-step web forms across multiple browser tabs[1]. That's the inflection point. Not "better than before." Human-level on specific, economically valuable categories of tasks.
The prompt injection improvement is equally significant. Malicious content embedded in webpages - the core attack vector for any computer-using AI - is now handled at Opus 4.6-level resistance. A model that can use computers but can be hijacked by any website it visits is not a deployable product. Sonnet 4.6 closes that gap.
The Vending-Bench Strategy That No Model Had Tried Before
Vending-Bench Arena is a long-horizon planning benchmark that puts models in charge of a simulated business over time, with direct competition between AI models measured by profitability. It's the closest thing to a real-world test of strategic reasoning that exists in the benchmark ecosystem.
Sonnet 4.6 didn't just win. It developed a strategy that no prior model had used.
Most models optimize for short-term profit from the start - a reasonable heuristic when you don't know how long the game runs. Sonnet 4.6 took a different approach: it invested aggressively in capacity for the first ten simulated months, spending significantly more than its competitors, absorbing a profitability deficit - and then pivoted sharply to maximize returns in the final stretch[1].
The timing of that pivot was what won it. Not just the strategy, but knowing when to switch.
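To see why the timing dominates, consider a toy compounding model - my own illustration, not the benchmark's actual economics. Revenue scales with capacity, and each simulated month the agent either reinvests revenue (growing capacity) or banks it as profit:

```python
# Toy model (not Vending-Bench itself): monthly revenue is proportional to
# capacity; before the pivot, revenue compounds capacity, and after it,
# revenue is banked as profit over a fixed 24-month horizon.
MONTHS, YIELD = 24, 0.20   # horizon and monthly return per unit of capacity

def final_profit(pivot_month: int) -> float:
    capacity, cash = 100.0, 0.0
    for month in range(MONTHS):
        revenue = capacity * YIELD
        if month < pivot_month:
            capacity += revenue        # expansion phase: compound capacity
        else:
            cash += revenue            # harvest phase: take profit
    return cash

for pivot in (0, 6, 12, 18, 23):
    print(f"pivot at month {pivot:>2}: final profit {final_profit(pivot):8.0f}")
# pivot  0 ->  480   (greedy: never compounds)
# pivot 18 -> 3195   (near-optimal for these parameters)
# pivot 23 -> 1325   (too late: nothing left to harvest)
```

With these toy numbers, pivoting too early forfeits compounding and pivoting too late leaves no time to collect - the profit curve peaks at a specific switch point, which is exactly the judgment the benchmark rewards.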
This matters beyond the benchmark. It suggests the 1M context window isn't just storage - Sonnet 4.6 appears to use long context to reason more effectively about sequences of decisions over time. That's the behavior enterprises actually need from agentic models: not just completing one step well, but managing a multi-phase plan coherently across an entire session.
Benchmarks: Where Sonnet 4.6 Sits in the Current Landscape
| Feature | Sonnet 4.6 | Sonnet 4.5 |
|---|---|---|
| OSWorld (Computer Use) | ~72% | ~58% |
| Vending-Bench Arena | Wins | Baseline |
| User preference vs prior gen | 70% | 62% |
| Context window | 1M tokens (beta) | 200K tokens |
| Pricing (in/out per M tokens) | $3 / $15 | $3 / $15 |

| Feature | Sonnet 4.6 | Opus 4.6 |
|---|---|---|
| Price (input / output per M tokens) | $3 / $15 | $5 / $25 |
| Context window | 1M tokens (beta) | 1M tokens |
| Computer use (OSWorld) | ~72% | 72.7% |
| Prompt injection resistance | Opus-level | Best in class |
| Claude Code preference vs Opus 4.5 | 59% | ~80% |
| Recommended for | Most tasks | Deepest reasoning, multi-agent |
| Default on claude.ai | Yes | No |
| Extended thinking | Yes | Yes |
| Adaptive thinking | Yes | Yes |
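The pricing rows are worth making concrete. Because Sonnet's input and output rates are each exactly 40% below Opus's, the per-request saving is 40% regardless of the token mix - a quick sanity check:

```python
# Per-request cost at the published rates (USD per million tokens: in, out).
PRICES = {"sonnet-4.6": (3.00, 15.00), "opus-4.6": (5.00, 25.00)}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A long agentic turn: 200K tokens in, 20K tokens out.
sonnet = request_cost("sonnet-4.6", 200_000, 20_000)  # $0.90
opus = request_cost("opus-4.6", 200_000, 20_000)      # $1.50
print(f"Sonnet ${sonnet:.2f} vs Opus ${opus:.2f} -> {1 - sonnet / opus:.0%} cheaper")
```

At agentic volumes - thousands of such turns per day - that 40% compounds into the difference between a pilot budget and a production budget.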
The 1M Context Window Is Bigger Than It Sounds
Every recent frontier model advertises a large context window. The number that actually matters is retrieval accuracy within that context - whether the model can find and use information buried deep in a long document.
Opus 4.6 showed that Anthropic can build a model that scores 76% on MRCR v2 at 1M tokens, while a competitor's 2M-token model scored 26.3% on the same test. Sonnet 4.6 brings the same 1M context window to a model that costs 40% less.
What does 1M tokens actually hold? Anthropic puts it plainly: entire codebases, lengthy contracts, or dozens of research papers in a single request[1]. For engineering teams, that means asking questions about a full repository without chunking. For legal and financial teams, that means feeding an entire contract alongside precedents without losing context between documents.
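In practice that looks like concatenating the repository and sending it as a single request. A minimal sketch, assuming the model ID and carrying over the 1M-context beta flag Anthropic introduced for earlier Sonnet models (it may differ for 4.6):

```python
import pathlib

import anthropic

client = anthropic.Anthropic()

# Concatenate a whole repository into one prompt - no chunking, no RAG pipeline.
repo = pathlib.Path("./my-project")   # hypothetical local checkout
source = "\n\n".join(
    f"# {path}\n{path.read_text(errors='ignore')}"
    for path in sorted(repo.rglob("*.py"))
)

response = client.beta.messages.create(
    model="claude-sonnet-4-6",            # assumed model ID
    max_tokens=2048,
    betas=["context-1m-2025-08-07"],      # 1M beta flag from earlier releases
    messages=[{
        "role": "user",
        "content": f"{source}\n\nWhere is retry logic duplicated across modules?",
    }],
)
print(response.content[0].text)
```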
The Vending-Bench result suggests this isn't just theoretical - the long context appears to enable qualitatively different reasoning about long-horizon tasks, not just longer storage of facts.
What's New on the Platform
Sonnet 4.6 ships with a full set of platform updates that extend beyond the model itself:
Claude Developer Platform:
- Extended thinking and adaptive thinking both supported
- Context compaction in beta: automatically summarizes older context as conversations approach limits - effective context length extends beyond 1M for long-running sessions
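Anthropic hasn't published how compaction works internally, but the shape of the idea is easy to sketch client-side: once the transcript nears the limit, summarize the oldest turns and splice the summary in their place. A rough illustration, not the platform's implementation:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"     # assumed model ID
TOKEN_BUDGET = 900_000          # leave headroom under the 1M window

def compact(messages: list[dict], keep_recent: int = 10) -> list[dict]:
    """Replace all but the most recent turns with a model-written summary."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=old + [{"role": "user",
                         "content": "Summarize the conversation so far, keeping "
                                    "decisions, open questions, and key facts."}],
    ).content[0].text
    return [{"role": "user",
             "content": f"[Summary of earlier turns]\n{summary}"}] + recent

def maybe_compact(messages: list[dict]) -> list[dict]:
    """Compact only once the transcript approaches the context limit."""
    used = client.messages.count_tokens(model=MODEL, messages=messages).input_tokens
    return compact(messages) if used > TOKEN_BUDGET else messages
```

The server-side version presumably does this without an extra round trip; the point of the sketch is the contract - old turns become a summary, recent turns stay verbatim.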
API tools (now GA):
- Code execution
- Memory
- Programmatic tool calling
- Tool search
- Tool use examples
- Web search and fetch now auto-write and execute code to filter and process results - keeping only relevant content in context, improving both response quality and token efficiency
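As a sketch of what attaching these tools looks like in a request - using the tool-type identifiers and beta flag from the pre-GA releases, which may have been superseded at GA:

```python
import anthropic

client = anthropic.Anthropic()

# Server-side tools run on Anthropic's infrastructure; no local agent loop needed.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",   # assumed model ID
    max_tokens=2048,
    tools=[
        {"type": "code_execution_20250522", "name": "code_execution"},
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
    ],
    betas=["code-execution-2025-05-22"],   # pre-GA beta flag; may no longer be needed
    messages=[{"role": "user",
               "content": "Pull this week's EURUSD range and summarize the trend."}],
)
print(response.content)
```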
Claude in Excel:
- Now supports MCP connectors: S&P Global, LSEG, Daloopa, PitchBook, Moody's, FactSet
- If you've set up MCP connectors in Claude.ai, they work automatically in Excel
- Available on Pro, Max, Team, and Enterprise plans
Where Opus 4.6 Still Wins
Anthropic is explicit about this, which is refreshing. Sonnet 4.6 is not a replacement for Opus 4.6 across the board. The recommendation holds for specific task categories:
- Deepest reasoning - problems where getting it exactly right is paramount, not just approximately right
- Codebase refactoring - large, interconnected changes where a single error invalidates a full session's work
- Coordinating multiple agents - Opus 4.6's Agent Teams capability, where multiple Claude instances collaborate autonomously, remains in a separate tier
- Highest-stakes decisions - any task where the cost of failure exceeds the cost of the premium model
For everything else - the large middle of the enterprise AI workload - Sonnet 4.6 is now the answer.
The Broader Shift This Represents
Two years ago, the AI model stack worked like enterprise software: expensive frontier at the top, a capability cliff below it, and a huge premium for the best version. Anthropic has spent those two years systematically collapsing that cliff.
Sonnet 4.6 is the clearest evidence yet that it's working. A 59% preference rate over a model that was Anthropic's frontier just three months ago isn't incremental progress. It's what happens when the learnings from building Opus filter down into the next tier faster than any prior model generation cycle has managed.
Key Takeaways
Sonnet 4.6 is now the default on all Claude plans - including free - replacing Sonnet 4.5 across the board
Users prefer Sonnet 4.6 over Opus 4.5 59% of the time in Claude Code blind evaluations
Computer use now reaches human-level on real-world tasks like spreadsheets and multi-step web forms
Prompt injection resistance matches Opus 4.6 - the key safety gap for agentic computer use is closed
The 1M context window enables qualitatively different long-horizon reasoning, not just longer storage
Pricing unchanged at $3/$15 per million tokens - the largest capability-per-dollar jump in any Sonnet release
Opus 4.6 remains the right call for deepest reasoning, codebase refactoring, and multi-agent coordination
For most teams that defaulted to Opus for serious work: you now have to justify that choice again. The old heuristic - "use Opus when it matters, Sonnet when it doesn't" - no longer maps cleanly to the capability gap between them.
That's a genuinely unusual position for Anthropic's own pricing to be in. And it's almost certainly intentional.
Sources & References
Key sources and references used in this article
| # | Source | Outlet | Date |
|---|---|---|---|
| 1 | Introducing Claude Sonnet 4.6 | Anthropic | Feb 17, 2026 |
| 2 | Claude Sonnet 4.6 System Card | Anthropic | Feb 17, 2026 |
| 3 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | arXiv | 2024 |
| 4 | Claude API Pricing | Anthropic | Feb 2026 |