News Daily Nation Digital News & Media Platform

collapse
Home / Daily News Analysis / Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

May 29, 2026  Twila Rosenbaum  28 views
Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that the company says is more honest, more reliable in agentic tasks, and better at catching its own mistakes. The model is available immediately at the same price as its predecessor, $5 per million input tokens and $25 per million output tokens, and is rolling out across all Anthropic products including claude.ai, Claude Code, and the API.

The headline improvement is honesty. Anthropic says Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked. Early testers report the model is more willing to flag uncertainties about its work and less likely to make unsupported claims, a persistent problem across AI models that tend to project confidence regardless of whether it is warranted.

Benchmark gains across the board

Opus 4.8 improves on its predecessor across Anthropic’s published benchmarks. On agentic coding (Terminal-Bench 2.1), the score rises from 64.3% to 69.2%. Multidisciplinary reasoning with tools improves from 54.7% to 57.9%. Agentic computer use moves from 82.8% to 83.4%, and knowledge work scores rise from 1,753 to 1,890.

Anthropic’s alignment assessment found that Opus 4.8 reaches new highs on measures of prosocial traits, including supporting user autonomy and acting in the user’s best interest. Rates of misaligned behavior such as deception or cooperation with misuse are substantially lower than in Opus 4.7, and comparable to Claude Mythos Preview, Anthropic’s best-aligned model.

Early testers see practical gains

The release is accompanied by endorsements from companies already using the model. Cognition, the company behind the AI coding agent Devin, said Opus 4.8 uses tools cleanly and fixes comment-verbosity and tool-calling issues that appeared in Opus 4.7. Cursor, the AI-powered code editor, reported improvements across every effort level on its CursorBench evaluation.

Harvey, which builds AI for legal work, said Opus 4.8 delivers the highest score recorded on its Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard. Databricks reported that Opus 4.8 handles deeper multistep questions faster in its Genie AI agent, at 61% cheaper token cost than Opus 4.7.

Thomson Reuters said CoCounsel Legal saw meaningful improvements in consistency and reasoning quality. Hebbia, which builds AI for financial document analysis, noted better citation precision and more token efficiency on retrieval tasks.

New features alongside the model

Anthropic is launching several features alongside Opus 4.8. A new effort control in claude.ai and Cowork lets users choose how much computation Claude applies to a response, trading speed against quality. Claude Code gains a dynamic workflows feature that allows it to plan work and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations across hundreds of thousands of lines of code.

For developers, the Messages API now accepts system entries inside the messages array, allowing instructions to be updated mid-task without breaking the prompt cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than it was for previous models.

Mythos is the bigger story

The more significant announcement may be what comes next. Anthropic said it plans to release a new class of model with higher intelligence than Opus, based on the Claude Mythos architecture. A small number of organizations are already using Claude Mythos Preview through Project Glasswing, an initiative focused on using the model for cybersecurity work. Anthropic and roughly 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to find more than 10,000 high- or critical-severity vulnerabilities across critical software infrastructure.

Mythos-class models require stronger cyber safeguards before general release, Anthropic said, but the company expects to bring them to all customers in the coming weeks. The model sits a full capability tier above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the excitement and the caution around its deployment.

A company approaching $1 trillion

The Opus 4.8 launch arrives as Anthropic’s valuation continues to climb. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day, up from the $380 billion valuation at which it closed its $30 billion Series G in February. Revenue has grown from roughly $1 billion at the end of 2024 to an estimated $30 billion annualized run rate in 2026, driven by enterprise adoption of Claude.

Anthropic also opened a new office in Milan on 28 May, its sixth in Europe, and appointed KiYoung Choi as Representative Director of Korea ahead of a Seoul office opening. The expansion reflects growing demand for Claude in enterprise markets outside the United States.

The competitive context

Opus 4.8 enters a market where the pace of model releases has accelerated sharply. OpenAI launched GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records on professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI, and Google, with each company releasing incremental model upgrades at an increasing pace.

For Anthropic, the distinction it is trying to draw with Opus 4.8 is not raw capability but reliability. A model that catches its own mistakes, flags its uncertainties, and follows instructions consistently is more useful in agentic workflows where AI systems operate with limited human oversight. Whether that positioning holds as Mythos-class models arrive, promising higher intelligence with new safety constraints, will determine whether Anthropic can maintain its lead in the enterprise market it has worked to dominate.

The broader implications of Anthropic’s honesty push cannot be overstated. In an industry often criticized for opaque and overconfident AI systems, a model that admits its limitations represents a paradigm shift. This is particularly critical in high-stakes domains such as healthcare, finance, and legal services, where AI errors can have severe consequences. By reducing the rate of undetected flaws in code by a factor of four, Anthropic is not just improving a product—it is laying the groundwork for more trustworthy autonomous agents that can operate without constant human supervision. This aligns with the company’s overarching mission of building AI that is safe and beneficial, a theme that has been central to its identity since its founding in 2021 by former OpenAI employees.

Another underappreciated aspect of the Opus 4.8 release is its pricing strategy. By keeping prices unchanged while delivering significant improvements, Anthropic is applying competitive pressure on rivals like OpenAI and Google, which have occasionally raised prices for their premium tiers. The 2.5x faster Fast Mode at one-third the previous cost makes high-quality AI more accessible to startups and mid-sized enterprises, potentially expanding the addressable market for agentic AI solutions. This could accelerate the trend of AI moving from experimental chat interfaces to production-grade automation tools that handle complex, multi-step workflows.

Looking ahead, the emergence of Mythos-class models as a separate tier signals that Anthropic is thinking long-term about capability scaling and safety. The decision to limit Mythos Preview to a small number of cybersecurity partners until stronger safeguards are in place shows a cautious approach that may become industry standard. If Mythos can indeed autonomously discover zero-day vulnerabilities and create exploits, it represents a dual-use technology that could revolutionize both defense and offense in cybersecurity. Anthropic’s partnerships with Apple, Google, Microsoft, and AWS suggest that the company is fully aware of the geopolitical and security implications, and is working closely with infrastructure providers to ensure responsible deployment.

From a financial perspective, the jump from a $380 billion valuation in February to $965 billion in just a few months is extraordinary by any measure. The $65 billion Series H round—the largest single funding round in AI history—indicates strong investor confidence in Anthropic’s trajectory. Revenue growth from $1 billion to an annualized $30 billion run rate in under two years is equally striking, though it remains to be seen whether this pace is sustainable. The company’s expansion into Europe and Asia, with new offices in Milan and Seoul, shows a deliberate strategy to capture enterprise customers in key markets, diversifying beyond the heavily saturated US market.

In the competitive landscape, Anthropic’s focus on reliability and honesty could become a defining differentiator. OpenAI’s GPT-5.5 and GPT-5.4 have emphasized raw performance and multimodal capabilities, while Google’s Gemini models continue to improve across benchmarks. However, corporate clients increasingly prioritize consistency, safety, and compliance over peak benchmark scores. This is especially true in regulated industries such as legal and financial services, where erroneous AI outputs can lead to lawsuits or regulatory penalties. The endorsements from Harvey and Thomson Reuters underline that Opus 4.8 is already meeting those requirements.

The introduction of dynamic workflows in Claude Code is another noteworthy innovation. By allowing a single session to spawn hundreds of parallel subagents, Anthropic is enabling developers to tackle codebase-scale migrations that were previously impossible or extremely time-consuming. This feature directly targets the pain point of large-scale refactoring, often cited as one of the most tedious and error-prone tasks in software engineering. Combined with the honesty improvements, developers can trust that the autonomous agents will not introduce subtle bugs that go unnoticed.

Environmentally, the focus on efficiency through Fast Mode and better token utilization also has positive implications. Running AI models at 2.5x speed with three times lower cost means less compute time and, by extension, lower energy consumption per task. As concerns about AI’s carbon footprint grow, Anthropic’s optimization efforts could become a selling point for sustainability-conscious enterprises.

In summary, Claude Opus 4.8 represents a measured but meaningful step forward in making AI more reliable, honest, and practical for enterprise use. The true game-changer, however, may be the Mythos-class models and their cybersecurity applications, which are already proving their value through Project Glasswing. As Anthropic approaches a $1 trillion valuation and expands its global footprint, the company is positioning itself not just as a leader in AI capability, but as a steward of responsible deployment. The coming weeks will reveal when Mythos reaches general availability, and whether Anthropic can maintain its competitive edge in a race that shows no signs of slowing down.


Source: TNW | Anthropic News


Share:

Your experience on this site will be improved by allowing cookies Cookie Policy