---
title: Claude Mythos Found a 27-Year-Old Bug. It Cost $20,000 and 1,000 Tries.
description: Anthropic says Claude Mythos found thousands of zero-days. The numbers show 198 manual reviews and roughly $20,000 for the headline OpenBSD bug.
author: Darie Nani (Editor-in-Chief)
date: 2026-04-14T23:02:04.417Z
updated: 2026-06-25T09:44:40.033Z
canonical: https://www.sovereignmagazine.com/article/claude-mythos-27-year-bug-cost
image: https://cdn.nanimediahouse.com/claude-mythos-featured.webp
categories: Artificial Intelligence
content_type: News
region: Global
publication: Sovereign Magazine
about:
  - type: Organization
    name: Anthropic
---

Anthropic spent the last week pitching [Claude Mythos](https://www.sovereignmagazine.com/article/claude-mythos-anthropic-new-model) as [the model that cracked cybersecurity wide open](https://www.sovereignmagazine.com/article/anthropic-alibaba-distillation-export-ban). The centerpiece of that pitch is a 27-year-old vulnerability in OpenBSD, one of the most security-hardened operating systems in the world. The company published the finding on a new subdomain, red.anthropic.com, under a program it is calling Project Glasswing. The framing is unmistakable: an AI system has reached inside critical infrastructure and found what a quarter-century of human review did not.

The finding is real. The framing is not quite the story.

Mythos found the OpenBSD bug, a denial-of-service flaw in the TCP SACK implementation, after roughly 1,000 runs through Anthropic's internal scaffold. That batch cost just under $20,000 in compute. The FFmpeg discovery, a 16-year-old H.264 slice boundary bug, came out of several hundred runs and cost roughly $10,000. For the Linux kernel work, Anthropic ran "several thousand scans over the repository." None of these numbers are hidden. They are in Anthropic's own write-up. They just tend to fall out of the headlines.

## The economics of "finding" a bug

If you run a frontier model over a mature codebase a thousand times and pay for the privilege, you are going to surface something. That is not a knock on the model. It is the basic math of probabilistic search. A capable reasoning system pointed at millions of lines of aged C code, with the budget to try every angle it can imagine, will eventually land on a real flaw. The question is what that finding actually tells you about the tool.

The OpenBSD headline is the cleanest example. The bug survived 27 years because it is genuinely hard to spot, and Mythos still needed about $50 of compute on the specific winning run, sitting inside a $20,000 campaign. A defender reading the press release gets the first number. A CISO budgeting for an internal program needs the second.

## Where "thousands" comes from

The biggest claim in the Mythos preview is that it found thousands of high- and critical-severity vulnerabilities across every major operating system and every major web browser. That number is an extrapolation. Anthropic's expert contractors manually reviewed 198 reports. On those 198, the contractors agreed with Mythos's severity assessment about 90 percent of the time. Anthropic then scaled that agreement rate across the full output and arrived at "thousands."

A 90 percent agreement rate on 198 samples is a respectable signal about classification accuracy. It is not the same as 1,000 confirmed critical vulnerabilities. [Tom's Hardware](https://www.tomshardware.com/tech-industry/artificial-intelligence/anthropics-claude-mythos-isnt-a-sentient-super-hacker-its-a-sales-pitch-claims-of-thousands-of-severe-zero-days-rely-on-just-198-manual-reviews), reviewing the same data, counted roughly 10 severe vulnerabilities that look genuinely exploitable across more than 7,000 open-source stacks tested. [Red Hat](https://www.redhat.com/en/blog/navigating-mythos-haunted-world-platform-security), which reviewed a subset, described many of the reports as functionality flaws rather than security issues. Anthropic itself concedes the FFmpeg bug "would be challenging to turn into a functioning exploit." On the Linux kernel, Mythos was unable to actually exploit any of the bugs it flagged, because existing kernel defenses held.

None of that makes the research fake. It makes the research normal. A new method finds a lot of candidate issues, a small fraction survive review, and a smaller fraction still are weaponizable. That is how vulnerability research has always worked. What is new is the volume, the speed, and the price.

## The sales pitch sitting next to the science

The other thing worth naming is that Mythos is not being released through the usual model update channel. It is being previewed through a cybersecurity initiative, on a cybersecurity-branded subdomain, tied to a cybersecurity program with a code name. Anthropic repeatedly gestures at the question of whether Mythos might be in some sense conscious, a line of speculation that has nothing to do with its ability to find heap overflows. The effect is to wrap a capable tool in a narrative of unease, which is a very efficient way to sell it into defense, enterprise security, and government.

This is not a hidden move. OpenAI ran a version of the same play in 2019 with GPT-2, which it initially said was too dangerous to release. Project Glasswing lines Anthropic up to sell into a market where Google's Big Sleep already operates and where government buyers are actively looking for a defensive story they can point at.

## What it actually means for defenders

If you run a security program, the practical read is narrower than the headlines suggest. A frontier model can, at meaningful but not absurd cost, surface credible candidate bugs in hardened code at volumes no human team can match. The triage burden is real, the false positive and "functionality flaw" rate is real, and the gap between "flagged" and "weaponized" is still wide. None of that should be read as "thousands of zero-days" sitting in a literal inventory.

The 27-year-old OpenBSD bug is a genuine milestone. It is also a $20,000 milestone that took a thousand attempts, published by a company that has a product to sell. Both of those sentences are true, and the second one is the one the coverage keeps leaving out.

## FAQ

**Q: Did Claude Mythos really find a 27-year-old vulnerability?**
Yes. Mythos identified a denial-of-service flaw in OpenBSD's TCP SACK implementation that had been in the code for 27 years. The finding is verified. The context is that Anthropic ran roughly 1,000 passes through the codebase at a cost of just under $20,000 in compute before landing on it.

**Q: Can AI find vulnerabilities that humans cannot?**
It can find vulnerabilities humans have not found yet, which is not the same thing. The Mythos results show a frontier model can surface real bugs in mature, well-reviewed code, but only after many passes and significant compute. The capability is new. The economics still matter.

**Q: Are the "thousands of zero-days" from Claude Mythos all real?**
No. The "thousands" figure is a statistical extrapolation from 198 manually reviewed reports where expert contractors agreed with Mythos's severity rating about 90 percent of the time. Independent review by Red Hat and reporting by Tom's Hardware suggest only around 10 severe vulnerabilities across more than 7,000 tested stacks look clearly exploitable, and that many other reports describe functionality flaws rather than security issues.

**Q: Why is Anthropic announcing Claude Mythos this way?**
The model is being previewed through a cybersecurity program called Project Glasswing, on a dedicated red.anthropic.com subdomain, rather than through Anthropic's normal model release channel. That framing lines the company up to sell into defense, enterprise security, and government buyers, a market where Google's Big Sleep already operates.