
🔐 When a Security-Focused LLM Scans 50+ Cloudflare Repositories
Cloudflare participated in Anthropic’s Project Glasswing, testing Mythos Preview, a frontier cybersecurity model, on more than 50 of its own repositories. The results are revealing. 🕵️
🚀 What Makes Mythos Preview Different?
Two capabilities stand out versus general-purpose models:
- Exploit chain construction: rather than stopping at isolated bugs, it chains them into working exploits, as a senior researcher would
- Proof generation: it writes PoC code, compiles and runs it, and if the attempt fails, adjusts the hypothesis and retries
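The proof-generation loop described above (write a PoC, run it, revise the hypothesis on failure, retry) can be sketched in a few lines. This is only an illustration under stated assumptions: `prove`, `Attempt`, `fake_build_and_run`, and `fake_revise` are hypothetical names invented here, and the toy stand-ins replace the real compile-and-run step. Nothing below reflects how Mythos Preview is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    hypothesis: str
    succeeded: bool
    log: str


def prove(
    hypothesis: str,
    build_and_run: Callable[[str], tuple[bool, str]],
    revise: Callable[[str, str], str],
    max_attempts: int = 3,
) -> list[Attempt]:
    """Try to confirm a hypothesis, revising it after each failed run."""
    attempts: list[Attempt] = []
    for _ in range(max_attempts):
        ok, log = build_and_run(hypothesis)
        attempts.append(Attempt(hypothesis, ok, log))
        if ok:
            break
        # Feed the failure log back in and try a revised hypothesis.
        hypothesis = revise(hypothesis, log)
    return attempts


# Toy stand-ins (assumptions): a real harness would compile and execute a PoC.
def fake_build_and_run(hypothesis: str) -> tuple[bool, str]:
    if "offset=16" in hypothesis:
        return True, "crash reproduced"
    return False, "no crash; buffer starts at offset 16"


def fake_revise(hypothesis: str, log: str) -> str:
    return hypothesis.replace("offset=8", "offset=16")


history = prove("overflow at offset=8", fake_build_and_run, fake_revise)
for attempt in history:
    print(attempt.succeeded, attempt.hypothesis, "|", attempt.log)
```

The point of the loop is that a failed run is treated as evidence to refine the hypothesis rather than as a dead end, and `max_attempts` bounds the retries.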
🏗️ The Harness: The Key to Success
Rather than pointing a generic agent at each repository, Cloudflare built an 8-stage harness:
| Stage | Function |
|---|---|
| Recon | Generates an architecture document and maps the attack surface |
| Hunt | ~50 parallel agents, each assigned a specific bug class |
| Validate | Independent agent tries to disprove the finding |
| Gapfill | Re-queues areas with insufficient coverage |
| Dedupe | Collapses findings with the same root cause |
| Trace | Verifies whether the bug is reachable from external input |
| Feedback | Feeds new tasks back into the pipeline |
| Report | Structured output, not free-form prose |
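To make the filtering stages concrete, here is a minimal sketch of how Validate, Dedupe, Trace, and Report might compose, in the order the table lists them. Everything in it is an assumption made for illustration: the `Finding` fields, the `triage` and `report` helpers, and the sample file names are invented here and are not Cloudflare's actual schema or code. Recon, Hunt, Gapfill, and Feedback are left out because they depend on the model itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    bug_class: str
    root_cause: str            # hypothetical key used to group duplicates
    location: str
    survived_validation: bool  # outcome of the Validate stage
    reachable: bool            # outcome of the Trace stage


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Collapse findings that share a root cause, keeping the first seen."""
    seen: dict[str, Finding] = {}
    for finding in findings:
        seen.setdefault(finding.root_cause, finding)
    return list(seen.values())


def triage(findings: list[Finding]) -> list[Finding]:
    """Validate -> Dedupe -> Trace, each stage narrowing the list."""
    validated = [f for f in findings if f.survived_validation]
    unique = dedupe(validated)
    return [f for f in unique if f.reachable]


def report(findings: list[Finding]) -> str:
    """Emit structured JSON rather than free-form prose."""
    return json.dumps([asdict(f) for f in findings], indent=2)


# Made-up sample data, for illustration only.
raw = [
    Finding("integer-overflow", "unchecked length in parse_header", "parser.c:120", True, True),
    Finding("out-of-bounds-read", "unchecked length in parse_header", "parser.c:188", True, True),
    Finding("use-after-free", "stale session pointer", "session.c:42", False, True),
    Finding("path-traversal", "unsanitized debug path", "debug.c:9", True, False),
]

actionable = triage(raw)
print(report(actionable))
```

In this toy run, four raw findings shrink to one: one is rejected by validation, one is a duplicate root cause, and one is unreachable from external input. That narrowing is what separates actionable vulnerabilities from noise.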
💡 Explanation in a nutshell
Cloudflare’s Project Glasswing write-up is the first detailed public report of how a specialized security LLM (Anthropic’s Mythos Preview) works at scale on real production infrastructure. The most important insight isn’t that the model is smart; it’s that the architecture around the model determines success. A generic agent pointed at a repository produces noise. An 8-stage harness with roughly 50 parallel agents, adversarial validation, and reachability tracing turns speculative findings into actionable vulnerabilities. This harness-first approach is where both offensive and defensive security are headed.

