CVE-Bench: testing LLM agents on real

Source: News.Ycombinator

In early 2026, Anthropic claimed that Mythos, one of their latest models, finds security vulnerabilities better than human experts. Yet the number of security vulnerabilities keeps rising.

I wanted to test how well models do at fixing vulnerabilities. Poolside’s Laguna models arrived
