CVE-Bench: testing LLM agents on real
Source: News.Ycombinator
Published:
<p>In early 2026, Anthropic claimed that Mythos, one of its latest models, finds security vulnerabilities better than human experts. Yet the number of security vulnerabilities keeps rising.</p> <p>I wanted to test how well models do at fixing vulnerabilities. Poolside’s Laguna models arrived