Snyk VulnBench JS 1.0 Reveals Inconsistencies in LLM Security Findings

First seen 30 Jun 2026, 15:13 UTC (source: Snyk)

Article Content

Snyk ran 300 vulnerability scans to measure how repeatable LLM security reviews are when the code and prompts are held identical. The benchmark, built on JavaScript and Express applications, was designed to measure model behavior under controlled conditions. Findings that matched a Snyk Code reference vulnerability were stable: 134 of 158 matched findings (about 85%) appeared in every repetition. Findings with no reference match (LLM-only reports) varied significantly: of 161 unique unmatched findings, 80 (nearly 50%) appeared in only one of five identical scans. Recall was also incomplete, with the highest-recall LLM configuration detecting only 81% of the Snyk Code reference vulnerabilities. In short, LLMs can identify high-signal exploit shapes, but their additional reports are inconsistent from run to run, which raises questions about their reliability in security assessments compared to traditional deterministic SAST tools.
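The repeatability figures above come down to counting, for each unique finding, how many of the repeated scans reported it. The following is a minimal sketch of that bookkeeping, not Snyk's actual methodology or tooling: the function names, the finding identifiers, and the sample scan data are all hypothetical. Only the final ratios (80 of 161 and 134 of 158) are taken from the article.

```python
from collections import Counter


def run_counts(scans):
    """Count in how many scans each finding appears (at most once per scan)."""
    counts = Counter()
    for findings in scans:
        counts.update(set(findings))
    return counts


def stability(scans):
    """Summarize how many unique findings appear in one run vs. in every run."""
    counts = run_counts(scans)
    n_runs = len(scans)
    return {
        "unique": len(counts),
        "single_run": sum(1 for c in counts.values() if c == 1),
        "all_runs": sum(1 for c in counts.values() if c == n_runs),
    }


# Hypothetical finding IDs from five identical scans (illustrative data only).
scans = [
    {"sqli:routes/user.js", "xss:views/profile.js"},
    {"sqli:routes/user.js", "path-traversal:routes/files.js"},
    {"sqli:routes/user.js"},
    {"sqli:routes/user.js", "xss:views/profile.js"},
    {"sqli:routes/user.js"},
]
summary = stability(scans)
print(summary)  # {'unique': 3, 'single_run': 1, 'all_runs': 1}

# Shares implied by the counts reported in the article.
single_run_share = 80 / 161   # unmatched findings seen in only one of five scans
all_runs_share = 134 / 158    # matched findings seen in all five scans
print(f"{single_run_share:.1%}")  # 49.7%
print(f"{all_runs_share:.1%}")    # 84.8%
```

In this toy data, one finding recurs in every scan and one appears only once, which mirrors on a small scale the contrast the benchmark reports between matched and LLM-only findings.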

Key Points:
• LLM-only security findings vary significantly: nearly 50% of them (80 of 161) appeared in only one of five identical scans.
• Reference-matched findings were stable (134 of 158 appeared in every repetition), which suggests LLM reviews are most dependable when combined with traditional SAST tools.
• The highest-recall LLM configuration found only 81% of Snyk Code reference vulnerabilities.

ThreatCluster AI

Timeline

2026-06-29
Snyk VulnBench JS 1.0 results published
Snyk released findings from 300 scans showing inconsistencies in LLM security reviews, highlighting the run-to-run variability of LLM-only reports.
Snyk
