Cisco Study Reveals Vulnerabilities in AI Models to Multi-Turn Attacks

Severity: High (Score: 69.5)

Sources: Scworld, Feeds2.Feedburner, Csoonline, Cybersecuritydive, blogs.cisco.com, Infosecurity-Magazine

Published: 2026-05-27 · Updated: 2026-05-28

Keywords: models, cisco, vulnerable, attacks, leading, researchers, guardrails

Severity indicators: rat

Summary

A Cisco study has found that 15 major AI models from OpenAI, Anthropic, Google, Amazon, and xAI are significantly more vulnerable to multi-turn prompt injection attacks than previously reported. The research tested these models with 30,090 single-turn and 6,986 multi-turn attacks, revealing attack success rates (ASR) for multi-turn prompts ranging from 7.89% to 88.30%. In contrast, single-turn ASRs were much lower, between 2% and 65%. The study highlights the inadequacy of current safety benchmarks that rely solely on single-turn evaluations, which fail to account for the iterative attack strategies used by real adversaries. Notably, xAI's Grok 4.1 Fast Non-Reasoning model had an 88.3% ASR for multi-turn attacks. Researchers emphasize the need for AI vendors to adopt more comprehensive evaluation methods that include multi-turn ASR reporting. The research raises concerns about the safety of AI models deployed in various applications, as organizations may be misinformed about their true resilience against sophisticated attacks.

Key Points:

  • Cisco's study reveals major AI models are vulnerable to multi-turn attacks.
  • Attack success rates for multi-turn prompts reached as high as 88.3% in testing.
  • Current safety benchmarks are inadequate, relying on single-turn evaluations.

Detailed Analysis

**Impact** Organizations in any sector or region that deploy frontier large language models (LLMs) from vendors including OpenAI, Anthropic, Google, Amazon, and xAI are affected. The study tested 15 proprietary models, revealing multi-turn prompt injection attack success rates (ASR) ranging from 7.89% to 88.30%, significantly higher than single-turn ASRs. This discrepancy exposes enterprises to elevated security and governance risks: current safety benchmarks underestimate real-world adversarial capabilities, which can lead to unauthorized data disclosure, manipulation of AI outputs, or misuse of AI-driven services.

**Technical Details** Attackers exploit multi-turn prompt injection techniques involving iterative conversations that include reframing refusals, roleplay/persona adoption, contextual ambiguity, information decomposition, and incremental escalation. No specific malware, CVEs, or infrastructure details were reported. The attack vector targets the AI model’s input handling and guardrail mechanisms during the interaction phase of the kill chain. Variations in model configuration, such as reasoning mode toggles, significantly affect vulnerability levels, with some models showing ASR increases of over fourfold under multi-turn conditions.

**Recommended Response** Defenders should require AI vendors to publish multi-turn ASR metrics stratified by attack strategy and model configuration before deployment. Models exhibiting more than a 15 percentage-point gap between single-turn and multi-turn ASRs, or a regression in prompt injection resistance, should undergo manual safety review. Organizations must implement runtime guardrails, continuous monitoring, red-teaming, and application-layer policies, because no base model is currently safe against iterative attacks. Detection capabilities should focus on identifying multi-turn adversarial patterns and anomalous conversational behaviors.
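
The 15 percentage-point triage rule in the recommended response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cisco's methodology: the function names, the `GAP_THRESHOLD_PP` constant, and the sample counts for `model-a` and `model-b` are all hypothetical and do not come from the study.

```python
# Illustrative sketch only: names and sample counts are hypothetical,
# not taken from the Cisco study.

GAP_THRESHOLD_PP = 15.0  # percentage-point gap that triggers manual review


def asr(successes: int, attempts: int) -> float:
    """Return the attack success rate as a percentage of attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def needs_manual_review(single_turn_asr: float, multi_turn_asr: float,
                        threshold_pp: float = GAP_THRESHOLD_PP) -> bool:
    """Flag a model whose multi-turn ASR exceeds its single-turn ASR
    by more than threshold_pp percentage points."""
    return (multi_turn_asr - single_turn_asr) > threshold_pp


# Hypothetical results per model:
# (single-turn successes, single-turn attempts,
#  multi-turn successes, multi-turn attempts)
results = {
    "model-a": (10, 200, 30, 100),
    "model-b": (8, 200, 10, 100),
}

flagged = []
for name, (s_ok, s_n, m_ok, m_n) in results.items():
    single, multi = asr(s_ok, s_n), asr(m_ok, m_n)
    if needs_manual_review(single, multi):
        flagged.append(name)
    print(f"{name}: single={single:.2f}% multi={multi:.2f}% "
          f"gap={multi - single:.2f}pp")

print("manual review:", flagged)
```

The comparison is strict, so a gap of exactly 15 points is not flagged, matching the "more than" wording of the recommendation. A real pipeline would also stratify results by attack strategy and model configuration, as the recommendation asks vendors to do.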

Source articles (6)

  • All Major LLMs Exposed to Multi-Turn Manipulation, Warn Researchers — Infosecurity-Magazine · 2026-05-27
    The safety guardrails of several prominent large language models (LLM) can be bypassed if a user tricks the LLM into having a multi-pronged, ongoing conversation, researchers at Cisco have warned. The…
  • Leading AI models are more vulnerable to malicious prompts than vendors claim — Cybersecuritydive · 2026-05-27
    Hackers could subvert frontier models with attacks that their developers overlook, Cisco said. Cisco’s evaluation of 15 leading AI models from OpenAI, Anthropic, Google, Amazon and xAI “found that sin…
  • AI models more vulnerable than claimed when faced with iterative attacks — Csoonline · 2026-05-27
    CISOs relying on LLM runtime guardrails and official safety scores when making security decisions their organizations’ AI usage and model selection are due for a wakeup call. According to a new study…
  • Frontier AI models collapse under multi-turn AI attacks, Cisco finds — Feeds2.Feedburner · 2026-05-28
    Attackers who probe large language models rarely give up after one refusal. They reframe, build context across turns, adopt personas, and escalate gradually. New research from Cisco’s AI threat intell…
  • Cisco study finds major frontier models susceptible to multi-turn prompt injection attacks | news | SC Media — Scworld · 2026-05-28
    A recent Cisco study found that 15 proprietary frontier models across five major vendors are susceptible to multi-turn prompt injection attacks, with attack success rates (ASR) differing significantly…
  • Cisco researchers said in a report — blogs.cisco.com · 2026-05-27

Timeline

  • 2026-05-27 — Cisco study published: Cisco released findings showing that major AI models are more vulnerable to multi-turn attacks than previously reported.
  • 2026-05-27 — Multi-turn attack success rates revealed: The study found multi-turn attack success rates ranged from 7.89% to 88.30%, significantly higher than single-turn rates.
  • 2026-05-28 — Industry response to study findings: The cybersecurity community is urged to reconsider AI model safety evaluations in light of Cisco's findings.

Related entities

  • Prompt Injection (Attack Type)
  • Amazon Nova (Platform)
  • Anthropic’s Claude (Platform)
  • Google Gemini (Platform)
  • GrokAI (Platform)
  • OpenAI’s ChatGPT (Platform)
  • xAI’s Grok (Platform)