AI Models Vulnerable to Multi-Turn Attacks, Cisco Study Reveals
Severity: High (Score: 64.5)
Sources: Infosecurity-Magazine, Csoonline, Cybersecuritydive, blogs.cisco.com
Keywords: models, cisco, vulnerable, attacks, leading, researchers, guardrails
Severity indicators: rat
Summary
A Cisco study has found that leading AI models from OpenAI, Anthropic, Google, Amazon, and xAI are significantly more vulnerable to multi-turn attacks than previously claimed. The research evaluated 15 models and revealed that single-turn attack success rates (ASR) do not accurately represent their performance under iterative attacks. For instance, OpenAI's GPT 5.4 had a single-turn ASR of 2.74%, which jumped to 24.68% in multi-turn scenarios. The study highlighted that attackers can exploit models by reframing refusals and decomposing tasks across multiple exchanges. Overall, the multi-turn ASR ranged from 8% to 88%, compared to 2% to 65% for single-turn prompts. This discrepancy poses a significant risk for organizations relying on single-prompt evaluations for security decisions. Cisco's findings call for a reevaluation of how AI model safety is assessed and highlight the need for more comprehensive testing methodologies.
Key Points:
- Leading AI models are more vulnerable to multi-turn attacks than single-turn evaluations suggest.
- Cisco's study found multi-turn attack success rates ranged from 8% to 88%, highlighting significant risks.
- Organizations must reconsider their AI safety assessments to account for iterative attack strategies.
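To put the GPT 5.4 figures in perspective, the gap can be expressed both as a percentage-point increase and as a multiplier. The short Python sketch below does only that arithmetic on the two numbers quoted in the summary. The function name `asr_gap` is hypothetical and is not taken from the Cisco report.

```python
def asr_gap(single_turn_pct: float, multi_turn_pct: float) -> tuple[float, float]:
    """Return (percentage-point increase, multiplicative factor) between two ASRs.

    Both inputs are attack success rates expressed in percent.
    """
    if single_turn_pct <= 0:
        raise ValueError("single-turn ASR must be positive to compute a ratio")
    return multi_turn_pct - single_turn_pct, multi_turn_pct / single_turn_pct


# Figures quoted in the summary for OpenAI's GPT 5.4.
points, factor = asr_gap(2.74, 24.68)
print(f"+{points:.2f} percentage points, about {factor:.1f}x the single-turn rate")
```

On these figures the multi-turn rate is roughly nine times the single-turn rate, which is why a single-prompt benchmark alone can understate exposure.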
Detailed Analysis
**Impact** Organizations deploying large language models (LLMs) from major vendors including OpenAI, Anthropic, Google, Amazon, and xAI are affected globally. The study tested 15 frontier AI models, revealing multi-turn attack success rates (ASRs) ranging from 8% to 88%, significantly higher than single-turn ASRs of 2% to 65%. This gap exposes enterprises relying on single-prompt safety benchmarks to increased security and governance risks, potentially leading to unauthorized actions or data leakage in sectors adopting AI for customer service, internal automation, or decision support.

**Technical Details** Attackers exploit multi-turn conversational techniques that include role-playing, misdirection, ambiguity, reframing refusals, information decomposition, and incremental escalation to bypass LLM safety guardrails. Testing involved over 30,000 single-turn and nearly 7,000 multi-turn attacks across 1,456 conversations, demonstrating that iterative adversarial interaction increases model vulnerability. No specific malware, CVEs, or infrastructure details were provided; the attack vector focuses on adversarial prompt engineering during the interaction phase of the AI kill chain.

**Recommended Response** Defenders should prioritize evaluating AI model safety using multi-turn attack simulations rather than relying solely on single-turn benchmarks. Enable and document configuration settings such as reasoning modes that impact model resilience, as seen with xAI’s Grok 4.1. Implement monitoring for anomalous multi-turn conversational patterns indicative of iterative probing or escalation. Vendors and organizations should demand transparency on multi-turn ASRs and incorporate these metrics into procurement and governance decisions.
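As a rough illustration of the first recommendation, the Python sketch below compares single-turn and multi-turn ASR on the same set of attack conversations. It is not Cisco's methodology. It assumes a conversation counts as a success if any reply in it is judged harmful, and every name here (`Conversation`, `multi_turn_asr`, `toy_model`, `toy_judge`) is hypothetical. The toy model simply gives way after two earlier turns, standing in for incremental escalation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (attacker prompt, model reply) pairs so far


@dataclass
class Conversation:
    turns: List[str]  # attacker prompts, in the order they are sent


def multi_turn_asr(
    conversations: List[Conversation],
    respond: Callable[[History, str], str],
    is_harmful: Callable[[str], bool],
) -> float:
    """Fraction of conversations in which at least one reply is judged harmful."""
    if not conversations:
        return 0.0
    successes = 0
    for convo in conversations:
        history: History = []
        for prompt in convo.turns:
            reply = respond(history, prompt)
            history.append((prompt, reply))
            if is_harmful(reply):
                successes += 1
                break  # one harmful reply is enough to count the conversation
    return successes / len(conversations)


def toy_model(history: History, prompt: str) -> str:
    # Hypothetical stand-in for a real model: refuses at first,
    # then complies once two earlier turns have accumulated.
    return "UNSAFE" if len(history) >= 2 else "REFUSED"


def toy_judge(reply: str) -> bool:
    # Hypothetical stand-in for a harmfulness classifier.
    return reply == "UNSAFE"


convos = [
    Conversation(["ask", "reframe the refusal", "decompose the task"]),
    Conversation(["ask"]),
]
# Single-turn baseline: keep only the first prompt of each conversation.
single = multi_turn_asr([Conversation(c.turns[:1]) for c in convos], toy_model, toy_judge)
multi = multi_turn_asr(convos, toy_model, toy_judge)
print(f"single-turn ASR: {single:.0%}, multi-turn ASR: {multi:.0%}")
```

In a real evaluation, `respond` would call the deployed model with its production configuration and `is_harmful` would be a vetted classifier or human review. The point of the sketch is only that the same prompts can score very differently once conversation history is carried forward.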
Source articles (4)
- All Major LLMs Exposed to Multi-Turn Manipulation, Warn Researchers — Infosecurity-Magazine · 2026-05-27
  The safety guardrails of several prominent large language models (LLM) can be bypassed if a user tricks the LLM into having a multi-pronged, ongoing conversation, researchers at Cisco have warned. The…
- Leading AI models are more vulnerable to malicious prompts than vendors claim — Cybersecuritydive · 2026-05-27
  Hackers could subvert frontier models with attacks that their developers overlook, Cisco said. Cisco’s evaluation of 15 leading AI models from OpenAI, Anthropic, Google, Amazon and xAI “found that sin…
- AI models more vulnerable than claimed when faced with iterative attacks — Csoonline · 2026-05-27
  CISOs relying on LLM runtime guardrails and official safety scores when making security decisions about their organizations’ AI usage and model selection are due for a wakeup call. According to a new study…
- Cisco researchers said in a report — blogs.cisco.com · 2026-05-27
Timeline
- 2026-05-27 — Cisco study published on AI model vulnerabilities: Cisco's evaluation revealed that leading AI models are significantly more susceptible to multi-turn attacks than previously reported, with success rates ranging from 8% to 88%.
- 2026-05-27 — Multi-turn attack techniques identified: Researchers tested various attack strategies, including role-playing and misdirection, and found that all 15 models were susceptible to multi-turn attacks.
- 2026-05-27 — Call for reevaluation of AI safety benchmarks: Cisco's findings emphasize the need for organizations to rethink how they evaluate AI model safety, moving beyond single-prompt testing.
Related entities
- Amazon Nova (Platform)
- Anthropic’s Claude (Platform)
- Google Gemini (Platform)
- GrokAI (Platform)
- OpenAI’s ChatGPT (Platform)
- xAI’s Grok (Platform)