Emerging Threats in Agentic AI Systems: Trust Inversion and Attack Patterns

Severity: High (Score: 69.5)

Sources: arxiv.org, Augmentcode

Published: 2026-05-18 · Updated: 2026-05-19

Keywords: agentic, common, attack, patterns, layers, systems, across

Summary

Recent research highlights significant vulnerabilities in agentic AI systems, which can misclassify adversarial inputs as trusted instructions at any of six architectural layers. These vulnerabilities stem from trust boundary failures that allow unauthorized actions through tools and memory. The Layered Attack Surface Model (LASM) identifies critical threats, particularly in the under-studied memory and coordination layers. Key findings include a non-transferability theorem, which indicates that controls at one layer do not protect against attacks at others, and a high rate of unsafe behavior in default configurations. The NIST Center for AI Standards has noted risks from autonomous task execution and API integrations. Current mitigation strategies are discussed, including recommendations from OWASP and MITRE frameworks. The articles emphasize that traditional defenses against prompt injection are insufficient for agentic systems, which require a more nuanced approach to security.

Key Points:

  • Agentic AI systems are vulnerable due to trust boundary failures across six layers.
  • The Layered Attack Surface Model reveals critical threats in memory and coordination layers.
  • Traditional defenses against prompt injection are inadequate for preventing unauthorized actions.

Detailed Analysis

**Impact** Agentic AI systems across multiple sectors, including enterprise environments and cloud platforms, are affected by attacks exploiting trust inversion and multi-layer vulnerabilities. These attacks can lead to unauthorized API calls, data exfiltration, file deletion, lateral movement, and long-term behavioral manipulation, impacting confidentiality, integrity, and availability. The scope includes multi-agent orchestration networks and tool ecosystems, with incidents showing threats that persist for weeks and across sessions. Baseline studies report unsafe behavior rates of up to 90% in default tool-enabled agent configurations.

**Technical Details** Attacks exploit trust boundary failures across six architectural layers: prompt input, context and memory, model inference, tool execution, inter-agent coordination, and the tool ecosystem. Principal trust inversion causes an agent to treat untrusted inputs as high-trust instructions, which lets adversaries mount multi-stage attacks such as memory poisoning, indirect prompt injection, and compromise of multi-agent pipelines. Attack vectors include adversarial documents, poisoned MCP servers, and malicious sub-agents that propagate through peer trust relationships. No specific CVEs or malware names are provided. The kill chain stages span from initial compromise through persistent exploitation and lateral movement.

**Recommended Response** Implement layered defenses aligned with the Layered Attack Surface Model, focusing on isolating trust boundaries and enforcing strict input validation at each architectural layer. Deploy infrastructure-level controls outside the reasoning loop to detect unauthorized tool invocations and anomalous inter-agent communications. Monitor for long-term behavioral anomalies and cross-session persistence indicators. Prioritize governance and ecosystem security to mitigate supply-chain and multi-agent coordination risks. No specific patches are mentioned; continuous monitoring and auditability of agent workflows are critical.
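
As an illustration of a control placed outside the reasoning loop, the following minimal Python sketch gates tool invocations on the provenance of the content that triggered them and records every decision for audit. It is not taken from the source articles: the `Trust` levels, the `ToolCall` type, the `POLICY` table, the tool names, and the `authorize` function are all hypothetical, and the sketch assumes the surrounding infrastructure can reliably label each call with the lowest trust level of any input that influenced it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels for the origin of an instruction."""
    UNTRUSTED = 0  # retrieved documents, tool output, peer-agent messages
    USER = 1       # direct input from the authenticated user
    OPERATOR = 2   # system or deployment configuration


@dataclass(frozen=True)
class ToolCall:
    tool: str
    # Assumption: set by infrastructure, not by the model, to the lowest
    # trust level of any content that influenced this call.
    provenance: Trust


# Hypothetical policy: minimum trust level required to invoke each tool.
POLICY = {
    "search_docs": Trust.UNTRUSTED,
    "send_email": Trust.USER,
    "delete_file": Trust.OPERATOR,
}


def authorize(call: ToolCall, audit_log: list) -> bool:
    """Allow the call only if its provenance meets the tool's minimum trust.

    Tools missing from the policy are denied by default, and every
    decision is appended to the audit log.
    """
    required = POLICY.get(call.tool)
    allowed = required is not None and call.provenance >= required
    audit_log.append((call.tool, call.provenance.name, "allow" if allowed else "deny"))
    return allowed
```

Under these assumptions, a `delete_file` call whose provenance is `Trust.UNTRUSTED` (for example, one triggered by text in a retrieved document) is denied and logged regardless of what the model decided, while a `search_docs` call with the same provenance is allowed. The gate does not address how provenance is tracked through memory or inter-agent messages, which is a separate and harder problem.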

Source articles (3)

  • 2604.23338v2 — arxiv.org · 2026-05-18
    Agentic AI systems plan across multiple sessions, retain memory, invoke external tools, and coordinate with peer agents. Stateless LLMs do none of these. Existing security taxonomies sort threats by a…
  • 2604.17562v1 — arxiv.org · 2026-05-18
    Large language model (LLM) agents are vulnerable to prompt-injection attacks that propagate through multi-step workflows, tool interactions, and persistent context, making input-output filtering alone…
  • Common Agentic Attack Patterns: 6 Layers Explained — Augmentcode · 2026-05-18
    The common agentic attack patterns are trust boundary failures across six architectural layers because agent systems can execute actions while misclassifying adversarial input as trusted instruction.…

Timeline

  • 2025-06-11 — CVE-2025-32711 published: Vulnerability assigned a CVE identifier and published in the National Vulnerability Database.
  • 2025-10-03 — CVE-2025-59536 published: A vulnerability in AI agent systems was published, leading to significant security concerns.
  • 2026-01-21 — CVE-2026-21852 published: A critical vulnerability affecting AI systems was disclosed, with a proof of concept released shortly after.
  • 2026-05-18 — Research on agentic AI vulnerabilities published: New findings reveal significant attack patterns and trust inversion issues in agentic AI systems, affecting security measures.
  • 2026-05-18 — Common attack patterns in agentic systems detailed: An article outlines six layers of vulnerabilities in agentic AI systems, emphasizing the risks of misclassifying inputs.

CVEs

  • CVE-2025-32711
  • CVE-2025-59536
  • CVE-2026-21852

Related entities

  • Data Breach (Attack Type)
  • Prompt Injection (Attack Type)
  • Supply Chain Attack (Attack Type)
  • cs.ai (Domain)
  • cs.cr (Domain)
  • T1041 - Exfiltration Over C2 Channel (Mitre Attack)
  • T1059 - Command and Scripting Interpreter (Mitre Attack)
  • Claude Code (Tool)
  • MCP Tool (Tool)
  • Cosmos (Company)
  • LLaMA-7B (Platform)
  • Microsoft 365 Copilot (Platform)
  • ClawHub (Platform)
  • EchoLeak (Vulnerability)