Toxic Agent Flows
Untrusted inputs from files, APIs, or tools indirectly trigger harmful behavior inside the LLM.
Definition
Toxic Agent Flows are attacks on the execution layer of the Model Context Protocol (MCP). These flows emerge when an agent's logic is underspecified, allowing it to make harmful decisions such as invoking the wrong tool, leaking sensitive context, or performing actions beyond its intended role. This can result from vague instructions, missing validation, or unclear tool mapping in complex workflows.
The attack sits in the execution layer because that is where the agent's decision logic selects tools and shapes responses. Without strong guardrails at that point, a flawed decision turns directly into a harmful action.
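To make "unclear tool mapping" concrete, the following is a minimal sketch of an explicit role-to-tool allowlist checked before a tool call runs. The role names, tool names, and the `ToolCall` and `is_permitted` helpers are all hypothetical and chosen only for illustration; they are not part of MCP or of any specific product.

```python
from dataclasses import dataclass

# Hypothetical mapping of agent roles to the tools each may invoke.
# Every name here is an assumption made for illustration.
ALLOWED_TOOLS = {
    "support_agent": {"search_tickets", "read_ticket"},
    "release_agent": {"read_repo", "create_pull_request"},
}


@dataclass(frozen=True)
class ToolCall:
    role: str  # role the agent is acting under
    tool: str  # tool the agent wants to invoke


def is_permitted(call: ToolCall) -> bool:
    """Return True only if the tool is explicitly mapped to the role.

    Unknown roles get an empty allowlist, so the check denies by default.
    """
    return call.tool in ALLOWED_TOOLS.get(call.role, set())
```

With this mapping, `is_permitted(ToolCall("support_agent", "create_pull_request"))` is rejected because the tool falls outside the support role. A deny-by-default check like this narrows the room an underspecified agent has to pick the wrong tool, though it does not address leaks through tools the role is allowed to use.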
How MCP Security Helps
Akto detects toxic agent flows by analyzing decision paths and identifying illogical or high-risk tool sequences. It simulates goal-based attacks across multiple agent steps, flags behavior that deviates from expected task logic, and tests workflows for unsafe tool combinations or unintended outcomes.
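As a rough illustration of what flagging a high-risk tool sequence can look like, the sketch below labels each tool with a risk category and reports a flow as toxic when the categories "untrusted input", "sensitive read", and "external write" occur in that order. This is an assumed heuristic written for this page, not a description of how Akto implements detection; the tool names, labels, and the `is_toxic_flow` function are hypothetical.

```python
# Hypothetical risk labels for tools; all names are illustrative assumptions.
TOOL_LABELS = {
    "read_public_issue": "untrusted_input",
    "read_private_repo": "sensitive_read",
    "create_pull_request": "external_write",
    "search_docs": "benign",
}

# Ordered pattern treated as risky in this sketch: attacker-controlled
# content enters, sensitive data is read, then data leaves the boundary.
RISKY_PATTERN = ("untrusted_input", "sensitive_read", "external_write")


def is_toxic_flow(tool_sequence):
    """Return True if the sequence's labels contain RISKY_PATTERN in order.

    Other calls may appear between the matching steps. Tools without a
    label are treated as benign here, which a real checker may not want.
    """
    labels = iter(TOOL_LABELS.get(tool, "benign") for tool in tool_sequence)
    # `step in labels` consumes the iterator up to the match, so each
    # later step must be found after the previous one.
    return all(step in labels for step in RISKY_PATTERN)
```

For example, the sequence `read_public_issue`, `search_docs`, `read_private_repo`, `create_pull_request` matches the pattern, while the same tools with the private read happening before the public issue is read do not. The ordering matters because the risk in this sketch comes from untrusted content influencing what the agent does with sensitive data afterwards.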