Indirect Prompt Injection
Untrusted content from tools, files, or APIs is interpreted as instructions, leading to unwanted model behavior.
Definition
Indirect Prompt Injection is an attack on the input layer of the Model Context Protocol (MCP). It happens when attacker-controlled content—such as tool outputs, API responses, or uploaded files—is incorporated into the agent’s prompt and misinterpreted as a new instruction. Since this content is not explicitly submitted as a prompt, it often escapes filtering, giving attackers a stealthy way to manipulate model behavior or force unintended tool execution.
This attack lives in the input layer of the MCP model, where untrusted context enters the prompt pipeline without clear boundaries.
How MCP Security Helps
Akto protects against indirect prompt injection by scanning all external inputs for embedded instructions or prompt-like structures. It validates how agent context is built, tests content boundaries using crafted payloads, and flags injection risks from tools, files, or third-party data sources.