Introducing Industry’s First MCP Security Solution by Akto. Learn More.

Insecure Output Handling in LLMs: Insights into OWASP LLM02

This blog is about "Insecure Output Handling" that pertains to the potential risk that may arise when the content generated by an LLM is not adequately sanitized or filtered prior to being presented to the end user.

Arjun

Feb 13, 2024

In July 2023, Auto-GPT, an open-source application showcasing the GPT-4 language model, had a vulnerability in versions prior to 0.4.3. The vulnerability was found in the execute_python_code command, which did not properly sanitize the basename argument before writing code supplied by LLM (Large Language Model) to a file with a name also supplied by LLM. This vulnerability allowed for a path traversal attack, potentially overwriting any .py file located outside the workspace directory. Exploiting this vulnerability further could result in arbitrary code execution on the host running Auto-GPT. The issue was addressed and patched in version 0.4.3.

Arbitrary code execution triggers Insecure Output Handling of LLMs where LLM output is directly exposed to the backend systems.

Auto-GPT is an autonomous AI application that utilizes GPT-4 to operate independently. It requires minimal human intervention and can self-prompt. The application was released on March 30, 2023. Auto-GPT is designed to handle tasks such as code debugging and email composition. It is an open-source Python application.

Insecure Output Handling: Explained

Insecure Output Handling is a vulnerability that occurs when a downstream component blindly accepts output from a large language model (LLM) without proper scrutiny. This can happen when LLM output is directly passed to backend, privileged, or client-side functions. Since the content generated by LLM can be controlled by prompt input, this behavior is similar to granting users indirect access to additional functionality.

Exploiting an Insecure Output Handling vulnerability successfully can result in cross-site scripting (XSS) and cross-site request forgery (CSRF) in web browsers, as well as server-side request forgery (SSRF), privilege escalation, or remote code execution on backend systems.

Expecting LLM to not filter the output..

Alert(1) Window triggered with the issue!

CVE-2023-29374

CVE-2023-29374 is a vulnerability that has been found in LangChain up to version 0.0.131. This vulnerability exists in the LLMMathChain chain and allows for prompt injection attacks, which can execute any code using the Python exec method. It has been given a CVSS base score of 9.8, indicating that it is critically severe. The problem was made public on April 5, 2023. This vulnerability is classified as CWE-74, which pertains to the improper neutralization of special elements in output used by a downstream component ("Injection").

Understanding LangChain

LangChain is an open-source platform that acts as a bridge between large language models (LLMs) and application development. It serves as a toolbox for AI developers, offering a standardized interface for chains and various integrations with other tools.

Think of LangChain as a pipeline. A user asks a question to the language model, which is then converted into a vector representation. This vector is used to conduct a similarity search in the vector database, retrieving relevant information. The retrieved information is then fed back into the language model to generate a response.

The versatility of LangChain is its greatest strength. It can be used for a wide range of applications, including document analysis, summarization, code analysis, and machine translation. Whether you're developing a chatbot or improving data, LangChain provides the framework to make it happen.

Understanding How the Vulnerability works….

LLMMathChain is a specific component of the LangChain framework designed to handle mathematical problems expressed in natural language. Here's how it works:

When a user inputs a mathematical problem, LLMMathChain translates it into a Python expression. This translation is performed using Large Language Models (LLMs), which are trained to understand and generate human-like text.

Once the problem is translated into a Python expression, it is evaluated using Python REPLs (Read-Eval-Print Loop). The REPL is a simple, interactive computer programming environment that takes single user inputs, executes them, and returns the result.

For example, if you were to ask "What is the square root of 49?", LLMMathChain would convert it into the Python expression sqrt(49), evaluate it, and return the answer 7.

Code Snippet:

Terminal Output:

This makes LLMMathChain a powerful tool within LangChain for solving complex word math problems.

Now there’s a catch!

Proof-of-Concept

For recreating the vulnerability, we are using the vulnerable version of LangChain and earlier version of OpenAI.

For LangChain, please install version 0.0.117 (https://pypi.org/project/langchain/0.0.117/)

For OpenAI, please install version 0.27.2 (https://pypi.org/project/openai/0.27.2/)

Install Python3.10. (https://www.python.org/downloads/release/python-3100/)

Use the following code for triggering Arbitrary Code Execution.

Terminal Output triggers both the commands → uname - a and pwd

LLMMathChain does not filter the input provided to it. That is the main cause for this vulnerability.

Remediation

Here are some confident remediation strategies for addressing Insecure Output Handling in Large Language Models (LLMs):

Output Sanitization: Ensure that the content generated by the LLM is thoroughly sanitized before presenting it to the end user.
Output Filtering: Implement robust mechanisms to filter the output and proactively prevent the dissemination of inappropriate, harmful, or sensitive information.
Secure Coding Practices: Follow and adhere to stringent secure coding practices to effectively mitigate vulnerabilities such as path traversal attacks.
Regular Updates and Patching: Proactively keep the LLM application up-to-date and diligently apply patches on a regular basis to swiftly address any known vulnerabilities.
Use of Sandboxed Environments: Execute custom Python code in a highly secure and isolated sandboxed environment using a dedicated temporary Docker container.

How to test for Insecure Output Handling using Akto?

To test for Remote code execution using Akto template for Insecure Output Handling in Large Language Models (LLMs), follow these steps:

Create a Template: Design a template for Insecure Output Handling that includes specific instructions or queries to trigger the LLM to trigger remote code execution. The template should be carefully crafted to exploit potential vulnerabilities in the LLM

Execute the Test Template: Use the crafted test templates to execute insecure output handling attacks on the LLM. Submit the injected prompts and observe the resulting outputs. Pay close attention to any unexpected behaviors, disclosure of sensitive information, or unauthorized behaviour performed by the LLM.

(The given payload is URL encoded!)

Analyze the Results: Analyze the outputs generated by the LLM during the test cases. Look for any signs of RCE, XSS, unintended actions, or deviations from expected behavior. Document the findings and assess the severity and impact of any vulnerabilities discovered.

Keep reading

API Security

8 minutes

NIST Cybersecurity Framework

The NIST Cybersecurity framework provides organizations with a set of standards, guidelines, and practices to develop strong cybersecurity practices for managing cybersecurity risks effectively.

API Security

7 minutes

API Security Audit

An API Security Audit evaluates APIs, identifies potential risks, and strengthens the organization's defenses against security breaches and cyber-attacks.

Security Information and Event Management

API Security

8 minutes

Security Information and Event Management (SIEM)

SIEM aggregates and analyzes security data across an organization to detect, monitor, and respond to potential threats in real time.

Experience enterprise-grade API Security solution

Book a demo

Start now