Missed the webinar? Catch the full MCP Security session. Watch recording

LLM Risks: Insights & Real-World Case Studies

LLM security involves protecting AI systems like ChatGPT, Bard from potential risks such as biased outputs, malicious use and maintaining privacy in their applications.

Arjun

Feb 13, 2024

On March 20, 2023, there was an outage with OpenAI's AI tool, ChatGPT. The outage was caused by a vulnerability in an open-source library, which may have exposed payment-related information of some customers. OpenAI took ChatGPT offline temporarily to address the issue and restore the ChatGPT service and its chat history feature after patching the vulnerability. Customers affected by the breach were notified that their payment information may have been exposed. OpenAI expressed regret for the incident and described the measures it has taken to enhance its systems. This occurrence emphasizes the significance of robustness testing and adversarial defense mechanisms in LLMs.

LLMs like ChatGPT are trained on massive datasets of text and code, which enables them to generate text, translate languages, create various types of content, and provide informative answers to questions. This makes them powerful tools, but also vulnerable to exploitation.

LLMs 101

A large language model (LLM) is an AI algorithm that uses neural network techniques with a vast number of parameters to process and comprehend human language or text. Self-supervised learning techniques are commonly utilized for this. AI accelerators help to increase the size of these models by processing vast amounts of text data, usually obtained from the Internet. Large language models are used for a variety of tasks, including text generation, machine translation, summary writing, image generation from text, machine coding, chatbots, and conversational AI. Examples of such LLM models include OpenAI's ChatGPT and Google's BERT (Bidirectional Encoder Representations from Transformers), among others.

Here is an example of a conversation with ChatGPT:

Benefits of Integrating LLMs in your product:

Enhancing User Engagement and Interaction: LLMs allow users to interact with applications using natural language processing, providing a personalised experience similar to human interaction that leads to increased user satisfaction.
Seamless Data Processing and Analysis: LLMs can quickly and accurately analyse vast amounts of data. When incorporated into big business application improvement, they empower productive information handling and investigation, giving important experiences to organizations.
Accelerated Content Generation: LLMs can automate various types of content creation, such as product descriptions, FAQs, and help guides, significantly expediting the process of content generation.
Personalised Recommendations: LLMs can understand user behaviours and preferences from their interactions, allowing for personalised recommendations within enterprise applications.

Large Language Models (LLMs) like ChatGPT, Bard, and LLaMA2 have fundamentally progressed the field of natural language processing by using profound learning methods and the Transformer architecture starting around 2017. Albeit these models, pre-prepared on gigantic codebases, are progressively used to create code, they need security mindfulness and oftentimes produce risky code.

OpenAI's ChatGPT established the standard for the quickest developing client base ever with 100 million month to month dynamic clients in January 2023. The interest for LLMs is high because of their many use cases, like text generation, sentiment analysis, and producing important bits of knowledge from unstructured information.

The security of these models is highly relevant today as they are used in various tasks, including natural language processing (NLP), chatbots, and content generation. While they offer accommodation and productivity, they may likewise present security risks, for example, information leakage because of the utilization of third-party LLMs. Therefore, robust policies are necessary to counter the risks associated with these models.

As LLMs continue to evolve, they are expected to transform into monumental cloud services featuring extensive ecosystems. This growth, coupled with their security relevance, underscores the importance of continuous advancements in AI research and development.

Real Life Case Studies

CVE-2023-37274: This vulnerability allows an attacker to overwrite any .py file outside the workspace directory through a path traversal attack. By specifying a basename such as ../../../main.py, an attacker can exploit this vulnerability and execute arbitrary code on the host running Auto-GPT.
Auto-GPT is an experimental, open-source Python application that utilizes GPT-4 to function as an "AI agent." It aims to achieve a given goal in natural language by breaking it into sub-tasks and using the internet and other tools in a loop. It uses OpenAI's GPT-4 or GPT-3.5 APIs and is among the first applications to employ GPT-4 to perform autonomous tasks.
Samsung’s Data Leak via ChatGPT: In April 2023, employees from Samsung's semiconductor division accidentally disclosed confidential company information while using OpenAI's ChatGPT. They used ChatGPT to review source code and, in doing so, inadvertently entered confidential data. This resulted in three documented instances of employees unintentionally disclosing sensitive information through ChatGPT.

Meta’s LLaMa Data Leak: Meta's LLaMA large language model was recently leaked on 4chan as of March 3, 2023. Up until this point, access to the model was only granted to approved researchers. Unfortunately, this leak has resulted in concerns regarding the potential misuse of the model, including the creation of fake news and spam.

OWASP Top 10 for LLMs

On August 1, 2023, OWASP released version 1.0 of the Top 10 for LLMs, a list designed to assist developers, data scientists, and security teams in identifying and addressing the most critical security risks associated with LLMs. The list comprises the top 10 vulnerabilities in addition to strategies for mitigation.

Here are the top 10 risks that have been identified:

LLM01: Prompt Injections:

Prompt Injection Vulnerability is a type of security risk that happens when attackers manipulate a large language model (LLM) using carefully crafted inputs in order to make the LLM execute their desired actions without realizing it. This can be achieved by "jailbreaking" the system prompt directly or by manipulating external inputs, which may result in issues such as data exfiltration, social engineering, and more.

Here’s an example (Link):

LLM02: Insecure Output Handling:

Insecure Output Handling is a vulnerability that occurs when a downstream component accepts large language model (LLM) output without proper scrutiny. This can include passing LLM output directly to backend, privileged, or client-side functions. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users with indirect access to additional functionality.

Here’s an example of CSRF using Plugins in Chat GPT 4 WebPilot Plugin for parsing a website. A website summary triggers a call to another plugin for Expedia!

LLM03: Training Data Poisoning:

Training data poisoning refers to the act of manipulating the data or fine-tuning process to introduce vulnerabilities, backdoors, or biases that could compromise the security, effectiveness, or ethical behavior of the model. Poisoned information may be presented to users, creating other risks such as performance degradation, downstream software exploitation, and reputational damage. Even if users are aware of the problematic output generated by the AI model, the risks remain, including impaired model capabilities and potential harm to brand reputation.

Wrong Output (via PoisonGPT):

Correct Output (via PoisonGPT):

LLM04: Model Denial of Service:

An attacker can cause a significant degradation in the quality of service for themselves and other users of a LLM by consuming an unusually high amount of resources. This can also result in high resource costs. In addition, a major security concern is the possibility of an attacker interfering with or manipulating the context window of a LLM. This issue is becoming increasingly critical due to the growing use of LLMs in various applications, their high resource utilization, unpredictable user input, and a general lack of awareness among developers regarding this vulnerability.

LLM05: Supply Chain Vulnerabilities:

The supply chain in LLMs may be vulnerable, which can affect the accuracy of the training data, ML models, and deployment platforms. Such vulnerabilities can result in biased outcomes, security breaches, or complete system failures. Traditionally, vulnerabilities are associated with software components, but with Machine Learning, there is an added risk from pre-trained models and training data supplied by third parties, which can be subject to tampering and poisoning attacks.

E.g. ChatGPT March 20 Outage

OpenAI were using the Redis client library, redis-py. The library has an element where OpenAI could keep a pool of connections between their Python server (which runs with Asyncio) and Redis, so they didn't have to constantly check the main database for every request.

A data breach happened due to a bug in the open-source code redis-py, ChatGPT was using beneath the hood.

An issue in Redis-py triggered a supply chain vulnerability in ChatGPT exposing sensitive data.

LLM06: Sensitive Information Disclosure:

LLM applications can potentially reveal sensitive information, proprietary algorithms, or other confidential details through their output. This could lead to unauthorized access to sensitive data, intellectual property, privacy violations, and other security breaches. It is crucial for users of LLM applications to understand how to safely interact with them and identify the risks associated with inadvertently entering sensitive data that may be returned by the LLM in output elsewhere.

Prompt for retrieving Sensitive Info (Example):

LLM07: Insecure Plugin Design:

LLM plugins are extensions that are automatically called by the model during user interactions. They are controlled by the model and cannot be controlled by the application during execution. To handle context-size limitations, plugins may implement free-text inputs without validation or type checking. This creates a vulnerability where an attacker could construct a malicious request to the plugin, potentially resulting in undesired behaviors, including remote code execution.

Popular LLM Plugins include WebPilot for ChatGPT.

e.g. Content on a website can trigger a plugin and change your private Github repos to public.

LLM08: Excessive Agency:

Excessive Agency is a vulnerability that allows harmful actions to be performed when an LLM produces unexpected or ambiguous outputs, regardless of the cause of the malfunction (such as hallucination, direct/indirect prompt injection, malicious plugin, poorly-engineered benign prompts, or a poorly-performing model). The root cause of Excessive Agency is usually one or more of the following: excessive functionality, excessive permissions, or excessive autonomy.

E.g. AutoGPT Docker-bypass vulnerability

Giving admin privileges for Docker image triggers a privilege escalation vulnerability with AutoGPT as Docker can be killed and the attacker can enter the main system for unauthorized command execution.

LLM09: Overreliance:

Overreliance happens when systems or people rely too much on LLMs to make decisions or generate content without proper oversight. Although LLMs can create informative and creative content, they can also produce content that is factually incorrect, inappropriate, or unsafe. This is known as hallucination or confabulation and can lead to misinformation, communication problems, legal issues, and damage to reputation.

Check the below example from Bard (A package called Akto does not exist!):

LLM10: Model Theft:

This category is relevant when the LLM models that are owned by the organization (and are considered valuable intellectual property) are compromised. This can happen through physical theft, copying, or through the extraction of weights and parameters to create a functional equivalent. The consequences of LLM model theft can include loss of economic value and damage to brand reputation, erosion of competitive advantage, unauthorized use of the model, or unauthorized access to sensitive information contained within the model.

Theft of LLMs is a significant security concern as language models become increasingly powerful and prevalent.

E.g. Meta’s LLaMA Model Leak

On February 24, Meta had offered LLaMA to researchers at foundations, government organizations, and nongovernmental associations who requested access and consented to a noncommercial license. A week later, 4chan leaked it.

Important Links

Keep reading

API Security

8 minutes

NIST Cybersecurity Framework

The NIST Cybersecurity framework provides organizations with a set of standards, guidelines, and practices to develop strong cybersecurity practices for managing cybersecurity risks effectively.

API Security

7 minutes

API Security Audit

An API Security Audit evaluates APIs, identifies potential risks, and strengthens the organization's defenses against security breaches and cyber-attacks.

Security Information and Event Management

API Security

8 minutes

Security Information and Event Management (SIEM)

SIEM aggregates and analyzes security data across an organization to detect, monitor, and respond to potential threats in real time.

Experience enterprise-grade API Security solution

Book a demo

Start now