How to Red Team Your LLMs: AppSec Testing Strategies for Prompt Injection and Beyond

Appsec Knowledge Center

How to Red Team Your LLMs: AppSec Testing Strategies for Prompt Injection and Beyond

10 min.

Illustration of a DevOps pipeline integrating AI security testing, highlighting prompt injection, data leakage, and agent misbehavior in large language models (LLMs), with security alerts and testing workflows visualized across the SDLC.

Generative AI has radically shifted the landscape of software development. While tools like ChatGPT, GitHub Copilot, and autonomous AI agents accelerate delivery, they also introduce a new and unfamiliar class of vulnerabilities that traditional application security testing doesn’t cover. Prompt injection, model jailbreaks, sensitive data leakage, and rogue agent behavior aren’t just theoretical—they’re already being exploited in the wild. This new risk surface demands a more dynamic approach, including AI application vulnerability scanning tailored to how models behave in production.


For DevOps engineers tasked with ensuring fast, secure software delivery, the challenge is clear: you need to integrate AI security testing directly into your workflows. This guide walks you through how to red team your large language models (LLMs), test for modern AI vulnerabilities, and embed these practices into your CI/CD pipelines.


What Makes AI Application Security Different?


Traditional security testing focuses on static and dynamic analysis, known attack patterns, and predictable code behavior. But AI—especially LLMs—breaks these assumptions:

  • Prompt Injection is closer to social engineering than SQL injection. It targets the model’s logic, not the app’s code. Threat actors can manipulate an LLM using specially crafted prompts that make the model execute unintended or malicious instructions. This is often called “jailbreaking,” where the attacker tries to override or reveal the underlying system prompt, potentially gaining access to internal functions, configuration data, or backend systems. Prompt injection doesn’t always require direct user interaction. It can also be carried out indirectly when an LLM pulls from external inputs, such as email threads, chat messages, or API responses that may be manipulated by an attacker. In these cases, the injection source may not even be obvious to the human interacting with the LLM. This indirect form of injection opens the door to serious application security concerns, including sensitive data exposure, logic manipulation, and social engineering attacks.

  • Emergent Behavior means LLMs can behave unpredictably as they encounter more complex prompts. This includes hallucinations or responses that seem credible but are entirely fabricated. These behaviors aren’t bugs; they’re a byproduct of probabilistic modeling. In a security context, emergent behavior can lead to inconsistent interpretations of prompts, invented credentials or paths, and even unauthorized behavior when interacting with other components, especially if safeguards are too rigid or too loose.
  • Opaque Boundaries exist when models are connected to plugins, APIs, or system-level actions, making behavior harder to monitor. These connections blur the line between application logic and AI-driven decisions. If an attacker manipulates prompts to trigger downstream effects, such as initiating unauthorized API calls, writing files, or triggering system-level actions, it may not be immediately apparent. Without fine-grained observability and guardrails, these integrations become blind spots that can be exploited for lateral movement or privilege escalation.

This is why DevOps teams must approach AI security testing differently. The only way to stay ahead is to test like an adversary, making AI application vulnerability scanning a core part of your evolving security strategy.


Key LLM AppSec Vulnerabilities to Test For Prompt Injection


Attackers can override instructions using crafted input. It can also happen indirectly, when an LLM accepts input from other sources, which could be controlled by an attacker, and may not even be recognizable to the human engaging with the LLM. For example:


This harmless-sounding prompt could be a stand-in for more malicious instructions like leaking system prompts or bypassing safety filters. Red teamers should experiment with:

  • Instruction overriding – Attackers craft prompts to deliberately bypass or overwrite system-level instructions. For example, telling the model to “Ignore all previous instructions” can allow malicious payloads to take control of the session, causing the LLM to leak sensitive data, expose underlying prompt logic, or execute unintended functions.
  • Role reassignment (e.g., “Act as a security expert”) – By asking an LLM to assume specific roles, such as a security auditor or system administrator, attackers can coax privileged responses or guide the model toward revealing otherwise restricted information. This can be especially dangerous when roles are used to simulate authority or override internal safety limits.
  • Data exfiltration through model manipulation – Prompt chains can be structured to slowly extract internal context tokens, memory, or embedded credentials. For instance, repeatedly prompting an LLM to “repeat the last sentence” or “summarize your system prompt” may result in unintentional leakage of training data, configurations, or cached responses containing sensitive information.

Data Leakage and Sensitive Information Exposure

As we move from prompt injection into deeper AI vulnerabilities, it’s important to recognize how LLMs can become inadvertent vectors of data leakage and compromise—even when behaving “normally.”

LLMs may inadvertently leak sensitive training data or reflect back previously entered prompts. This often manifests in situations such as:

  • Revealing internal usernames, system paths, or configuration details embedded in training data
  • Exposing hardcoded secrets or access credentials unintentionally included in pretraining corpora
  • Echoing back personally identifiable information (PII) that was provided in earlier prompts

Red teamers should test for these weaknesses using techniques such as simulated forgotten conversations, prompt repetition, or requests to summarize prior user interactions. These behaviors are especially critical to monitor in LLM deployments that involve extended context memory or customer-facing use cases.


A more advanced and insidious form of vulnerability is training data poisoning. This represents a type of integrity attack, where malicious actors inject tainted or misleading information into the training datasets. The risk is particularly high when LLMs are trained or fine-tuned on public or unvetted external sources. Poisoned data can manipulate the LLM’s outputs, introduce targeted misinformation, cause biased or incorrect responses, or even embed trigger phrases that reveal sensitive outputs on demand.

Security teams should treat training pipelines as part of their threat model. Detecting and mitigating these risks requires both robust I/O monitoring and deep visibility into the model’s provenance. The same rigor applied to securing production code must also be applied to curating, cleaning, and validating training data sources.


Use test prompts that simulate forgotten conversations or ask the model to “recall” past sessions.


Jailbreaks and Misalignment

Jailbreaking attempts to override the model’s safety mechanisms. Techniques often involve:

  • DAN-style prompts that trigger developer modes
  • Multi-turn conversation chaining to bypass moderation


Red teams should iterate prompt sequences to see if the model becomes misaligned with guardrails over time.

Misuse of Autonomous Agents

When LLMs are granted access to tools (e.g., file I/O, APIs, terminal commands), they can go off-script:

  • Making unauthorized API calls
  • Writing/overwriting files
  • Initiating recursive loops

Testing here involves simulating adversarial command injection within the agent’s planning loop.

Checkmarx helps organizations secure the outputs and integrations of AI systems so that AppSec teams retain control and visibility even when developers leverage tools like Copilot and ChatGPT. This context-aware insight is critical when your LLMs are connected to runtime agents.


Tools and Techniques for Red Teaming LLMs


Rather than relying on off-the-shelf tools, many organizations are building custom security assessments tailored to their unique LLM use cases. Effective red teaming strategies typically include:

  • Crafting adversarial prompt scenarios to simulate real-world manipulation attempts and test system defenses against prompt injection.
  • Building libraries of known prompt-based attacks and chaining techniques to probe for context leakage or instruction overriding.
  • Benchmarking model outputs against expected behaviors to uncover inconsistencies, unexpected responses, or behavioral drift over time.


These tactics allow teams to tailor their testing against their actual LLM implementation, including the data sources, plugins, and business logic involved. Custom approaches also offer better control and alignment with internal policies and application context, which is crucial for accurate and actionable results.

Custom Scripting


Custom scripting gives DevOps teams precise control over how they test and monitor LLM behavior. It enables the creation of lightweight, customizable checks that can scale with the application. Examples include:

  • Regex to detect credential-like outputs such as AWS keys, API tokens, or OAuth strings, catching potential secrets leakage in model responses.
  • Token counting and threshold monitoring to flag unusually verbose outputs, which may indicate prompt injections or model confusion.
  • Request/response diffing to track drift in output behavior over time, useful for detecting emergent behavior or changes caused by model updates.

These techniques can be packaged into reusable scripts or functions that integrate with existing test automation frameworks.

CI/CD Integration


Bringing LLM red teaming into your CI/CD pipelines ensures that security testing scales with development velocity. Red team test cases should be embedded as part of continuous validation flows. Example workflows include:

  • GitHub Actions: Run LLM behavioral tests after pull request merges to catch newly introduced vulnerabilities before they reach staging.
  • Jenkins: Set up scheduled jobs to run nightly or weekly test suites that assess LLM response consistency, sensitivity to prompt injection, and resilience against role manipulation.
  • Azure DevOps: Define promotion gates that block deployment when high-risk behaviors are detected—such as unsafe plugin interactions or prompt leakage.

Unlock Insights from GenAI Security Leaders

Curious how your peers are tackling AI security challenges? Download Checkmarx’s report to learn the 7 steps teams are taking to secure GenAI across the SDLC.


Building AI Security Testing into DevOps Workflows


Security testing should mirror software deployment stages:

  • Pre-deploy: Validate prompts, scan dependencies, check agent policies
  • Deploy-time: Use shadow deployments to test model behavior in real-time
  • Post-deploy: Monitor outputs, detect anomalies, collect telemetry


Best practices:

  • Define expected behaviors and output schemas
  • Use feature flags to isolate risky LLM changes
  • Enforce role-based access for agent capabilities

Checkmarx One helps AppSec and DevOps teams move beyond alerts by providing unified visibility into AI-driven risk—correlating prompt behaviors, model outputs, and code-level findings across the SDLC.

Real-World Scenarios & What to Watch For

  • Dev manipulates prompt chain, enabling unintended data access from an AI plugin.
  • LLM trained on internal repo outputs sensitive config paths to external users.
  • Agentic AI executes HTTP requests, enabling SSRF-style attacks during testing.


Each of these scenarios can and should be simulated as part of your AI security testing pipeline.


The New Normal for AI Security Testing


Large Language Models have redefined what “application vulnerability” means. Traditional testing isn’t enough. Today’s AppSec teams must proactively simulate attacker behavior against AI systems.


By embedding AI security testing into your CI/CD workflows, using targeted AI security tools, and adopting unified platforms like Checkmarx One, DevOps teams can confidently deploy AI-enabled applications with security in mind.


Welcome to the new battleground. Let’s make sure your LLMs are ready for it.

Secure Your AI-Driven Applications at Scale

From prompt injections to agent abuse, modern AI applications demand modern defenses. See how Checkmarx helps you build trust in every AI-powered release.

Read More

Want to learn more? Here are some additional pieces for you to read.