How to Add LLM Security Testing to Your CI/CD Pipeline

Learn how to embed LLM security testing into your CI/CD pipeline to catch jailbreaks, prompt injection, and data leakage before they reach production.

Conor O Neill
CEO and Co-founder

LLM features are shipping fast. But here’s the thing: most continuous integration/continuous deployment (CI/CD) pipelines only test for code quality and classic security vulnerabilities, not the model’s behaviour itself.

How can security teams solve this issue? For true efficacy in your security testing, it’s important to step away from the one-off red teaming as a tacked-on afterthought. Embedding and automating LLM security testing within CI/CD pipelines proactively prevents jailbreaks, data leaks, and unsafe outputs that could block deployment.

This approach streamlines and secures your pipeline by addressing security issues upfront rather than after deployment, giving you a better understanding of your attack surface and improving overall output quality. This way, teams can safely penetration test their CI/CD pipeline and feel confident in their LLM model’s security against potential attacks.

Why LLM security testing belongs in CI/CD

LLMs take everything we traditionally assume about pentesting and turn it on its head. Why? LLMs have differing needs compared to traditional software systems because:

Behaviour is probabilistic and context-dependent, not deterministic.
New risks: jailbreaks, prompt injection, data leakage, bias, and tool misuse are all common and unique to LLMs.
Models and prompts change frequently (new versions, new tools, new RAG data), so one-time testing simply isn’t enough.

LLM testing in CI/CD helps validate a secure configuration

With these unique threats in mind, it becomes increasingly clear that your CI/CD pipeline is the natural place to:

Run repeatable evaluations on every change
Enforce ‘eval gates’: builds fail if safety or quality drops below thresholds

Classic Unit Tests vs LLM Security Testing: What’s the Difference?

Dimension	Classic unit tests	LLM security tests
Inputs	Structured, deterministic function arguments and known edge cases	Natural language prompts, jailbreak attempts, and prompt injection strings
Expectations	A precise, predictable output matching a fixed assertion	A safe/unsafe boundary that holds across many prompt variations
Failure modes	Wrong return value, unhandled exception, broken assertion	Harmful output, guardrail bypass, data leakage, prompt injection
Repeatability	Fully deterministic	Non-deterministic; temperature and sampling add variance
Scope	An isolated function, class, or module	The model, system prompt, tools, and full prompt pipeline
Tooling	pytest, Jest, JUnit: mature and standardised	Garak, PyRIT: emerging and largely bespoke

Key concepts: Evaluations, security tests, and observability

Three disciplines work together to keep LLM-powered applications reliable, secure, and monitored across the full development lifecycle.

LLM evaluations

Automated tests that score model responses for correctness and reliability. They run against defined datasets or prompts and return measurable results, looking to answer the question: Is the model behaving as intended?

LLM security testing

These are targeted tests that actively try to break the model. This covers jailbreaks, prompt injection, sensitive data leakage, harmful output, and tool abuse.

Where evaluations check quality, security tests look for exploitable weaknesses before they reach production, giving security professionals the insights necessary to prevent data breaches and secure continuous delivery.

LLM observability

Monitoring what the model actually does in pre-production and production, using traces, metrics, logging, and feedback loops. It surfaces unexpected behaviour and edge cases that no test suite anticipated.

How they work in the CI/CD pipeline to improve output quality

The three disciplines form a continuous loop rather than separate activities.

Pre-deployment: evaluations and security tests run automatically on every build. A failed evaluation or a successful jailbreak blocks the release, just as a failing unit test would.
Post-deployment: observability monitors live traffic for anomalies and drift. When something unexpected surfaces, it feeds back into the test suite as a new evaluation or security test case.

Every production incident makes the pre-deployment suite stronger. Over time, the pipeline becomes harder to break and faster to catch potential vulnerabilities.

For a deeper look at how penetration testing fits into your CI/CD pipeline, read this guide from OnSecurity.

Common LLM risks your pipeline should catch

When conducting a model evaluation, there are common risks you should expect your pipeline to catch pre-deployment. Any security-conscious organisation’s CI/CD pipeline should include tests designed to trigger these behaviours and score whether defences hold

Jailbreaks and policy bypass

Jailbreaking and policy bypass are when malicious attackers try to outmanoeuvre legitimate training data and access controls to manipulate a model into producing outputs it was supposed to refuse, whether that’s harmful content, sensitive information, or behaviour that abuses the application’s set boundaries.

Prompt injection (including in RAG)

Similar to policy bypassing, prompt injection is when untrusted data, fed into a model through user input, retrieved documents, or external sources, hijacks model behaviour to override instructions, exfiltrate information, or trigger actions the application never intended to perform.

Sensitive data leakage

Sensitive data leakage can result from prompt injections and jailbreak attacks, and occurs when the model returns secrets or proprietary info to a malicious actor. Sensitive data leakage endangers critical components of your organisation’s information security and can provide an attack vector for hackers to further exploit your LLM applications.

Toxic or biased content

Toxic and biased content occurs when malicious attackers abuse an LLM’s ability to filter content and manipulate the application’s performance into spouting toxic content. Whether it’s a false response bias or discriminatory, hateful language, any kind of impaired response quality can lead to serious reputational risks.

Tool and agent misuse

When an LLM has access to tools- like APIs, databases and code execution environments- a manipulated agent can be coerced into exfiltrating data, executing destructive operations, or abusing dependencies in ways that are hard to detect and expensive to undo.

The risk is not the model breaking the rules; it is the model following attacker instructions through legitimate functionality.

Insufficient flow control mechanisms

Insufficient flow control mechanisms occur when an LLM application fails to enforce boundaries on how data and instructions move between components, allowing malicious code and content to be unchecked across LLM agents, tools, and retrieval sources.

Types of LLM security tests for CI/CD

There are various LLM security tests applicable to your organisation’s LLM applications that can help deliver software safely and secure your code repositories. Here are the key testing methods for safe and continually secured model development:

Static test suites

A curated set of adversarial prompts covering known jailbreak patterns, injection techniques, and data leakage scenarios. They run on every build, catch known risks consistently, and form the baseline of any LLM security programme.

Generative/fuzz tests

If static suites cover what your security team already knows, then fuzz testing covers what they don’t.

Tools that automatically mutate and generate prompts can uncover weaknesses your manual test cases would never reach.

Red-team inspired tests

Red-teaming-inspired tests are periodically updated from human red teaming exercises and real production incidents, ensuring your test suite reflects the actual threat landscape rather than a theoretical one.

Feeding those discoveries back into your CI/CD pipeline as repeatable test cases turns one-off findings into continuous coverage. Learn more about LLM red teaming.

Designing LLM security testing

Principle	What it means	In practice
Repeatable	Tests are defined as scenarios, prompts, and expected outcomes that can be run consistently across every build	Store test cases in version control alongside your code; the same prompt should produce a scoreable result every time
Measurable	Results are scored via rules, heuristics, or secondary models acting as judges, not subjective human review on every run	Define clear scoring criteria upfront: does the response contain harmful content, leak PII, or comply with policy?
Actionable	Clear pass/fail or score thresholds determine whether a build can ship, with no ambiguity at the gate	Set a threshold before you write the first test, not after you see the results
Start narrow	You do not need comprehensive coverage on day one: begin with a small number of high-risk scenarios relevant to your application	For a customer-facing chatbot, start with jailbreak attempts, data leakage, and off-topic, harmful output and then expand from there
Evolve over time	Every production incident, near-miss, or newly discovered attack technique becomes a new test case	Feed observability findings back into the test suite so coverage grows with real-world experience rather than assumptions

Where LLM security tests fit into your CI/CD pipeline

Now that we have established the importance of machine learning security testing in your CI/CD pipeline, it is critical to understand where to actually plug what in.

A practical starting point is LLM eval gates: defined checkpoints where builds only progress if LLM evaluation scores for safety, toxicity, and hallucination rate meet predetermined thresholds. Treat these evaluation metrics the same way you would a failing unit test: a build that cannot clear the gate does not ship.

Here’s how to plug LLM security testing seamlessly into your CI/CD pipeline:

Pull request/pre-merge (CI)

Run fast evaluations on updated prompts, policies, and system messages
Block merges if key guardrail tests fail

Pre-deploy to staging

Run broader LLM security suites: jailbreak attempts, prompt injections, data-leak prompts, bias tests

Post-deploy smoke tests

Sanity-check core flows in staging/prod-like environments

Continuous/scheduled checks

Nightly or weekly red-team-style runs against staging or shadow deployments

Adding LLM red teaming into release cycles

Automated tests catch regressions, but they can only find what they already know to look for. Human red teamers find entirely new classes of attack.

Before major releases or architecture changes, run a deeper red team exercise alongside your standard CI suite. Feed the findings back into the pipeline as new test cases, updated guardrails, and tightened monitoring rules. Automation can help flag known vulnerabilities, but LLM red teaming dives far deeper, uncovering issues in access controls, management tools, dependency chain abuse, and source code against relevant LLM evaluation metrics.

This more thorough threat detection strategy is an essential part of flagging complex threats and supporting long-term security fixes.

OnSecurity’s AI red teaming and LLM pentesting can help generate those attack scenarios and encode them directly as CI/CD tests.

LLM observability and feedback loops from production

Monitoring a large language model in production is a significantly different challenge from monitoring a traditional application. Here is what meaningful visibility actually looks like for your LLMs.

What to log and track:

Prompts, responses, and associated metadata, captured in a privacy-conscious way that protects sensitive user data whilst still enabling meaningful analysis
Safety incidents and user-reported issues, alongside abnormal behaviour patterns such as unexpected output types, refusals, or prompt injection attempts
Failure patterns surfaced through dedicated observability platforms or LLM-specific tracing tools, going beyond aggregate metrics to understand why and where things break

Tying observability back to your CI/CD pipeline:

Production incidents should not simply be resolved and forgotten, as each one is a data point that can be fed back into your evaluation datasets and repurposed as a security test case for future releases
Observability data helps teams determine risk thresholds, distinguishing between failures that fall within acceptable parameters and those serious enough to block a release from progressing
Over time, this feedback loop transforms monitoring from a reactive function into a proactive quality gate, giving teams greater confidence in each deployment

The goal is a continuous cycle: observe, learn, test, and release more safely with each iteration, enforcing security as proactive and intentional rather than a panicked afterthought.

Best practices checklist for LLM security in CI/CD

How can you ensure your LLM outputs are both safe and secured against future potential attacks? We recommend:

Start with a small, high-impact adversarial test suite
Treat LLM tests as gates, not optional reports
Combine quantitative metrics with human review for critical flows
Keep test data up to date with real attack patterns and incidents
Integrate logs and observability into your pipeline from day one
Test the full stack: prompts, policies, RAG data, tools/agents, and infrastructure
Regularly run deeper red teaming to discover new risks

When to bring in external LLM security experts

Your production environment is one of the most core components of your infrastructure, and so production deployment security should be treated with the utmost priority. Don’t wait for a sketchy error message or poisoned pipeline execution before you finally bring in LLM security experts.

Instead, base your LLM performance evaluation methods on these key trigger moments:

Before launching a high-risk or customer-facing LLM feature
After major model/provider changes, or adding new tools/agents
Following a security incident or near-miss
When regulators/internal governance require independent testing

OnSecurity’s LLM red teaming and AI pentesting provides a seamless way to design robust adversarial test suites, validate CI/CD security controls, and provide ongoing retesting as models and pipelines evolve, all hosted on an easy-to-navigate platform.

Get an instant quote today.