LLM Security Scanner | OWASP LLM Top 10
Continuously test deployed LLM endpoints for prompt injection, data leakage, jailbreaks, and the full OWASP LLM Top 10 with reproducible attack bundles.
LLM Security Scanner — OWASP LLM Top 10 Coverage
Threatstealth LLM Security Scanner continuously tests deployed AI and large language model endpoints for the full OWASP LLM Top 10 — including prompt injection, training data leakage, jailbreaks, and supply chain vulnerabilities — with reproducible attack bundles.
- OWASP LLM Top 10 — complete coverage of all 10 categories from LLM01 (Prompt Injection) to LLM10 (Model Theft)
- Prompt injection testing — direct and indirect injection, role override, and instruction smuggling attacks
- Training data extraction — probe for memorised PII, credentials, and proprietary data leakage
- Jailbreak detection — test resistance to DAN variants, role-play bypasses, and adversarial suffixes
- Reproducible attack bundles — each finding includes exact prompts, model responses, and remediation guidance
- CI/CD integration — run LLM security tests as a pipeline gate before every model deployment
Prompt Injection Testing: Direct, Indirect, and Multi-Turn Attacks
The Threatstealth LLM scanner runs a comprehensive battery of prompt injection test cases covering all major injection categories. Direct injection tests include role override prompts (instructing the model to ignore previous instructions and adopt a new persona), goal hijacking (redirecting the model from its intended purpose), and system prompt extraction (extracting the confidential system prompt through crafted user messages). Indirect injection tests simulate scenarios where the model processes external data containing embedded instructions — testing whether the model's information processing pipeline can distinguish between trusted instructions and untrusted content from external sources.
- Role override injection — testing whether role-play and persona instructions can override the system prompt
- Goal hijacking — redirecting the model from its intended task through carefully crafted user input
- System prompt extraction — techniques for inducing the model to reveal its confidential system prompt
- Indirect injection simulation — testing model processing of external data containing embedded malicious instructions
- Multi-turn injection — attack sequences that gradually redirect model behaviour over multiple conversation turns
Training Data Extraction and Memorisation Testing
Large language models can memorise and reproduce training data verbatim — a risk that is particularly acute for fine-tuned models trained on company-specific data. The Threatstealth LLM scanner probes for training data leakage using extraction techniques including completion attacks (providing partial sequences and observing whether the model completes them with training data), membership inference (testing whether specific data items were included in the training set), and PII extraction probes that attempt to surface names, email addresses, phone numbers, and other personal information from the model's memory. Results classify the finding type, the extraction technique, and the category of leaked information.
- Completion attack testing — partial sequence completion probes designed to extract memorised training data
- PII extraction probes — targeted queries attempting to surface personal information memorised from training data
- Credential extraction testing — probes targeting memorised API keys, passwords, and authentication tokens
- Proprietary content detection — testing for reproduction of confidential business data from fine-tuning datasets
- Memorisation quantification — measuring the extent and specificity of memorisation for risk severity assessment
Jailbreak Resistance Testing and Safety Guardrail Evaluation
Safety guardrails in modern LLMs can be bypassed through a variety of jailbreak techniques that exploit the model's instruction-following tendencies. The Threatstealth LLM scanner tests resistance to the most effective current jailbreak categories: DAN (Do Anything Now) variants that instruct the model to pretend its safety filters are disabled, role-play scenarios that frame prohibited requests as fictional or hypothetical contexts, adversarial suffixes that append optimised character sequences to override safety training, and encoded attack variants that use Base64, ROT13, or other encodings to circumvent content filters. Each jailbreak test returns the model's response, a success/failure classification, and recommended system prompt hardening measures.
- DAN variant testing — current DAN and DAN-derivative jailbreak prompts tested against the deployed model
- Role-play bypass testing — fictional and hypothetical context framings used to circumvent safety guidelines
- Adversarial suffix testing — gradient-optimised adversarial token sequences appended to safety-critical prompts
- Encoding bypass testing — Base64, ROT13, and other encoding schemes used to circumvent content filters
- Safety boundary mapping — comprehensive assessment of where the model's safety guardrails are effective versus bypassable
CI/CD Pipeline Integration and Continuous LLM Security Testing
The most effective LLM security testing programme runs automated scans as a blocking gate in the CI/CD pipeline — preventing vulnerable models from being deployed to production. Threatstealth LLM scanner integrates with GitHub Actions, GitLab CI, Jenkins, and CircleCI through a CLI tool and API that run the full OWASP LLM Top 10 test suite against any accessible LLM endpoint. The pipeline gate returns pass or fail based on the finding severity threshold configured for the deployment context — all findings block production deployment, only high-and-above block staging, informational findings log but do not block. Each test run produces a reproducible report with exact test prompts, model responses, and remediation guidance for each finding.
- GitHub Actions integration — native action for running LLM security scans as part of GitHub pull request workflows
- Configurable severity gates — configure which finding severities block deployment versus log-only for each environment
- Reproducible test runs — identical test prompts and parameters across runs for reliable regression detection
- Finding report artifacts — per-run HTML and JSON reports with prompts, responses, and remediation guidance
- Model version tracking — test history associated with model versions for regression identification across updates