Security for AI-generated code
Built for AI. Built without AI.
Redlyne is a VS Code extension that detects vulnerabilities in AI-generated Python code and proposes one-click patches. Powered by a deterministic rule engine with rules curated by security researchers: no LLM, no hallucinations, and every flag is reproducible.
The reality of AI-generated Python code
Most of it is insecure. A meaningful share doesn't even run.
of AI-generated Python code contains security vulnerabilities. Tested in 2025 across 609 snippets from GitHub Copilot, Claude 3.7 Sonnet, and DeepSeek V3.
For Copilot specifically, the rate climbs to 84%.
of GitHub Copilot's output is incomplete — missing imports or context. Almost one snippet in three ships without what it needs to actually run.
Across the major AI assistants, the average sits at ~13%.
Vulnerable code that is complete? Static analyzers can flag it, but on isolated snippets they are noisy, with high false-positive rates. Vulnerable code that's incomplete? AST-based analyzers often can't even start: a missing import alone doesn't stop a parser, but a truncated or fragmentary snippet that isn't valid Python leaves no parse tree to walk. Redlyne handles both, with deterministic pattern matching curated by security researchers.
Sources: IEEE DSN-W 2025 · Information and Software Technology, 2025
A security scanner for the way you actually work
We copy AI-generated snippets into our codebase throughout the day. Traditional security scanners need the full codebase to be effective — impractical for that kind of fast, snippet-level iteration. Redlyne is purpose-built for it, with a deterministic engine and a rule set curated by security researchers.
Built for AI. Built without AI.
Redlyne uses a deterministic rule engine — no LLM, no probabilistic guesses, no hallucinated fixes. Every vulnerability flag and remediation suggestion is reproducible and auditable.
Expert-curated rule set
Detection patterns are hand-crafted by security researchers, not auto-generated. Each rule targets a real vulnerability class observed in AI-generated Python code, including OWASP Top 10 categories.
One-click remediation
Right-click any Python selection in your editor. Redlyne flags the vulnerabilities and proposes a patched version you can apply with a single confirmation.
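To make the flag-then-patch flow concrete, here is a minimal sketch of a regex rule that carries a remediation template. Everything in it is an assumption for illustration: the `Rule` dataclass, the `flask-debug-true` rule, and the `scan` / `propose_patch` helpers are hypothetical and are not Redlyne's actual schema or API. The patch is only returned as text; applying it stays with the user.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape (not Redlyne's actual schema): a detection
    # regex plus an optional re.sub template used as the remediation.
    rule_id: str
    pattern: str
    fix: Optional[str] = None


# Hypothetical rule: Flask's debug mode exposes the interactive Werkzeug
# debugger, so flag app.run(..., debug=True) and propose debug=False.
FLASK_DEBUG = Rule(
    rule_id="flask-debug-true",
    pattern=r"(\bapp\.run\([^)]*?)debug\s*=\s*True",
    fix=r"\1debug=False",
)


def scan(source: str, rules: List[Rule]) -> List[Tuple[str, int]]:
    """Return (rule_id, 1-based line number) for every match in the selection."""
    findings = []
    for rule in rules:
        for match in re.finditer(rule.pattern, source):
            line_no = source.count("\n", 0, match.start()) + 1
            findings.append((rule.rule_id, line_no))
    return findings


def propose_patch(source: str, rules: List[Rule]) -> str:
    """Return the patched text; applying it is left to the user's confirmation."""
    patched = source
    for rule in rules:
        if rule.fix is not None:
            patched = re.sub(rule.pattern, rule.fix, patched)
    return patched


selection = (
    "from flask import Flask\n"
    "app = Flask(__name__)\n"
    "app.run(host='0.0.0.0', debug=True)\n"
)
print(scan(selection, [FLASK_DEBUG]))  # [('flask-debug-true', 3)]
print(propose_patch(selection, [FLASK_DEBUG]))
```

Keeping detection and remediation in the same rule object means the proposed fix can never target a line the rule did not flag.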
Privacy by design
Runs entirely on your machine. No code, no telemetry, no metadata is ever sent to a remote server. What you write stays with you.
Built on published research, benchmarked across five datasets
Redlyne's detection rules are derived from peer-reviewed research on AI-generated code vulnerabilities. Evaluated on 1700+ vulnerable Python samples across five public benchmarks — same files, same labels — so every claim is reproducible.
Reproducible with two commands from the repo.
On PoisonPy (n = 310 paired vulnerable/clean samples), Redlyne catches 150 of the 155 known-vulnerable files. F1 = 0.82, +0.16 over DeVAIC v2 (same engine, our extended rule set).
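The PoisonPy headline numbers follow from a plain confusion matrix. As a worked check, this sketch takes TP = 150 and FN = 5 from the sentence above and back-derives FP = 60, the only integer consistent with the 71.4% precision reported in the detection table. TN then follows from the 155 clean files. The FP value is our reconstruction, not a separately published count.

```python
# Confusion-matrix counts for Redlyne on PoisonPy (155 vulnerable + 155 clean).
# TP and FN come from the text (150 of 155 vulnerable files caught);
# FP = 60 is back-derived from the 71.4% precision in the detection table.
tp, fn = 150, 5
fp = 60
tn = 155 - fp  # clean files correctly left unflagged

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={precision:.1%} recall={recall:.1%} F1={f1:.3f} accuracy={accuracy:.1%}")
# precision=71.4% recall=96.8% F1=0.822 accuracy=79.0%
```

All four values match the Redlyne row of the PoisonPy table, which is a quick way to sanity-check the published figures.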
On PoisonPy, 9 out of 10 patches Redlyne emits successfully remove the targeted vulnerability: each passes an independent rule re-scan, still compiles, and introduces no new vulnerability classes.
Median latency on PoisonPy: 1.4 ms per file. Redlyne runs in-process, with no subprocess and no LLM inference: ~14× faster than Bandit, ~40× faster than Pylint, ~500× faster than Semgrep.
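The sketch below is a rough illustration of why staying in-process matters. It does not reproduce the benchmark, and it makes no claim about how Bandit, Pylint or Semgrep are actually invoked. It compares the median time of a hypothetical one-rule regex scan inside the current interpreter against the same scan in a freshly launched Python process. The rule, `median_ms`, and both scan functions are illustrative names.

```python
import re
import statistics
import subprocess
import sys
import time

# Hypothetical one-rule scanner, used only to illustrate per-file overhead.
RULE = re.compile(r"\bapp\.run\([^)]*debug\s*=\s*True")
SNIPPET = "app.run(debug=True)"
# Child program: exit 0 if the pattern (argv[1]) matches the snippet (argv[2]).
CHILD = "import re, sys; sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 1)"


def median_ms(fn, runs: int) -> float:
    """Median wall-clock time of fn() over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def scan_in_process() -> bool:
    # The regex is already compiled in this interpreter: matching is the only cost.
    return RULE.search(SNIPPET) is not None


def scan_via_subprocess() -> bool:
    # A fresh interpreter per file: startup cost is paid before any matching.
    result = subprocess.run([sys.executable, "-c", CHILD, RULE.pattern, SNIPPET])
    return result.returncode == 0


if __name__ == "__main__":
    print(f"in-process: {median_ms(scan_in_process, runs=50):.4f} ms")
    print(f"subprocess: {median_ms(scan_via_subprocess, runs=5):.1f} ms")
```

On typical machines the subprocess path is dominated by interpreter startup, so the gap is orders of magnitude regardless of how cheap the rule itself is.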
Tested across 5 public Python-vulnerability benchmarks — PoisonPy, SafeCoder, SecurityEval, Copilot CWE Scenarios, and PromSec. Redlyne analyzed every sample, on every dataset.
Built on published research
Redlyne builds on two peer-reviewed lines of work — the detection engine and the automated remediation approach — and benchmarks against the public datasets the original papers introduced.
Redlyne extends the DeVAIC v2.0 rule set from 441 to 459 patterns and adds a pattern_not_file directive to the rule schema for scope-aware sanitization detection.
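Here is one plausible reading of a file-scoped suppression directive, written as a minimal sketch. We assume that a rule fires when its `pattern` matches, unless its `pattern_not_file` regex matches anywhere in the same file (for example, a sanitizer call). The `FileScopedRule` shape, the `fires` helper and the `flask-reflected-xss` rule are hypothetical and may differ from the real DeVAIC/Redlyne schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileScopedRule:
    # Hypothetical rule shape; the real DeVAIC/Redlyne schema may differ.
    rule_id: str
    pattern: str
    # Assumed semantics: if this regex matches anywhere in the same file,
    # a sanitizer is considered in scope and the rule is suppressed.
    pattern_not_file: Optional[str] = None


def fires(rule: FileScopedRule, source: str) -> bool:
    """True if the rule's pattern matches and no file-level sanitizer suppresses it."""
    if re.search(rule.pattern, source) is None:
        return False
    if rule.pattern_not_file is not None and re.search(rule.pattern_not_file, source):
        return False
    return True


# Hypothetical rule: user data interpolated into a Flask response body,
# suppressed when the file escapes values somewhere with escape(...).
REFLECTED_XSS = FileScopedRule(
    rule_id="flask-reflected-xss",
    pattern=r"make_response\(\s*f[\"'].*\{\w+\}",
    pattern_not_file=r"\bescape\(",
)

unsanitized = 'response = make_response(f"Hello {username}")\n'
sanitized = (
    "from markupsafe import escape\n"
    "username = escape(request.args.get('username'))\n"
    'response = make_response(f"Hello {username}")\n'
)
print(fires(REFLECTED_XSS, unsanitized))  # True
print(fires(REFLECTED_XSS, sanitized))    # False
```

File scope is deliberately coarse: an `escape(...)` call on an unrelated variable would also suppress the rule, trading a little recall for fewer false positives on already-sanitized code.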
Redlyne extends pattern-based patching with 14 multi-line template rules, syntax-safety verification, and an independent rule re-scan ("targeted-clean") before surfacing any fix to the user.
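A single-line substitution can't fix a vulnerability whose source and sink sit on different lines. The sketch below shows the general idea of a multi-line template rule. The rule itself (`SQL_FORMAT_THEN_EXECUTE` with its `PARAMETERIZED` template) is hypothetical and not one of Redlyne's 14 templates. It matches a query built with `%`-formatting on one line and executed on the next, then rewrites both lines into a parameterized DB-API call. We assume a driver that accepts `%s` placeholders, such as psycopg2.

```python
import re

# Hypothetical multi-line template rule: a query built with %-formatting on
# one line and executed on the next is rewritten into a parameterized call.
# Assumes a DB-API driver that accepts %s placeholders (e.g. psycopg2).
SQL_FORMAT_THEN_EXECUTE = re.compile(
    r"^(?P<indent>[ \t]*)(?P<var>\w+) = (?P<q>[\"'])(?P<sql>.*?)'%s'(?P<rest>.*?)(?P=q) % (?P<arg>\w+)\n"
    r"(?P=indent)(?P<cur>\w+)\.execute\((?P=var)\)",
    re.MULTILINE,
)
# Template: drop the quoted '%s' interpolation and pass the value as a parameter.
PARAMETERIZED = (
    r"\g<indent>\g<var> = \g<q>\g<sql>%s\g<rest>\g<q>" "\n"
    r"\g<indent>\g<cur>.execute(\g<var>, (\g<arg>,))"
)

vulnerable = (
    "def find_user(cursor, name):\n"
    "    query = \"SELECT * FROM users WHERE name = '%s'\" % name\n"
    "    cursor.execute(query)\n"
)
patched = SQL_FORMAT_THEN_EXECUTE.sub(PARAMETERIZED, vulnerable)
print(patched)
```

The backreferences `(?P=indent)` and `(?P=var)` tie the two lines together, so the template only fires when the formatted string is the exact value passed to `execute`.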
Altiero, F., Cotroneo, D., De Luca, R., Liguori, P. (2025). "Securing AI Code Generation Through Automated Pattern-Based Patching." 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 282–289. DOI 10.1109/DSN-W65791.2025.00077.
All numbers shown on this page are our own measurements on a May 2026 run across five public Python-vulnerability datasets, not reproduced from any paper. Verify locally with python tests/bench_baselines.py.
Head-to-head with open-source baselines
Five tools across five public Python-vulnerability datasets, run under the same operating conditions a developer sees in the editor. Evaluated May 2026.
PoisonPy — the only paired dataset where precision, F1 and accuracy are all measurable. n = 310 (155 vulnerable + 155 clean).
| Tool | Analyzed | Precision | Recall | F1 | Accuracy | ms / file |
|---|---|---|---|---|---|---|
| Bandit | 17% | 69.2% | 5.8% | 0.107 | 51.6% | 19.6 |
| Pylint† | 17% | 55.8% | 18.7% | 0.280 | 51.9% | 58.7 |
| Semgrep | 86% | 69.6% | 20.6% | 0.318 | 55.8% | 699.8 |
| DeVAIC v2 (stock) | 100% | 68.0% | 64.5% | 0.662 | 67.1% | 0.5 |
| Redlyne | 100% | 71.4% | 96.8% | 0.822 | 79.0% | 1.4 |
Analyzed is the share of samples the tool was able to process without a parse failure. Bandit and Pylint use AST parsing and silently give up on syntactically informal samples — 83% of PoisonPy — because the dataset is by design close to what AI assistants emit. Redlyne and DeVAIC v2 are regex-based and process every sample.
† Pylint's accuracy on PoisonPy is 49.7% — effectively random. On the 17% of samples it parses, it flags 96.7% of them as "problematic", regardless of whether they're actually vulnerable.
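The "Analyzed" gap between AST-based and regex-based tools is easy to reproduce with Python's own `ast` module. In this sketch, the three samples are illustrative (not taken from PoisonPy), and `OS_SYSTEM_CONCAT` is a hypothetical regex rule. Only syntax matters to the parser: the sample with a missing import still parses, while the indented fragment and the truncated call do not. The regex matches all three.

```python
import ast
import re

# Hypothetical regex rule: os.system() with a concatenated argument,
# a common OS command injection shape.
OS_SYSTEM_CONCAT = re.compile(r"\bos\.system\([^)]*\+")

# Illustrative samples (not taken from PoisonPy).
SAMPLES = {
    # No "import os": still valid syntax, so a parse tree exists.
    "missing_import": "os.system('ls ' + user_dir)\n",
    # Pasted from the middle of a function: unexpected indent.
    "fragment": "# inside the request handler\n    os.system('ls ' + user_dir)\n",
    # Truncated output: the call is never closed.
    "truncated": "os.system('ls ' + user_dir\n",
}


def parses(source: str) -> bool:
    """True if an AST-based tool could build a parse tree for the sample."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


for name, source in SAMPLES.items():
    print(name, "parses:", parses(source), "regex hit:", bool(OS_SYSTEM_CONCAT.search(source)))

# Share of samples an AST-based tool could even analyze (the "Analyzed" column).
analyzed = sum(parses(s) for s in SAMPLES.values()) / len(SAMPLES)
print(f"analyzed: {analyzed:.0%}")  # analyzed: 33%
```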
Generalization across datasets. Headline metric per dataset, all five tools side-by-side — F1 for paired, recall for vulnerable-only:
| Dataset | n | Bandit | Semgrep | Pylint | DeVAIC v2 | Redlyne |
|---|---|---|---|---|---|---|
| PoisonPy F1 | 310 | 0.107 | 0.318 | 0.280 | 0.662 | 0.822 |
| SafeCoder F1 | 1052 | 0.435 | 0.515 | 0.449 | 0.501 | 0.556 |
| SecurityEval recall | 121 | 40.5% | 34.7% | 59.5% | 63.6% | 93.4% |
| Copilot CWE recall | 150 | 84.7% | 51.3% | 93.3% | 68.0% | 89.3% |
| PromSec recall | 600 | 92.8% | 87.0% | 98.8% | 85.2% | 97.0% |
On Copilot and PromSec, Pylint's "flag almost everything" mode nudges it slightly above Redlyne on raw recall, but at the cost of 49.7% accuracy on the paired benchmarks (effectively random). On the only two datasets where precision is measurable, PoisonPy and SafeCoder, Redlyne leads on F1.
Auto-remediation head-to-head
Detection finds bugs. Remediation fixes them. Three Python tools attempt code-modifying auto-fixes — only Redlyne is fast enough to use in-editor.
| Tool | Applied | Targeted-clean (of applied) | Similarity to ground-truth fix | Latency |
|---|---|---|---|---|
| Semgrep --autofix | 7 / 155 (4.5%) | 5 / 7 (71%) | 0.82 | ~4700 ms |
| Redlyne | 58 / 155 (37%) | 52 / 58 (90%) | 0.70 | ~3 ms |
Targeted-clean is the honest "did the fix work?" metric. A patch counts only if three things hold: the specific rule that fired pre-patch (and carries a remediation block) no longer fires post-patch, the patched source still compiles, and no new vulnerability classes are introduced. On PoisonPy, 52 of the 58 patches Redlyne emits (9 out of 10) satisfy this check.
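The three-part targeted-clean check can be sketched in a few lines. This is a minimal sketch under stated assumptions. The `RULES` table and both rule IDs are hypothetical. We approximate "no new vulnerability classes" as "no rule ID fires after the patch that was silent before", and we check compilation with Python's built-in `compile()`, which parses the source without executing it.

```python
import re
from typing import Dict, Set

# Hypothetical rule table (rule_id -> detection regex), not Redlyne's real rules.
RULES: Dict[str, str] = {
    "flask-debug-true": r"\bapp\.run\([^)]*debug\s*=\s*True",
    "os-system-concat": r"\bos\.system\([^)]*\+",
}


def fired(source: str) -> Set[str]:
    """IDs of every rule that matches the source."""
    return {rule_id for rule_id, pattern in RULES.items() if re.search(pattern, source)}


def compiles(source: str) -> bool:
    """True if the source is syntactically valid Python (nothing is executed)."""
    try:
        compile(source, "<patch>", "exec")
    except SyntaxError:
        return False
    return True


def targeted_clean(before: str, after: str, target_rule: str) -> bool:
    before_hits, after_hits = fired(before), fired(after)
    return (
        target_rule in before_hits            # the rule fired on the original
        and target_rule not in after_hits     # ...and no longer fires on the patch
        and compiles(after)                   # the patch is still valid Python
        and after_hits <= before_hits         # no previously silent rule now fires
    )


before = "from flask import Flask\napp = Flask(__name__)\napp.run(debug=True)\n"
good_patch = before.replace("debug=True", "debug=False")
broken_patch = before.replace("debug=True)", "debug=False")
regressing_patch = before.replace("app.run(debug=True)", "import os\nos.system('flask run ' + port)")

print(targeted_clean(before, good_patch, "flask-debug-true"))        # True
print(targeted_clean(before, broken_patch, "flask-debug-true"))      # False: does not compile
print(targeted_clean(before, regressing_patch, "flask-debug-true"))  # False: new rule fires
```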
On SafeCoder (526 real commit-based fixes), Redlyne applies a patch on 19% of samples and 69% of those pass the targeted-clean check. The drop from PoisonPy reflects how often production fixes involve function-level refactoring rather than the drop-in substitutions our regex-based rules target — a gap we're actively closing with multi-line template rules.
DeVAIC v2 stock ships only 2 remediation rules out of 441 (0.5%), so it's excluded from this table — it's a detection tool, listed alongside Redlyne in the detection comparison above. The PatchitPy bash pipeline (the closest open-source remediation peer) is under active investigation in our test setup.
Coverage at a glance
459 deterministic detection rules mapped to the OWASP Top 10:2025 taxonomy.
Redlyne is a deterministic regex/AST rule engine. No LLM, no probabilistic guesses: every flag and every patch is reproducible by design. The rule set was derived from analysis of vulnerable Python samples in state-of-the-art security benchmark datasets, and evaluated end-to-end on 1700+ vulnerable Python samples in our May 2026 run.
Reproduce locally: python tests/bench_baselines.py for detection, python tests/bench_remediation.py for auto-fix.
See it in action
An AI assistant generates insecure code. You select it, run the analysis, and Redlyne suggests a remediated version you can apply in one click — without ever leaving the editor.
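from flask import Flask, request, make_response
app = Flask(__name__)
@app.route("/profile")
def profile():
    username = request.args.get('username')
    # Insecure as generated: the query parameter is reflected into an HTML
    # response without escaping (reflected XSS).
    response = make_response(f"Hello {username}")
    return response
Get Redlyne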
Free, open source, runs locally. Install in 30 seconds and start scanning your AI-generated code.
Questions, partnerships, or commercial licensing? info@redlyne.io