Redlyne

Security for AI-generated code

Built for AI. Built without AI.

Redlyne is a VS Code extension that detects vulnerabilities in AI-generated Python code and proposes one-click patches. It is powered by a deterministic rule engine with a rule set curated by security researchers: no LLM, no hallucinations, and every flag is reproducible.

Install for VS Code · View on GitHub

Deterministic engine · Expert-curated rules · Local execution · Apache 2.0

The reality of AI-generated Python code

Most of it is insecure. A meaningful share doesn't even compile.

76%

of AI-generated Python code contains security vulnerabilities. Tested in 2025 across 609 snippets from GitHub Copilot, Claude 3.7 Sonnet, and DeepSeek V3.

For Copilot specifically, the rate climbs to 84%.

31%

of GitHub Copilot's output is incomplete — missing imports or context. Almost one snippet in three ships without what it needs to actually run.

Across the major AI assistants, the average sits at ~13%.

Vulnerable code that is complete? Static analyzers can flag it, but with high noise and high false-positive rates on isolated snippets. Vulnerable code that's incomplete? AST-based analyzers often can't even start: a fragment that fails to parse yields no tree to walk, and without imports there is little context to resolve what a call refers to. Redlyne handles both, with deterministic pattern matching curated by security researchers.
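To make the parsing point concrete, here is a minimal sketch (not Redlyne's engine or rule set) that feeds a truncated snippet to Python's `ast` module and to a line-level regex. The snippet, the `OS_SYSTEM_CONCAT` pattern, and both function names are illustrative assumptions.

```python
import ast
import re

# Hypothetical truncated AI output: the call is never closed, so this is not valid Python.
SNIPPET = '''def ping(host):
    os.system("ping -c 1 " + host
'''

# Illustrative pattern (an assumption, not a Redlyne rule):
# os.system() called with a string literal concatenated to something else.
OS_SYSTEM_CONCAT = re.compile(r"os\.system\(\s*[\"'].*[\"']\s*\+")


def ast_can_parse(source: str) -> bool:
    """Return True if the source parses into an AST, False on a syntax error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def regex_findings(source: str) -> list[int]:
    """Return the 1-based line numbers where the pattern matches."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if OS_SYSTEM_CONCAT.search(line)
    ]


if __name__ == "__main__":
    print("AST parse ok:", ast_can_parse(SNIPPET))
    print("Regex hits on lines:", regex_findings(SNIPPET))
```

The AST route has nothing to inspect once parsing fails, while the line-level pattern still sees the suspicious call. The trade-off is that a regex has no notion of scope or data flow, which is presumably why curated rules matter.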

Sources: IEEE DSN-W 2025 · Information and Software Technology, 2025

A security scanner for the way you actually work

We copy AI-generated snippets into our codebase throughout the day. Traditional security scanners need the full codebase to be effective — impractical for that kind of fast, snippet-level iteration. Redlyne is purpose-built for it, with a deterministic engine and a rule set curated by security researchers.

Built for AI. Built without AI.

Redlyne uses a deterministic rule engine — no LLM, no probabilistic guesses, no hallucinated fixes. Every vulnerability flag and remediation suggestion is reproducible and auditable.

Expert-curated rule set

Detection patterns are hand-crafted by security researchers, not auto-generated. Each rule targets a real vulnerability class observed in AI-generated Python code, including OWASP Top 10 categories.

One-click remediation

Right-click any Python selection in your editor. Redlyne flags the vulnerabilities and proposes a patched version you can apply with a single confirmation.

Privacy by design

Runs entirely on your machine. No code, no telemetry, no metadata is ever sent to a remote server. What you write stays with you.

Built on published research, benchmarked across five datasets

Redlyne's detection rules are derived from peer-reviewed research on AI-generated code vulnerabilities. Evaluated on 1700+ vulnerable Python samples across five public benchmarks — same files, same labels — so every claim is reproducible.

Reproducible with two commands from the repo

96.8%

Recall on PoisonPy

On PoisonPy (n=310 paired vulnerable/clean samples), Redlyne catches 150 of the 155 known-vulnerable files. F1 = 0.82, +0.16 over DeVAIC v2 — same engine, our extended rule set.

9 / 10

Auto-fixes verified safe

Of the patches Redlyne emits on PoisonPy, 9 out of 10 remove the targeted vulnerability, as verified by an independent rule re-scan and a syntax check, with no new-vulnerability regressions.

~1.4 ms

Per-file scan latency

Median latency on PoisonPy. Redlyne runs in-process — no subprocess, no LLM inference. ~14× faster than Bandit, ~40× faster than Pylint, ~500× faster than Semgrep.
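A per-file median of this kind can be measured in-process with the standard library alone. The sketch below is an assumption about methodology, not the project's benchmark script; `toy_scan` is a hypothetical stand-in for whichever scanner is being timed.

```python
import re
import statistics
import time
from typing import Callable, Iterable


def median_latency_ms(scan: Callable[[str], object], sources: Iterable[str]) -> float:
    """Time one scan call per source and return the median in milliseconds."""
    timings = []
    for source in sources:
        start = time.perf_counter()
        scan(source)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical stand-in scanner: a single precompiled regex.
_EVAL_CALL = re.compile(r"\beval\(")


def toy_scan(source: str) -> bool:
    """Return True if the source contains an eval( call."""
    return bool(_EVAL_CALL.search(source))


if __name__ == "__main__":
    samples = ["eval(user_input)", "print('hello')", "total = 1 + 2"]
    print(f"median: {median_latency_ms(toy_scan, samples):.4f} ms")
```

Reporting the median rather than the mean keeps a few slow outliers (first-call warm-up, for instance) from dominating the figure.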

1700+

Samples evaluated

Tested across 5 public Python-vulnerability benchmarks — PoisonPy, SafeCoder, SecurityEval, Copilot CWE Scenarios, and PromSec. Redlyne analyzed every sample, on every dataset.

Built on published research

Redlyne builds on two peer-reviewed lines of work — the detection engine and the automated remediation approach — and benchmarks against the public datasets the original papers introduced.

Detection engine

Redlyne extends the DeVAIC v2.0 rule schema to 459 patterns and adds the pattern_not_file directive for scope-aware sanitization detection.
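The page does not spell out how `pattern_not_file` is evaluated. One plausible reading, sketched here purely as an assumption, is that a rule fires on any line matching its positive pattern unless a second pattern matches somewhere in the same file (for example, a sanitizer call). The `Rule` dataclass, every field name other than `pattern_not_file`, and the sample rule are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: str                            # line-level pattern that signals the issue
    pattern_not_file: Optional[str] = None  # assumed semantics: suppress if found anywhere in the file


def scan(source: str, rule: Rule) -> list[int]:
    """Return 1-based line numbers flagged by the rule, or [] if the file-level exclusion matches."""
    if rule.pattern_not_file and re.search(rule.pattern_not_file, source):
        return []
    positive = re.compile(rule.pattern)
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if positive.search(line)
    ]


# Hypothetical rule: an f-string response with an interpolated value,
# suppressed when an escape() call appears anywhere in the file.
REFLECTED_XSS = Rule(
    rule_id="demo-reflected-xss",
    pattern=r"make_response\(f[\"'].*\{",
    pattern_not_file=r"\bescape\(",
)

if __name__ == "__main__":
    vulnerable = 'username = request.args.get("username")\nresponse = make_response(f"Hello {username}")\n'
    print(scan(vulnerable, REFLECTED_XSS))
```

A file-wide exclusion like this is coarse: it trades some missed findings for fewer false positives when a sanitizer is present but applied to a different value.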

Cotroneo, D., De Luca, R., Liguori, P. (2025). "DeVAIC: A tool for security assessment of AI-generated code." Information and Software Technology, 177, 107572. DOI 10.1016/j.infsof.2024.107572.

Automated remediation

Redlyne extends pattern-based patching with 14 multi-line template rules, syntax-safety verification, and an independent rule re-scan ("targeted-clean") before surfacing any fix to the user.

Altiero, F., Cotroneo, D., De Luca, R., Liguori, P. (2025). "Securing AI Code Generation Through Automated Pattern-Based Patching." 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 282–289. DOI 10.1109/DSN-W65791.2025.00077.

All benchmark numbers shown on this page are our own measurements from a May 2026 run across five public Python-vulnerability datasets, not reproduced from any paper; only the prevalence statistics at the top of the page are taken from the cited sources. Verify locally with python tests/bench_baselines.py.

Head-to-head with open-source baselines

Five tools across five public Python-vulnerability datasets, same operational conditions a developer sees in their editor. Evaluated May 2026.

PoisonPy: a paired dataset, so precision, F1 and accuracy are all measurable. n = 310 (155 vulnerable + 155 clean).

Tool	Analyzed	Precision	Recall	F1	Accuracy	ms / file
Bandit	17%	69.2%	5.8%	0.107	51.6%	19.6
Pylint^†	17%	55.8%	18.7%	0.280	51.9%	58.7
Semgrep	86%	69.6%	20.6%	0.318	55.8%	699.8
DeVAIC v2 (stock)	100%	68.0%	64.5%	0.662	67.1%	0.5
Redlyne	100%	71.4%	96.8%	0.822	79.0%	1.4

Analyzed is the share of samples the tool processed without a parse failure. Bandit and Pylint rely on AST parsing and silently skip samples that are not syntactically valid Python, which is 83% of PoisonPy, because the dataset is by design close to what AI assistants emit. Redlyne and DeVAIC v2 are regex-based and process every sample.

^† Pylint's accuracy on PoisonPy is 51.9%, effectively random. On the 17% of samples it parses, it flags 96.7% of them as "problematic", regardless of whether they're actually vulnerable.
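The Redlyne row can be checked by hand. The page states that 150 of the 155 vulnerable files are caught; the false-positive count of 60 used below is not stated anywhere on this page and is back-derived from the reported 71.4% precision, so treat it as an assumption.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute the four table metrics from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# tp and fn follow from "150 of the 155 known-vulnerable files".
# fp = 60 is inferred from the reported precision (assumption); tn = 155 clean - 60 = 95.
redlyne = confusion_metrics(tp=150, fp=60, fn=5, tn=95)

if __name__ == "__main__":
    for name, value in redlyne.items():
        print(f"{name}: {value:.3f}")
```

With these counts the four values round to the table's 71.4%, 96.8%, 0.822 and 79.0%, so the row is internally consistent.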

Generalization across datasets. Headline metric per dataset, all five tools side-by-side — F1 for paired, recall for vulnerable-only:

Dataset	n	Bandit	Semgrep	Pylint	DeVAIC v2	Redlyne
PoisonPy F1	310	0.107	0.318	0.280	0.662	0.822
SafeCoder F1	1052	0.435	0.515	0.449	0.501	0.556
SecurityEval recall	121	40.5%	34.7%	59.5%	63.6%	93.4%
Copilot CWE recall	150	84.7%	51.3%	93.3%	68.0%	89.3%
PromSec recall	600	92.8%	87.0%	98.8%	85.2%	97.0%

On Copilot and PromSec, Pylint's "flag almost everything" mode nudges it slightly above Redlyne on raw recall — but at the cost of a 49.7% accuracy on the paired benchmarks (effectively random). On the only two datasets where precision is measurable, Redlyne leads.
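Recall alone can flatter a noisy tool. On a vulnerable-only dataset, a scanner that flags every file scores perfect recall, while the same behavior on a balanced paired dataset is right only half the time. The toy datasets below are illustrative and are not taken from the benchmark.

```python
def recall(flags: list[bool], labels: list[bool]) -> float:
    """Share of truly vulnerable samples that were flagged."""
    true_positives = sum(f and l for f, l in zip(flags, labels))
    return true_positives / sum(labels)


def accuracy(flags: list[bool], labels: list[bool]) -> float:
    """Share of samples whose flag matches the label."""
    correct = sum(f == l for f, l in zip(flags, labels))
    return correct / len(labels)


def flag_everything(labels: list[bool]) -> list[bool]:
    """A degenerate scanner that reports every sample as vulnerable."""
    return [True] * len(labels)


# Toy datasets (illustrative sizes, not benchmark data). True means "vulnerable".
vulnerable_only = [True] * 100
paired = [True] * 50 + [False] * 50

if __name__ == "__main__":
    print("recall, vulnerable-only:", recall(flag_everything(vulnerable_only), vulnerable_only))
    print("accuracy, paired:", accuracy(flag_everything(paired), paired))
```

This is why the paired datasets carry more weight when comparing tools: they are the only place where over-flagging shows up as a cost.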

Auto-remediation head-to-head

Detection finds bugs. Remediation fixes them. Three Python tools attempt code-modifying auto-fixes — only Redlyne is fast enough to use in-editor.

Tool	Applied	Targeted-clean (of applied)	Similarity to ground truth (GT)	Speed
Semgrep `--autofix`	7 / 155 (4.5%)	5 / 7 (71%)	0.82	~4700 ms
Redlyne	58 / 155 (37%)	52 / 58 (90%)	0.70	~3 ms

Targeted-clean is the honest "did the fix work?" metric. A patch passes when the specific rule that fired pre-patch (and that carries a remediation block) no longer fires post-patch, the patched source still compiles, and no new vulnerability classes were introduced. On PoisonPy, about 9 out of 10 patches Redlyne emits satisfy this check (52 of 58).
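The three conditions in the targeted-clean definition translate naturally into a small check. This is a sketch under assumptions, not Redlyne's implementation: rules are modeled as a hypothetical mapping from rule ID to compiled regex, "still compiles" is approximated with the built-in `compile()`, and "no new vulnerability classes" is approximated as "no rule fires that did not fire before".

```python
import re

# Hypothetical rule table: rule ID -> compiled pattern (not Redlyne's rule format).
DEMO_RULES: dict[str, re.Pattern] = {
    "yaml-unsafe-load": re.compile(r"yaml\.load\((?!.*Loader=)"),
    "eval-call": re.compile(r"\beval\("),
}


def fired(source: str, rules: dict[str, re.Pattern]) -> set[str]:
    """Return the IDs of all rules that match somewhere in the source."""
    return {rule_id for rule_id, pattern in rules.items() if pattern.search(source)}


def targeted_clean(original: str, patched: str, target_rule: str,
                   rules: dict[str, re.Pattern]) -> bool:
    """Return True if the patch removes the targeted finding without breaking or regressing the code."""
    before = fired(original, rules)
    after = fired(patched, rules)
    if target_rule not in before:
        return False  # the rule never fired, so there was nothing to fix
    if target_rule in after:
        return False  # the targeted rule still fires after the patch
    try:
        compile(patched, "<patched>", "exec")  # syntax check only; nothing is executed
    except SyntaxError:
        return False
    return after <= before  # no rule fires now that did not fire before


if __name__ == "__main__":
    original = "import yaml\ndata = yaml.load(text)\n"
    patched = "import yaml\ndata = yaml.safe_load(text)\n"
    print(targeted_clean(original, patched, "yaml-unsafe-load", DEMO_RULES))
```

Checking only the rule that triggered the patch keeps the metric strict: a patch that silences one finding by introducing another, or by producing code that no longer parses, does not count.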

On SafeCoder (526 real commit-based fixes), Redlyne applies a patch on 19% of samples and 69% of those pass the targeted-clean check. The drop from PoisonPy reflects how often production fixes involve function-level refactoring rather than the drop-in substitutions our regex-based rules target — a gap we're actively closing with multi-line template rules.

DeVAIC v2 stock ships only 2 remediation rules out of 441 (0.5%), so it's excluded from this table — it's a detection tool, listed alongside Redlyne in the detection comparison above. The PatchitPy bash pipeline (the closest open-source remediation peer) is under active investigation in our test setup.

Coverage at a glance

459 deterministic detection rules mapped to the OWASP Top 10:2025 taxonomy.

A01

Broken Access Control

A02

Security Misconfiguration

A03

Software Supply Chain Failures

A04

Cryptographic Failures

A05

Injection

A06

Insecure Design

A07

Authentication Failures

A08

Software & Data Integrity Failures

A09

Security Logging & Alerting Failures

See the full coverage breakdown →

Redlyne is a deterministic regex/AST rule engine. No LLM, no probabilistic guesses: every flag and every patch is reproducible by design. The rule set was derived from analysis of vulnerable Python samples in state-of-the-art security benchmark datasets, and evaluated end-to-end on 1700+ vulnerable Python samples in our May 2026 run.

Reproduce locally: python tests/bench_baselines.py for detection, python tests/bench_remediation.py for auto-fix.

See it in action

An AI assistant generates insecure code. You select it, run the analysis, and Redlyne suggests a remediated version you can apply in one click — without ever leaving the editor.

app.py — AI Output

Vulnerable

from flask import Flask, request, make_response

app = Flask(__name__)

@app.route("/profile")
def profile():
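    # User-controlled value taken straight from the query string
    username = request.args.get('username')
    # Reflected into the HTML response without escaping (reflected XSS)
    response = make_response(f"Hello {username}")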
    return response
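The page shows only the vulnerable panel. For context, one conventional fix for this class of bug is to HTML-escape the user-controlled value before it reaches the response. The sketch below uses only the standard library so it runs without Flask; `render_greeting` is a hypothetical helper, and the patch Redlyne actually proposes may differ.

```python
from html import escape
from typing import Optional


def render_greeting(username: Optional[str]) -> str:
    """Build the greeting with the user-supplied value HTML-escaped."""
    # A missing parameter becomes an empty string (the original snippet would print "None").
    safe_username = escape(username or "", quote=True)
    return f"Hello {safe_username}"


if __name__ == "__main__":
    print(render_greeting("<script>alert(1)</script>"))
```

Inside the Flask handler, the same idea would read `make_response(render_greeting(request.args.get('username')))`, so the markup characters in the query parameter arrive in the browser as text rather than as executable HTML.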

Get Redlyne

Free, open source, runs locally. Install in 30 seconds and start scanning your AI-generated code.

Install for VS Code · View on GitHub

Questions, partnerships, or commercial licensing? info@redlyne.io