Redlyne

Security for AI-generated code

Built for AI. Built without AI.

Redlyne is a VS Code extension that detects vulnerabilities in AI-generated Python code and proposes one-click patches. Powered by a deterministic rule engine curated by security researchers — no LLM, no hallucinations, every flag is reproducible.

Deterministic engine · Expert-curated rules · Local execution · Apache 2.0

The reality of AI-generated Python code

Most of it is insecure. A meaningful share doesn't even compile.

76%

of AI-generated Python code contains security vulnerabilities. Tested in 2025 across 609 snippets from GitHub Copilot, Claude 3.7 Sonnet, and DeepSeek V3.

For Copilot specifically, the rate climbs to 84%.

31%

of GitHub Copilot's output is incomplete — missing imports or context. Almost one snippet in three ships without what it needs to actually run.

Across the major AI assistants, the average sits at ~13%.

Vulnerable code that is complete? Static analyzers can flag it, but with high noise and high false-positive rates on isolated snippets. Vulnerable code that's incomplete? AST-based analyzers often can't even start: when a truncated snippet fails to parse, there's no parse tree to walk. Redlyne handles both, with deterministic pattern matching curated by security researchers.
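To illustrate the difference, here is a minimal sketch (not Redlyne's actual engine) in which Python's built-in `ast.parse` rejects a truncated snippet while a line-level regular expression still matches. The snippet and the `SHELL_TRUE` pattern are hypothetical, chosen only for illustration.

```python
import ast
import re

# Hypothetical AI-generated snippet, cut off mid-function (unclosed bracket).
SNIPPET = '''def run(cmd):
    subprocess.call(cmd, shell=True)
    results = [
'''

# Illustrative pattern only; not taken from Redlyne's rule set.
SHELL_TRUE = re.compile(r"subprocess\.\w+\(.*shell\s*=\s*True")


def ast_can_parse(source: str) -> bool:
    """Return True if the source parses into an AST."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def regex_findings(source: str) -> list[int]:
    """Return 1-based line numbers where the pattern matches."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SHELL_TRUE.search(line)
    ]


print(ast_can_parse(SNIPPET))   # False: the unclosed bracket is a syntax error
print(regex_findings(SNIPPET))  # [2]
```

The trade-off is that a regex sees text, not structure, which is one reason such rules are typically hand-tuned to limit false positives.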

Sources: IEEE DSN-W 2025 · Information and Software Technology, 2025

A security scanner for the way you actually work

We copy AI-generated snippets into our codebase throughout the day. Traditional security scanners need the full codebase to be effective — impractical for that kind of fast, snippet-level iteration. Redlyne is purpose-built for it, with a deterministic engine and a rule set curated by security researchers.

Built for AI. Built without AI.

Redlyne uses a deterministic rule engine — no LLM, no probabilistic guesses, no hallucinated fixes. Every vulnerability flag and remediation suggestion is reproducible and auditable.

Expert-curated rule set

Detection patterns are hand-crafted by security researchers, not auto-generated. Each rule targets a real vulnerability class observed in AI-generated Python code, including OWASP Top 10 categories.

One-click remediation

Right-click any Python selection in your editor. Redlyne flags the vulnerabilities and proposes a patched version you can apply with a single confirmation.

Privacy by design

Runs entirely on your machine. No code, no telemetry, no metadata is ever sent to a remote server. What you write stays with you.

Built on published research, benchmarked across five datasets

Redlyne's detection rules are derived from peer-reviewed research on AI-generated code vulnerabilities. Evaluated on 1700+ vulnerable Python samples across five public benchmarks — same files, same labels — so every claim is reproducible.

Reproducible with two commands from the repo

96.8%
Recall on PoisonPy

On PoisonPy (n=310 paired vulnerable/clean samples), Redlyne catches 150 of the 155 known-vulnerable files. F1 = 0.82, +0.16 over DeVAIC v2 — same engine, our extended rule set.

9 / 10
Auto-fixes verified safe

Of the patches Redlyne emits on PoisonPy, 9 out of 10 (52 of 58) remove the targeted vulnerability, as verified by an independent rule re-scan, a syntax check, and a check that no new vulnerabilities were introduced.

~1.4 ms
Per-file scan latency

Median latency on PoisonPy. Redlyne runs in-process — no subprocess, no LLM inference. ~14× faster than Bandit, ~40× faster than Pylint, ~500× faster than Semgrep.

1700+
Samples evaluated

Tested across 5 public Python-vulnerability benchmarks — PoisonPy, SafeCoder, SecurityEval, Copilot CWE Scenarios, and PromSec. Redlyne analyzed every sample, on every dataset.

Built on published research

Redlyne builds on two peer-reviewed lines of work — the detection engine and the automated remediation approach — and benchmarks against the public datasets the original papers introduced.

Detection engine

Redlyne extends the DeVAIC v2.0 rule schema to 459 patterns and adds the pattern_not_file directive for scope-aware sanitization detection.
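This page does not spell out how `pattern_not_file` behaves. The sketch below assumes, from the name alone, that a rule is suppressed when a sanitizer pattern appears anywhere in the same file. The `Rule` class, its field layout, and the example rule are hypothetical and not Redlyne's real schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical rule shape; only the directive name comes from the page."""
    rule_id: str
    pattern: str                            # regex signalling a risky construct
    pattern_not_file: Optional[str] = None  # assumed: a file-wide match suppresses the flag


def fires(rule: Rule, source: str) -> bool:
    """Return True if the rule flags the source under the assumed semantics."""
    if not re.search(rule.pattern, source):
        return False
    if rule.pattern_not_file and re.search(rule.pattern_not_file, source):
        return False  # a sanitizer call appears somewhere in the file
    return True


# Illustrative rule: an f-string passed to make_response, unless escape() is used.
XSS_RULE = Rule(
    rule_id="EXAMPLE-XSS",
    pattern=r'make_response\(f".*\{',
    pattern_not_file=r"\bescape\(",
)

unsafe = 'resp = make_response(f"Hello {username}")\n'
safe = 'username = escape(username)\nresp = make_response(f"Hello {username}")\n'

print(fires(XSS_RULE, unsafe))  # True
print(fires(XSS_RULE, safe))    # False
```

A file-wide negative pattern is coarse: it cannot tell whether the sanitizer was applied to the same variable, only that one is present.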

Cotroneo, D., De Luca, R., Liguori, P. (2025). "DeVAIC: A tool for security assessment of AI-generated code." Information and Software Technology, 177, 107572. DOI 10.1016/j.infsof.2024.107572.

Automated remediation

Redlyne extends pattern-based patching with 14 multi-line template rules, syntax-safety verification, and an independent rule re-scan ("targeted-clean") before surfacing any fix to the user.

Altiero, F., Cotroneo, D., De Luca, R., Liguori, P. (2025). "Securing AI Code Generation Through Automated Pattern-Based Patching." 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 282–289. DOI 10.1109/DSN-W65791.2025.00077.

All benchmark results shown on this page are our own measurements from a May 2026 run across five public Python-vulnerability datasets, not reproduced from any paper. Verify locally with python tests/bench_baselines.py.

Head-to-head with open-source baselines

Five tools across five public Python-vulnerability datasets, same operational conditions a developer sees in their editor. Evaluated May 2026.

PoisonPy is a paired dataset, so precision, F1, and accuracy are all measurable. n = 310 (155 vulnerable + 155 clean).

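| Tool | Analyzed | Precision | Recall | F1 | Accuracy | ms / file |
|---|---|---|---|---|---|---|
| Bandit | 17% | 69.2% | 5.8% | 0.107 | 51.6% | 19.6 |
| Pylint | 17% | 55.8% | 18.7% | 0.280 | 51.9% | 58.7 |
| Semgrep | 86% | 69.6% | 20.6% | 0.318 | 55.8% | 699.8 |
| DeVAIC v2 (stock) | 100% | 68.0% | 64.5% | 0.662 | 67.1% | 0.5 |
| Redlyne | 100% | 71.4% | 96.8% | 0.822 | 79.0% | 1.4 |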

Analyzed is the share of samples the tool was able to process without a parse failure. Bandit and Pylint use AST parsing and silently give up on syntactically informal samples — 83% of PoisonPy — because the dataset is by design close to what AI assistants emit. Redlyne and DeVAIC v2 are regex-based and process every sample.

Pylint's accuracy on PoisonPy is 51.9%, effectively random. On the 17% of samples it parses, it flags 96.7% as "problematic", regardless of whether they're actually vulnerable.
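The Redlyne row can be cross-checked from a confusion matrix. The sketch below takes 150 true positives and 5 false negatives from the 150-of-155 recall figure. The 60 false positives (and hence 95 true negatives) are inferred here from the reported 71.4% precision; they are an assumption, not a count stated on this page.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# PoisonPy: 155 vulnerable + 155 clean files. fp=60 is inferred, not reported.
redlyne = metrics(tp=150, fp=60, fn=5, tn=95)
for name, value in redlyne.items():
    print(f"{name}: {value:.3f}")
# precision: 0.714
# recall: 0.968
# f1: 0.822
# accuracy: 0.790
```

Under that assumption, all four values agree with the table to three decimal places.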

Generalization across datasets. Headline metric per dataset, all five tools side-by-side — F1 for paired, recall for vulnerable-only:

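| Dataset | n | Bandit | Semgrep | Pylint | DeVAIC v2 | Redlyne |
|---|---|---|---|---|---|---|
| PoisonPy F1 | 310 | 0.107 | 0.318 | 0.280 | 0.662 | 0.822 |
| SafeCoder F1 | 1052 | 0.435 | 0.515 | 0.449 | 0.501 | 0.556 |
| SecurityEval recall | 121 | 40.5% | 34.7% | 59.5% | 63.6% | 93.4% |
| Copilot CWE recall | 150 | 84.7% | 51.3% | 93.3% | 68.0% | 89.3% |
| PromSec recall | 600 | 92.8% | 87.0% | 98.8% | 85.2% | 97.0% |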

On Copilot and PromSec, Pylint's "flag almost everything" mode nudges it slightly above Redlyne on raw recall — but at the cost of a 49.7% accuracy on the paired benchmarks (effectively random). On the only two datasets where precision is measurable, Redlyne leads.

Auto-remediation head-to-head

Detection finds bugs. Remediation fixes them. Three Python tools attempt code-modifying auto-fixes — only Redlyne is fast enough to use in-editor.

| Tool | Applied | Targeted-clean (of applied) | Similarity to ground truth | Speed |
|---|---|---|---|---|
| Semgrep --autofix | 7 / 155 (4.5%) | 5 / 7 (71%) | 0.82 | ~4700 ms |
| Redlyne | 58 / 155 (37%) | 52 / 58 (90%) | 0.70 | ~3 ms |

Targeted-clean is the honest "did the fix work?" metric. A patch passes when the specific rule that fired pre-patch (and that carries a remediation block) no longer fires post-patch, the patched source still compiles, and no new vulnerability classes are introduced. On PoisonPy, 52 of Redlyne's 58 applied patches (9 out of 10) satisfy this check.
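The three conditions of the targeted-clean check can be sketched as follows. This is an illustration under stated assumptions, not the code in tests/bench_remediation.py: the `RULES` table, the rule ids, and the `scan` helper are hypothetical stand-ins for the real rule engine.

```python
import re

# Hypothetical rule table (rule id -> regex); not Redlyne's real rules.
RULES = {
    "SHELL-TRUE": re.compile(r"shell\s*=\s*True"),
    "EVAL": re.compile(r"\beval\("),
}


def scan(source: str) -> set[str]:
    """Return the ids of all rules that fire on the source."""
    return {rule_id for rule_id, rx in RULES.items() if rx.search(source)}


def compiles(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<patched>", "exec")
        return True
    except SyntaxError:
        return False


def targeted_clean(original: str, patched: str, rule_id: str) -> bool:
    """Apply the three targeted-clean conditions to one patch."""
    before, after = scan(original), scan(patched)
    return (
        rule_id in before          # the rule fired pre-patch
        and rule_id not in after   # and no longer fires post-patch
        and compiles(patched)      # the patched source still compiles
        and after <= before        # no rule fires that did not fire before
    )


original = "import subprocess\nsubprocess.run(cmd, shell=True)\n"
good = "import shlex\nimport subprocess\nsubprocess.run(shlex.split(cmd))\n"
regressed = "import subprocess\nsubprocess.run(eval(cmd))\n"
broken = "import subprocess\nsubprocess.run(shlex.split(cmd)\n"

print(targeted_clean(original, good, "SHELL-TRUE"))       # True
print(targeted_clean(original, regressed, "SHELL-TRUE"))  # False: EVAL now fires
print(targeted_clean(original, broken, "SHELL-TRUE"))     # False: syntax error
```

Note that `compile` only checks syntax, so a patch can pass this check and still change runtime behavior; that is what the separate similarity column is meant to capture.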

On SafeCoder (526 real commit-based fixes), Redlyne applies a patch on 19% of samples and 69% of those pass the targeted-clean check. The drop from PoisonPy reflects how often production fixes involve function-level refactoring rather than the drop-in substitutions our regex-based rules target — a gap we're actively closing with multi-line template rules.

DeVAIC v2 stock ships only 2 remediation rules out of 441 (0.5%), so it's excluded from this table — it's a detection tool, listed alongside Redlyne in the detection comparison above. The PatchitPy bash pipeline (the closest open-source remediation peer) is under active investigation in our test setup.

Coverage at a glance

459 deterministic detection rules mapped to the OWASP Top 10:2025 taxonomy.

A01 Broken Access Control
A02 Security Misconfiguration
A03 Software Supply Chain Failures
A04 Cryptographic Failures
A05 Injection
A06 Insecure Design
A07 Authentication Failures
A08 Software & Data Integrity Failures
A09 Security Logging & Monitoring

Redlyne is a deterministic regex/AST rule engine. No LLM, no probabilistic guesses: every flag and every patch is reproducible by design. The rule set was derived from analysis of vulnerable Python samples in state-of-the-art security benchmark datasets, and evaluated end-to-end on 1700+ vulnerable Python samples in our May 2026 run.

Reproduce locally: python tests/bench_baselines.py for detection, python tests/bench_remediation.py for auto-fix.

See it in action

An AI assistant generates insecure code. You select it, run the analysis, and Redlyne suggests a remediated version you can apply in one click — without ever leaving the editor.

app.py — AI Output
Vulnerable
from flask import Flask, request, make_response

app = Flask(__name__)

@app.route("/profile")
def profile():
    username = request.args.get('username')
    response = make_response(f"Hello {username}")
    return response
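The flaw in this route is reflected cross-site scripting: `username` flows from the query string into the response unescaped. A remediated route would typically escape that value before rendering it. The sketch below isolates that step with the standard library's `html.escape` so it runs without Flask. The helper name `build_greeting` is hypothetical, and the patch Redlyne actually proposes may differ.

```python
from html import escape
from typing import Optional


def build_greeting(username: Optional[str]) -> str:
    """Return the greeting with user input HTML-escaped."""
    # request.args.get() returns None when the parameter is absent.
    safe_name = escape(username or "", quote=True)
    return f"Hello {safe_name}"


# In the Flask route, the unescaped f-string would become:
#     response = make_response(build_greeting(username))

print(build_greeting("<script>alert(1)</script>"))
# Hello &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping at the point of output keeps the fix a drop-in substitution, the kind of change a pattern-based patcher can apply without restructuring the function.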

Get Redlyne

Free, open source, runs locally. Install in 30 seconds and start scanning your AI-generated code.

Questions, partnerships, or commercial licensing? info@redlyne.io