Methodology

How We Verify Findings

Static analysis generates candidates; AI verification and human audit determine which are real. No finding is cited in pitch materials or research posts without a documented true-positive/false-positive (TP/FP) assessment.

Scan Pipeline

1. Discovery
Repos are discovered via npm/PyPI registry crawl, GitHub topic search, and artifact-type file search. Each repo is assigned a tier (T1–T5) based on star count and source quality. Aggregators and collections are excluded before scanning.
2. Static Analysis
22 scanner modules run concurrently against the cloned repository. Each module covers a specific artifact type or vulnerability category. All 120+ detection rules are hand-written with documented true positives and negatives (ADR-010).
3. Score Computation
The score is a weighted sum of finding severities, each scaled by a confidence multiplier. Floor rules apply to high-severity categories. The score is deterministic: the same repo always produces the same score under the same checker versions.
4. AI Verification
CRITICAL findings from differentiator checkers (CHK-115, CHK-119, CHK-027, CHK-089) are submitted to the AI jury. Each finding gets a verdict: CONFIRMED, LIKELY, or FALSE_POSITIVE. Verdicts are cached in verify_cache.json and persist across rescans.
5. Corpus Audit
Before any stat is published, every high-volume checker is sampled (20 findings each) and its FP rate measured. Checkers with a >50% FP rate are fixed before numbers are cited. The 24% critical figure reflects post-audit, post-fix numbers.
6. Responsible Disclosure
Named findings are disclosed to maintainers 7+ days before publication. Evidence is redacted in public reports. Full reports are available to maintainers on request.
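
The score computation step can be sketched as follows. This is a minimal illustration, not the production scorer: the severity weights, the category floors, the 0–100 scale, and the "higher means riskier" direction are all assumptions, since the page does not publish them. Only the shape (weighted sum, confidence multiplier, floor rules, determinism) comes from the description above.

```python
from dataclasses import dataclass

# Hypothetical severity weights; the real values are not published.
SEVERITY_WEIGHTS = {"CRITICAL": 40.0, "HIGH": 15.0, "MEDIUM": 5.0, "LOW": 1.0, "INFO": 0.0}

# Hypothetical floor rule: a finding in one of these categories forces a minimum score.
CATEGORY_FLOORS = {"credential_access": 70.0, "exfiltration": 70.0}


@dataclass(frozen=True)
class Finding:
    checker_id: str
    severity: str
    category: str
    confidence: float  # 0.0 to 1.0


def risk_score(findings: list[Finding]) -> float:
    """Weighted sum of severities times confidence, with category floors applied."""
    # Sort first so the floating-point sum never depends on scan order.
    ordered = sorted(
        findings, key=lambda f: (f.checker_id, f.severity, f.category, f.confidence)
    )
    weighted = sum(SEVERITY_WEIGHTS[f.severity] * f.confidence for f in ordered)
    floor = max((CATEGORY_FLOORS.get(f.category, 0.0) for f in ordered), default=0.0)
    # Clamp to the assumed 0-100 scale.
    return min(100.0, max(weighted, floor))
```

In this sketch a single low-confidence finding in a floored category still lifts the score to the floor, which is one plausible reading of "floor rules applied for high-severity categories".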

AI Verification (Jury System)

CRITICAL findings from differentiator checkers are submitted to an LLM jury with structured context: the checker's intent, the evidence, the file path, and the surrounding code context. The jury returns a structured verdict with a one-sentence rationale.

Input: checker_id, checker_intent, evidence, file_path, repo_context
Prompt: "This checker fires when: [intent].
        Here is the finding. Is this a true positive?"
Output: { verdict: "CONFIRMED"|"LIKELY"|"FALSE_POSITIVE",
         explanation: "one sentence, max 20 words" }
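
A minimal sketch of how the input and output contract above could be enforced around the jury call. The function names `build_prompt` and `parse_verdict` are hypothetical, the exact prompt layout beyond the two quoted sentences is an assumption, and the LLM call itself is left out.

```python
import json

ALLOWED_VERDICTS = {"CONFIRMED", "LIKELY", "FALSE_POSITIVE"}
MAX_EXPLANATION_WORDS = 20  # "one sentence, max 20 words"


def build_prompt(checker_intent: str, evidence: str, file_path: str, repo_context: str) -> str:
    """Assemble the jury prompt; field ordering here is an assumption."""
    return (
        f"This checker fires when: {checker_intent}.\n"
        f"File: {file_path}\n"
        f"Evidence: {evidence}\n"
        f"Context: {repo_context}\n"
        "Here is the finding. Is this a true positive?"
    )


def parse_verdict(raw: str) -> dict:
    """Validate the jury's JSON reply against the documented output shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("verdict must be a JSON object")
    verdict = data.get("verdict")
    explanation = data.get("explanation")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if not isinstance(explanation, str) or not explanation.strip():
        raise ValueError("missing explanation")
    if len(explanation.split()) > MAX_EXPLANATION_WORDS:
        raise ValueError("explanation exceeds 20 words")
    return {"verdict": verdict, "explanation": explanation.strip()}
```

Rejecting malformed replies rather than coercing them keeps an unparseable answer from silently becoming a verdict.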

Verdicts are cached in verify_cache.json and persist across rescans. A cached CONFIRMED verdict is never downgraded by a rescan; only a human analyst can override it.
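
The no-downgrade rule can be sketched like this. The cache key format, the entry fields, and the `by_human` flag are assumptions for illustration; the page only states that the cache lives in verify_cache.json and that a rescan cannot override CONFIRMED. How non-CONFIRMED entries are refreshed is not specified, so this sketch simply lets them be replaced.

```python
import json
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Read the verdict cache, or start empty if the file does not exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def record_verdict(cache: dict, finding_key: str, verdict: str,
                   explanation: str, *, by_human: bool = False) -> dict:
    """Store a verdict unless it would let a rescan override a CONFIRMED entry."""
    cached = cache.get(finding_key)
    if cached and cached["verdict"] == "CONFIRMED" and not by_human:
        return cached  # automated rescans never downgrade CONFIRMED
    entry = {"verdict": verdict, "explanation": explanation, "by_human": by_human}
    cache[finding_key] = entry
    return entry


def save_cache(path: Path, cache: dict) -> None:
    # Sorted keys keep the file diff-friendly between scans.
    path.write_text(json.dumps(cache, indent=2, sort_keys=True))
```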

Verdict Definitions

CONFIRMED
Criteria
· Evidence matches the finding pattern with no ambiguity
· Context rules out false positive (runtime source file, not test/doc)
· The security impact is clear and non-speculative
Example
CHK-115: SKILL.md contains 'cat ~/.aws/credentials' in a task step. File path is in .claude/skills/, not a test fixture. Evidence is unambiguous credential access instruction.
LIKELY
Criteria
· Evidence matches but context is ambiguous
· Security impact is probable but not certain
· Manual review recommended before blocking
Example
CHK-027: SKILL.md contains 'send results to https://api.example.com'. Domain is not in known-safe list. Could be legitimate API call or exfiltration — context required.
FALSE_POSITIVE
Criteria
· Evidence matches the pattern but context makes it benign
· File is a test fixture, documentation example, or placeholder
· The pattern fires on non-runtime code (e.g. key header in certgen test)
Example
CHK-042: '-----BEGIN RSA PRIVATE KEY-----' in pkg/tls/certgen_test.go. Test file generating ephemeral TLS certs — pattern correct but context is benign.

False Positive Audit Log

Before pitch statistics are locked, every high-volume checker is sampled (20 random findings) and its FP rate assessed. Checkers above a 50% FP rate are fixed at the root cause, never globally suppressed.
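
A minimal sketch of the sampling audit, assuming each sampled finding is hand-labelled "TP" or "FP" by a reviewer. The sample size (20) and the 50% threshold come from the text; the fixed seed and the label strings are assumptions added so the sample is reproducible.

```python
import random

SAMPLE_SIZE = 20      # findings sampled per checker
FP_THRESHOLD = 0.50   # checkers above this FP rate get fixed


def sample_findings(findings: list[dict], seed: int = 0) -> list[dict]:
    """Draw up to SAMPLE_SIZE findings without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(findings, min(SAMPLE_SIZE, len(findings)))


def fp_rate(labels: list[str]) -> float:
    """Fraction of reviewer labels marked 'FP'."""
    if not labels:
        raise ValueError("no labels to assess")
    return sum(1 for label in labels if label == "FP") / len(labels)


def needs_fix(labels: list[str]) -> bool:
    # Strictly greater than the threshold, matching ">50% FP rate".
    return fp_rate(labels) > FP_THRESHOLD
```

With 20 samples, each label moves the measured rate by five percentage points, so results near the threshold are coarse estimates rather than precise measurements.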

CHK-023 (injection patterns)
Problem
~90% FP rate: ordinary MUST/CRITICAL instructions in skills triggered the rule
Fix
Tightened to require explicit override language ('ignore previous instructions', 'disregard system prompt', 'bypass security'). Imperative workflow instructions excluded.
Impact
~4,000 HIGH findings eliminated
CHK-049 (no auth)
Problem
~30% FP rate: skill/hook/agent repos triggered the rule even though they have no server
Fix
Scoped to repos with server/mcp artifact hints only. Skill-only repos excluded from auth check.
Impact
~500 HIGH findings eliminated
CHK-133 (placeholder secrets)
Problem
~80% FP rate: placeholders such as 'YourMySQLRootPassword' and 'secure123' were scored as CRITICAL
Fix
Raised the Shannon entropy gate to 3.5 bits/char and expanded the placeholder list to cover common tutorial patterns.
Impact
~700 CRITICAL demoted to INFO
CHK-105 (CI secret echo)
Problem
~95% FP rate: standard >> $GITHUB_OUTPUT writes triggered the rule
Fix
Excluded lines writing to $GITHUB_OUTPUT, $GITHUB_ENV, $GITHUB_PATH — the mandated GitHub Actions step-output idiom since 2022.
Impact
~170 HIGH findings eliminated
CHK-108 (credential URLs in docs)
Problem
~95% FP rate: i18n locale files and README proxy examples triggered the rule
Fix
Extended docs context to cover /locales/, /i18n/ paths and .json files containing URL format examples.
Impact
~96 CRITICAL demoted to LOW
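
The CHK-133 fix combines two gates, which can be sketched as below. The 3.5 bits/char threshold is from the audit log; the placeholder hints and the function names are hypothetical, and the real placeholder list is presumably longer.

```python
import math
from collections import Counter

ENTROPY_GATE = 3.5  # bits per character, per the CHK-133 fix

# Hypothetical placeholder hints; the real list is not published.
PLACEHOLDER_HINTS = ("your", "password", "example", "changeme")


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_placeholder(value: str) -> bool:
    lowered = value.lower()
    return any(hint in lowered for hint in PLACEHOLDER_HINTS)


def classify_secret(value: str) -> str:
    """Demote to INFO when the value is a known placeholder or low entropy."""
    if looks_like_placeholder(value) or shannon_entropy(value) < ENTROPY_GATE:
        return "INFO"
    return "CRITICAL"
```

The two gates are complementary: 'secure123' falls below 3.5 bits/char and is caught by entropy alone, while 'YourMySQLRootPassword' scores above the gate on this measure and is only demoted by the placeholder match.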

Core Principles

Deterministic scoring
Same repo + same checkers = same score. No randomness, no model drift, no A/B testing on security conclusions.
Hand-written detection logic
ADR-010: all detection rules written deliberately with documented true/false positives. No ML classifiers in v1.
Audit trail
Every finding has a checker_id traceable to source code. Every verdict has a cached rationale. SOC 2 audit log on all scans.
Responsible disclosure
Named findings disclosed 7+ days before publication. Severity never inflated. Evidence redacted in public reports.
No suppression
Checkers with high FP rates are fixed at the root cause (path context, entropy gate, or scope filter). Global suppression is never used.
Stats locked before publishing
Critical percentage figures are locked after full FP audit and not adjusted retroactively to match a narrative.