Corpus
What We Scan
2,500+ public repositories across 8 artifact types, discovered from npm, PyPI, GitHub topics, and artifact-type file searches. Repos are tiered by quality signal, with vendor and high-star repos scanned first.
Repos scanned: 2,500+
Artifact types: 8
Checkers: 120+
Update frequency: Nightly
Repository Tiers
Repos are tiered by quality signal. T1 and T2 are scanned first and most frequently. T5 (0★ ecosystem) is scanned selectively by artifact type — bare server repos with no stars are deferred until post-seed volume targets justify the compute.
T1
Vendor / Curated / npm (500+★)
npm registry, PyPI registry, Smithery catalog, Curated security lists, Official vendor orgs
Official vendor servers (Stripe, Shopify, Docker, Cloudflare, Anthropic), curated security lists, all npm/PyPI packages with the 'mcp' keyword. Highest quality signal.
575 repos
T2
GitHub (100–499★)
github topic: mcp-server, github topic: claude-skill, github topic: cursor-rules
Community repos discovered via GitHub topic search with 100–499 stars. Strong signal — starred repos have real users.
986 repos
T3
GitHub (10–99★)
github topic search, ecosystem file search
Smaller community repos with 10–99 stars. Good signal for emerging tools and niche integrations.
132 repos
T4
GitHub (1–9★)
artifact-type file search, ecosystem discovery
Low-star repos promoted from T5 due to high-value artifact types (Kiro specs, cursor rules, Copilot instructions).
178 repos
T5
0★ Ecosystem
kiro steering file search, copilot instructions search, hook config search
Zero-star repos discovered by scanning GitHub for specific artifact file patterns. High noise, scanned selectively by artifact type.
1,305 repos
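The tier boundaries above can be sketched as a small classifier. This is a minimal illustration, not the scanner's actual logic: the function name `assign_tier` and the `trusted_source` flag are hypothetical, and it assumes that a repo qualifies for T1 either by coming from a vendor, curated, or registry source or by having 500+ stars. The promotion of selected zero-star repos into T4 is not modeled.

```python
def assign_tier(stars: int, trusted_source: bool = False) -> str:
    """Map a repo to a tier label using the star ranges listed above.

    trusted_source is a hypothetical flag standing in for "vendor org,
    curated list, or npm/PyPI registry" discovery.
    """
    if trusted_source or stars >= 500:
        return "T1"
    if stars >= 100:
        return "T2"
    if stars >= 10:
        return "T3"
    if stars >= 1:
        return "T4"
    return "T5"


def scan_order(repos: list[dict]) -> list[dict]:
    """Sort repos so that lower-numbered tiers are scanned first."""
    return sorted(
        repos,
        key=lambda r: assign_tier(r["stars"], r.get("trusted_source", False)),
    )
```

Sorting on the tier label works here only because the labels "T1" through "T5" happen to sort lexicographically in scan order.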
Artifact Types
The scanner checks each repo for 8 artifact types and runs the checker modules relevant to the types it finds. A single repo can contain multiple artifact types: AutoGPT, for example, has server code, skill files, hooks, and agent configs.
⚙️ MCP Server
1,200+ npm/PyPI packages implementing the Model Context Protocol. The primary attack surface: these run as processes with tool execution rights.
Detected files
package.json with mcp keyword
pyproject.toml with mcp keyword
server.py / index.ts with tool handlers
CHK-036 CVE, CHK-049 no auth, CHK-081 exec, CHK-090 0.0.0.0 bind
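Unlike the path-based artifact types, MCP server detection depends on file contents. A rough sketch of the package.json check follows. It assumes "with mcp keyword" means an entry in the standard `keywords` array that equals `mcp` (case-insensitive); the real matching rule is not stated here, and `has_mcp_keyword` is a hypothetical name.

```python
import json


def has_mcp_keyword(package_json_text: str) -> bool:
    """Return True if package.json lists 'mcp' in its keywords array.

    Assumption: an exact, case-insensitive match on a keyword entry.
    Malformed JSON is treated as "not an MCP package".
    """
    try:
        manifest = json.loads(package_json_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(manifest, dict):
        return False
    keywords = manifest.get("keywords", [])
    if not isinstance(keywords, list):
        return False
    return any(isinstance(k, str) and k.lower() == "mcp" for k in keywords)
```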
📄 Skill File
250+ SKILL.md files and .claude/skills/ directories. Loaded by Claude Code at session start. Can contain credential access instructions and data exfiltration patterns.
Detected files
SKILL.md
skills/*.md
.claude/skills/**/*.md
CHK-115 credential access, CHK-027 exfiltration, CHK-023 injection
🤖 Agent Config
180+ AGENTS.md and .claude/agents/*.md files defining agent system prompts and tool access. High-privilege context loaded automatically.
Detected files
AGENTS.md
.claude/agents/*.md
agent.yaml
CHK-115, CHK-023, CHK-027
🪝 Claude Hook
207+ hooks.json lifecycle hooks that execute shell commands at Claude Code events (PreToolUse, PostToolUse). Direct shell execution surface.
Detected files
.claude/hooks/hooks.json
.claude/settings.json (hooks key)
CHK-001 hook exec, CHK-062 hook shell, CHK-073 curl|bash
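To illustrate the kind of pattern a curl|bash check looks for, here is a hedged sketch. It is not the CHK-073 implementation. It assumes hook commands appear as string values under a `command` key somewhere in the JSON, and it walks the whole document rather than relying on an exact schema. The regex and both function names are illustrative.

```python
import json
import re

# Matches a curl or wget invocation piped straight into a shell
# (sh, bash, zsh), optionally via sudo. Illustrative, not exhaustive.
PIPE_TO_SHELL = re.compile(
    r"\b(?:curl|wget)\b[^|;&\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
)


def iter_commands(node):
    """Yield every string stored under a 'command' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "command" and isinstance(value, str):
                yield value
            else:
                yield from iter_commands(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_commands(item)


def find_pipe_to_shell(hooks_json_text: str) -> list[str]:
    """Return the hook commands that download and pipe into a shell."""
    config = json.loads(hooks_json_text)
    return [cmd for cmd in iter_commands(config) if PIPE_TO_SHELL.search(cmd)]
```

Walking the tree generically means the same function can be pointed at either hooks.json or the hooks key of settings.json, at the cost of also picking up unrelated `command` keys.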
🖱️ Cursor Rules
129+ .cursorrules and .cursor/rules/*.md files. Automatically loaded in Cursor IDE. Injection surface for developer environment attacks.
Detected files
.cursorrules
.cursor/rules/*.md
cursor.rules
CHK-116 injection, CHK-117 credential, CHK-118 exfiltration
🌀 Kiro Spec
97+ .kiro/steering/ files. Automatically loaded when a project is opened in the Kiro IDE. First scanner to cover this artifact type.
Detected files
.kiro/steering/*.md
.kiro/specs/*.md
CHK-119 injection, CHK-120 permissions
🐙 Copilot Instructions
67+ .github/copilot-instructions.md files. Loaded automatically by GitHub Copilot. Injection and credential surface in enterprise repos.
Detected files
.github/copilot-instructions.md
CHK-121 injection, CHK-122 credential, CHK-123 exfiltration
🔌 Plugin / Marketplace
80+ marketplace.json and .claude-plugin/ manifests defining tool permissions and marketplace metadata.
Detected files
marketplace.json
.claude-plugin/manifest.json
CHK-125 excessive agency, CHK-126 metadata
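The path-based "Detected files" lists above lend themselves to a simple lookup table. The sketch below is an assumption about how such detection could work, not the scanner's code. The type keys and `detect_artifact_types` are hypothetical names, paths are matched relative to the repo root, and the content-dependent cases (MCP keyword checks, the hooks key in .claude/settings.json) are left out.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the "Detected files" lists above.
# Note: fnmatch's "*" also matches "/", so these are looser than shell globs.
ARTIFACT_PATTERNS = {
    "skill_file": ["SKILL.md", "skills/*.md", ".claude/skills/**/*.md"],
    "agent_config": ["AGENTS.md", ".claude/agents/*.md", "agent.yaml"],
    "claude_hook": [".claude/hooks/hooks.json"],
    "cursor_rules": [".cursorrules", ".cursor/rules/*.md", "cursor.rules"],
    "kiro_spec": [".kiro/steering/*.md", ".kiro/specs/*.md"],
    "copilot_instructions": [".github/copilot-instructions.md"],
    "plugin_marketplace": ["marketplace.json", ".claude-plugin/manifest.json"],
}


def detect_artifact_types(paths: list[str]) -> set[str]:
    """Return the artifact types whose patterns match any repo-relative path."""
    found = set()
    for path in paths:
        for artifact_type, patterns in ARTIFACT_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                found.add(artifact_type)
    return found
```

A repo can return several types at once, which is what lets the scanner dispatch more than one set of checker modules per repo.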
Discovery Sources
What We Don't Scan
Private repositories
No access without org OAuth grant. Org-connected repos are scanned separately under org_id isolation.
Glama catalog (5,956 repos)
No star data, so no quality signal. Deferred until after the seed round, when the tradeoff shifts toward volume over quality.
Aggregator / collection repos
Repos that are curated lists of other repos (awesome-mcp-servers, etc.) are excluded — they would inflate stats without adding signal.
Runtime / memory state
Static analysis only in v1 (ADR-001). Runtime monitoring is a v2 feature after seed funding.
Proprietary / closed-source packages
No source code available for static analysis. Registry metadata only.