Scoring Methodology
Every MCP server gets a score from 0 to 100 based on 9 quality signals and 8 security scanning layers. This page documents exactly how scores are calculated, what we check, and what we don't.
Quality Score
40 points. Measures project health and maintenance signals.
| Signal | Points | How we check |
|---|---|---|
| MCP Schema | 10 | Checks for @modelcontextprotocol/sdk, mcp, fastmcp, or mcp-framework in dependencies. Also checks package.json scripts and bin entries for MCP-related commands. |
| Recent Activity | 7 | Full 7 points if the last commit is within 30 days. Linear decay to 0 between 30 and 90 days. Penalty of -1 to -3 points between 90 and 365 days, and -3 points for repos inactive for over a year. The penalty can pull the quality score below what the other signals alone would produce. |
| README | 4 | File exists and contains more than 200 characters |
| Tests | 4 | Test directory or test scripts present in package.json |
| TypeScript | 4 | tsconfig.json present or .ts source files detected |
| GitHub Stars | 4 | Logarithmic scale: log2(stars + 1) x 2, capped at 4 |
| No Known CVEs | 3 | Clean npm audit against registry bulk advisory endpoint |
| License | 2 | LICENSE file present in repository root or license field in GitHub API |
| Issue Health | 2 | Ratio of open to total issues. Under 20% = full points. Linear decay from 20% to 50%. Zero above 50%. |
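The three graded signals in the table (Recent Activity, GitHub Stars, Issue Health) can be sketched as small functions. This is an illustrative sketch, not the scorer's actual code: the function names are hypothetical, the penalty between 90 and 365 days is assumed to grow linearly from -1 to -3, the star cap is assumed to equal the signal's 4-point maximum, and a repo with no issues at all is assumed to earn full Issue Health points.

```python
import math


def recent_activity_points(days_since_commit: float) -> float:
    """Recent Activity: up to 7 points, negative for long inactivity."""
    if days_since_commit <= 30:
        return 7.0
    if days_since_commit < 90:
        # Linear decay from 7 points at day 30 to 0 points at day 90.
        return 7.0 * (90 - days_since_commit) / 60
    if days_since_commit <= 365:
        # Assumption: penalty grows linearly from -1 (day 90) to -3 (day 365).
        return -1.0 - 2.0 * (days_since_commit - 90) / 275
    return -3.0


def star_points(stars: int, cap: float = 4.0) -> float:
    """GitHub Stars: log2(stars + 1) x 2, limited by `cap` (assumed value)."""
    return min(cap, math.log2(stars + 1) * 2)


def issue_health_points(open_issues: int, total_issues: int) -> float:
    """Issue Health: up to 2 points based on the open/total issue ratio."""
    if total_issues == 0:
        # Assumption: no issues at all counts as healthy.
        return 2.0
    ratio = open_issues / total_issues
    if ratio < 0.20:
        return 2.0
    if ratio <= 0.50:
        # Linear decay from 2 points at 20% to 0 points at 50%.
        return 2.0 * (0.50 - ratio) / 0.30
    return 0.0
```

For example, a repo last committed to 60 days ago sits halfway through the decay window and earns 3.5 of the 7 Recent Activity points.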
Security Score
60 points. Eight scanning layers run against each server. Security is the majority of the score.
Internal scans (layers 1-7) run against the server's package.json content as a string. Vendor scans (layer 8) query external APIs. When a vendor API is unavailable or the server lacks an npm package name, that vendor scan is skipped and no deduction is applied.
1. Prompt Injection Detection
19 pattern categories, including instruction override attempts, system prompt replacement, role manipulation, data exfiltration commands, credential harvesting, and concealment tactics.
2. Shell / Exec Pattern Detection
13 dangerous execution patterns, including eval(), exec(), child_process, subprocess, os.system(), and shell pipe chains (curl | sh, wget | sh). Test files are excluded.
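A minimal sketch of how this kind of pattern matching over text might look. The regular expressions and the `find_exec_patterns` name are illustrative assumptions; they cover only the examples named above and are not the scanner's actual 13 patterns.

```python
import re

# Illustrative patterns only (assumed, not the scanner's real rule set).
EXEC_PATTERNS = {
    "eval": re.compile(r"\beval\s*\("),
    "exec": re.compile(r"\bexec\s*\("),
    "child_process": re.compile(r"\bchild_process\b"),
    "subprocess": re.compile(r"\bsubprocess\b"),
    "os.system": re.compile(r"\bos\.system\s*\("),
    # curl or wget output piped into a shell on the same line.
    "pipe_to_shell": re.compile(
        r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
    ),
}


def find_exec_patterns(text: str) -> list[str]:
    """Return the names of all execution patterns found in `text`."""
    return [name for name, pattern in EXEC_PATTERNS.items() if pattern.search(text)]
```

The word-boundary anchors keep ordinary identifiers such as `retrieval(` from matching the `eval` pattern.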
3. Credential Access Detection
11 patterns for SSH key access, AWS config directories, /etc/passwd, hardcoded API tokens (OpenAI sk-, GitHub ghp_, AWS AKIA), and dynamic env var harvesting.
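As a rough illustration, the token prefixes and sensitive paths named above could be matched as follows. The token lengths, character classes, and the `find_credential_risks` helper are assumptions made for this sketch, not the scanner's actual rules, and dynamic env var harvesting is not covered.

```python
import re

# Assumed token shapes for the three prefixes named above.
TOKEN_PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# SSH key directory, AWS config directory, and /etc/passwd.
SENSITIVE_PATHS = re.compile(r"/\.ssh/|/\.aws/|/etc/passwd")


def find_credential_risks(text: str) -> list[str]:
    """Return labels for hardcoded tokens and sensitive paths found in `text`."""
    hits = [name for name, pattern in TOKEN_PATTERNS.items() if pattern.search(text)]
    if SENSITIVE_PATHS.search(text):
        hits.append("sensitive_path")
    return hits
```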
4. Hidden Unicode Detection
Tag characters (U+E0000-E007F), zero-width characters, BiDi control characters, and Cyrillic homoglyphs that disguise malicious instructions in tool descriptions.
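A sketch of how these four character classes might be detected in a tool description. The exact code points chosen for the zero-width and BiDi sets, the mixed-script heuristic for homoglyphs, and the function names are assumptions for illustration.

```python
import re
import unicodedata

TAG_CHARS = re.compile(r"[\U000E0000-\U000E007F]")
# Assumed set: zero-width space, non-joiner, joiner, word joiner, BOM.
ZERO_WIDTH = re.compile(r"[\u200B\u200C\u200D\u2060\uFEFF]")
# Assumed set: directional marks, embeddings, overrides, and isolates.
BIDI_CONTROLS = re.compile(r"[\u200E\u200F\u202A-\u202E\u2066-\u2069]")
WORD = re.compile(r"\w+")


def has_mixed_script_word(text: str) -> bool:
    """Heuristic: a single word mixing Latin and Cyrillic letters."""
    for word in WORD.findall(text):
        scripts = set()
        for ch in word:
            name = unicodedata.name(ch, "")
            if name.startswith("LATIN"):
                scripts.add("latin")
            elif name.startswith("CYRILLIC"):
                scripts.add("cyrillic")
        if len(scripts) == 2:
            return True
    return False


def find_hidden_unicode(text: str) -> list[str]:
    """Return the categories of hidden or deceptive characters in `text`."""
    findings = []
    if TAG_CHARS.search(text):
        findings.append("tag_character")
    if ZERO_WIDTH.search(text):
        findings.append("zero_width")
    if BIDI_CONTROLS.search(text):
        findings.append("bidi_control")
    if has_mixed_script_word(text):
        findings.append("cyrillic_homoglyph")
    return findings
```

Checking for mixed scripts within one word, rather than for any Cyrillic letter, avoids flagging descriptions that are legitimately written in a Cyrillic-script language.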
5. SAST (Static Application Security Testing)
15 patterns, including insecure crypto (Math.random for secrets, MD5/SHA1), hardcoded credentials, prototype pollution vectors, dangerous deserialization (new Function), SQL injection patterns (template literals in queries), and path traversal risks.
6. MCP Tool Poisoning Detection
11 patterns, including tool shadowing (names mimicking system commands), rug pull patterns (behavior changes post-approval), toxic flows (read-then-exfiltrate chains), excessive permission requests, and hidden instructions in HTML comments.
7. Tool Schema Analysis
Flags suspicious parameter names (password, secret, token, api_key) and overly permissive schemas (additionalProperties: true).
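The two checks can be sketched as a recursive walk over a JSON Schema object. The `analyze_schema` name, the exact-match (case-insensitive) comparison of parameter names, and the finding format are assumptions for this example.

```python
from typing import Any

SUSPICIOUS_NAMES = {"password", "secret", "token", "api_key"}


def analyze_schema(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return findings for suspicious names and permissive object schemas."""
    findings = []
    # Only an explicit `additionalProperties: true` is flagged here.
    if schema.get("additionalProperties") is True:
        findings.append(f"{path}: additionalProperties is true")
    properties = schema.get("properties")
    if isinstance(properties, dict):
        for name, subschema in properties.items():
            child_path = f"{path}.{name}"
            if name.lower() in SUSPICIOUS_NAMES:
                findings.append(f"{child_path}: suspicious parameter name")
            if isinstance(subschema, dict):
                # Recurse into nested object schemas.
                findings.extend(analyze_schema(subschema, child_path))
    return findings
```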
8. Third-Party Vendor Scanning
Package tarballs are downloaded from npm and scanned by VirusTotal (70+ antivirus engines). Socket.dev provides supply chain analysis (install scripts, obfuscated code, typosquats). Dependencies are checked against OSV.dev (a vulnerability database covering npm, PyPI, Go, Rust, Maven, and NuGet). Repository alerts come from GitHub Security Advisories. Each vendor requires an API key and runs independently. If a vendor is unavailable, that scan is skipped with no penalty.
Security Point Deductions
| Source | Deduction | Cap |
|---|---|---|
| Internal scanner (prompt injection, SAST, MCP poisoning) | critical -8, high -4, medium -2 | -20 |
| VirusTotal | -5 per malicious engine, -3 per suspicious engine | -15 |
| Socket.dev | -5 per critical supply chain issue, -2 per high | -10 |
| OSV.dev | -5 per critical CVE, -3 per high, -1 per medium | -10 |
| GitHub Advisories | -3 per critical, -2 per high | -5 |
The caps above sum to 60 points (20 + 15 + 10 + 10 + 5), equal to the full security allotment, so severe security issues can reduce the security score to zero.
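A minimal sketch of how the deductions above might combine into a security score. It assumes each internal-scanner deduction applies per finding, that a skipped vendor scan is represented as `None`, and that the score is simply 60 minus the capped deductions; the names `RULES`, `deduction`, and `security_score` are hypothetical.

```python
from typing import Optional

SECURITY_MAX = 60

# Per-source weights by severity and the per-source cap, from the deductions above.
RULES = {
    "internal": ({"critical": 8, "high": 4, "medium": 2}, 20),
    "virustotal": ({"malicious": 5, "suspicious": 3}, 15),
    "socket": ({"critical": 5, "high": 2}, 10),
    "osv": ({"critical": 5, "high": 3, "medium": 1}, 10),
    "github_advisories": ({"critical": 3, "high": 2}, 5),
}

Counts = Optional[dict[str, int]]


def deduction(source: str, counts: Counts) -> int:
    """Capped deduction for one source; a skipped scan (None) costs nothing."""
    if counts is None:
        return 0
    weights, cap = RULES[source]
    raw = sum(weights.get(severity, 0) * n for severity, n in counts.items())
    return min(raw, cap)


def security_score(findings: dict[str, Counts]) -> int:
    """Security score: 60 minus the sum of capped per-source deductions."""
    total = sum(deduction(source, findings.get(source)) for source in RULES)
    return SECURITY_MAX - total
```

For example, one critical and one high internal finding with VirusTotal skipped gives 60 - (8 + 4) = 48, while ten malicious VirusTotal engines are capped at a 15-point deduction.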
Anti-Gaming Checks
Heuristic checks detect artificial inflation of quality signals.
| Check | What it catches |
|---|---|
| Star/fork mismatch | 100+ stars with 0 forks, or 500+ with fewer than 5 forks |
| New account, high stars | Account under 30 days old with 50+ stars |
| README padding | README over 5,000 characters with fewer than 100 lines of code |
| Issue gaming | Over 95% of issues closed in rapid succession |
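The first three checks use fixed thresholds and can be sketched directly. The `RepoStats` fields and flag names are hypothetical, and the issue gaming check is left out because "rapid succession" is not quantified above.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    stars: int
    forks: int
    account_age_days: int
    readme_chars: int
    code_lines: int


def gaming_flags(repo: RepoStats) -> list[str]:
    """Return the anti-gaming heuristics that a repository triggers."""
    flags = []
    # 100+ stars with 0 forks, or 500+ stars with fewer than 5 forks.
    if (repo.stars >= 100 and repo.forks == 0) or (
        repo.stars >= 500 and repo.forks < 5
    ):
        flags.append("star_fork_mismatch")
    # Account under 30 days old with 50+ stars.
    if repo.account_age_days < 30 and repo.stars >= 50:
        flags.append("new_account_high_stars")
    # README over 5,000 characters with fewer than 100 lines of code.
    if repo.readme_chars > 5000 and repo.code_lines < 100:
        flags.append("readme_padding")
    return flags
```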
Grade Scale
Grade Caps
Certain security findings override the numeric score and cap the maximum grade.
| Condition | Effect |
|---|---|
| Any VirusTotal engine flags the package as malware | Automatic F regardless of other scores |
| Any critical security finding from internal scanning or vendor analysis | Grade capped at D maximum |
| Three or more high-severity findings | Grade capped at C- maximum |
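One way to picture how the caps interact with a numeric grade is shown below. The grade ordering in `GRADES` is an assumption made for this sketch (only F, D, and C- are named above), and `apply_grade_caps` is a hypothetical helper.

```python
# Assumed ordering from best to worst; only F, D, and C- are confirmed above.
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]


def apply_grade_caps(
    grade: str,
    vt_malicious_engines: int,
    critical_findings: int,
    high_findings: int,
) -> str:
    """Return the grade after applying the strictest matching cap."""
    cap = GRADES[0]  # no cap by default
    if high_findings >= 3:
        cap = "C-"
    if critical_findings >= 1:
        cap = "D"
    if vt_malicious_engines >= 1:
        cap = "F"
    # A higher index means a worse grade; keep whichever is worse.
    return max(grade, cap, key=GRADES.index)
```

A cap only ever lowers a grade: a server already graded D stays at D even if it also has three high-severity findings.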
Known Limitations
We try to be accurate, but scoring has inherent trade-offs. Here's where the system falls short.
Node.js bias
Quality signals are weighted toward the Node.js ecosystem. Python MCP servers using the mcp pip package may miss MCP Schema detection (10 pts), TypeScript (4 pts), and npm CVE checks (3 pts). A well-built Python server could lose up to 17 points from signals that don't apply to it. We plan to add Python-specific detection in a future update.
Limited source analysis
Internal security scans (layers 1-7) currently analyze the package.json content as text. We do not download or analyze the full source tree. This means security patterns in application code outside of package.json are not detected by internal scans. Vendor scans (VirusTotal, Socket.dev) analyze the actual published package.
Vendor availability
Third-party vendor scans (VirusTotal, Socket.dev, OSV.dev, GitHub Advisories) require API keys and network access. If a vendor is unavailable or the API key isn't configured, that scan is skipped and no deduction is applied. A server that was never scanned by a vendor looks the same as one that was scanned and cleared. We're working on distinguishing "not scanned" from "scanned clean" in score reports.
Scoring frequency
Servers are scored during crawl runs, not continuously. A score reflects the state of the server at the time it was last scored. The "last_scored" timestamp in API responses indicates when the score was calculated.
Recency penalty
The Recent Activity signal applies a negative penalty (up to -3 points) to repos inactive for 90+ days. This can reduce the total quality score below what the other signals would produce. We treat extended inactivity as a risk signal, not just a neutral absence. Stable, finished projects may be penalized unfairly by this.
Dispute a Score
If you believe a score is incorrect, you can challenge it with evidence. We review disputes within 48 hours.
Last updated: March 2026. Scoring methodology may change as we add new signals and data sources. Significant changes will be documented here.