Scoring Methodology

Every MCP server gets a score from 0 to 100 based on 9 quality signals and 8 security scanning layers. This page documents exactly how scores are calculated, what we check, and what we don't.

Quality Score

40 points. Measures project health and maintenance signals.

Signal	Points	How we check
MCP Schema	10	Checks for @modelcontextprotocol/sdk, mcp, fastmcp, or mcp-framework in dependencies. Also checks package.json scripts and bin entries for MCP-related commands.
Recent Activity	7	Full 7 points if the last commit was within 30 days. Linear decay to 0 from 30 to 90 days. Penalty of -1 to -3 points from 90 to 365 days, and -3 points for repos inactive for over a year. Because of this penalty, the quality score can fall below the sum of the other signals.
README	4	File exists and contains more than 200 characters
Tests	4	Test directory or test scripts present in package.json
TypeScript	4	tsconfig.json present or .ts source files detected
GitHub Stars	4	Logarithmic scale: log2(stars + 1) x 2, capped at 4
No Known CVEs	3	Clean npm audit against registry bulk advisory endpoint
License	2	LICENSE file present in repository root or license field in GitHub API
Issue Health	2	Ratio of open to total issues. Under 20% = full points. Linear decay from 20% to 50%. Zero above 50%.
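
The three numeric signals in the table (Recent Activity, GitHub Stars, Issue Health) can be sketched as small functions. This is a minimal illustration, not the scorer's actual code. The function names are hypothetical. It also assumes that the 90-to-365-day penalty grows linearly from -1 to -3, that the star signal is capped at the 4 points listed in the Points column, and that a repo with no issues at all gets full Issue Health points.

```python
import math


def recent_activity_points(days_since_commit: float) -> float:
    """Recent Activity: 7 points max, down to -3 for long-inactive repos."""
    if days_since_commit <= 30:
        return 7.0
    if days_since_commit <= 90:
        # Linear decay from 7 points at day 30 to 0 points at day 90.
        return 7.0 * (90 - days_since_commit) / 60
    if days_since_commit <= 365:
        # Assumption: the penalty grows linearly from -1 to -3 over this window.
        return -1.0 - 2.0 * (days_since_commit - 90) / 275
    return -3.0


def stars_points(stars: int, cap: float = 4.0) -> float:
    """GitHub Stars: log2(stars + 1) x 2, limited to the signal's maximum."""
    return min(cap, math.log2(stars + 1) * 2)


def issue_health_points(open_issues: int, total_issues: int) -> float:
    """Issue Health: full points under 20% open, zero at 50% open or more."""
    if total_issues == 0:
        return 2.0  # Assumption: no issues filed counts as healthy.
    ratio = open_issues / total_issues
    if ratio < 0.20:
        return 2.0
    if ratio >= 0.50:
        return 0.0
    # Linear decay from 2 points at 20% open to 0 points at 50% open.
    return 2.0 * (0.50 - ratio) / 0.30
```

Under these assumptions, a repo last committed to 60 days ago earns 3.5 activity points, and a repo with 3 stars already reaches the star cap.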

Security Score

60 points. Eight scanning layers run against each server. Security is the majority of the score.

Internal scans (layers 1-7) run against the server's package.json content as a string. Vendor scans (layer 8) query external APIs. When a vendor API is unavailable or the server lacks an npm package name, that vendor scan is skipped and no deduction is applied.

1. Prompt Injection Detection

19 pattern categories: instruction override attempts, system prompt replacement, role manipulation, data exfiltration commands, credential harvesting, and concealment tactics.

2. Shell / Exec Pattern Detection

13 dangerous execution patterns: eval(), exec(), child_process, subprocess, os.system(), and shell pipe chains (curl | sh, wget | sh). Test files are excluded.
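
As a rough illustration of how a pattern scan over the package.json text might work, the sketch below checks a handful of the constructs named above. The regular expressions and the function name are assumptions made for this example. They are not the scanner's actual 13 patterns.

```python
import re

# Hypothetical subset of execution patterns, written for illustration only.
EXEC_PATTERNS = {
    "eval": re.compile(r"\beval\s*\("),
    "exec": re.compile(r"\bexec\s*\("),
    "child_process": re.compile(r"\bchild_process\b"),
    "os.system": re.compile(r"\bos\.system\s*\("),
    # curl or wget output piped straight into a shell.
    "pipe_to_shell": re.compile(r"\b(?:curl|wget)\b[^|\n]*\|\s*sh\b"),
}


def find_exec_patterns(package_json_text: str) -> list[str]:
    """Return the names of all patterns that match the given text."""
    return [
        name
        for name, pattern in EXEC_PATTERNS.items()
        if pattern.search(package_json_text)
    ]
```

A postinstall script such as "curl https://example.com/install.sh | sh" would be reported as pipe_to_shell by this sketch.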

3. Credential Access Detection

11 patterns for SSH key access, AWS config directories, /etc/passwd, hardcoded API tokens (OpenAI sk-, GitHub ghp_, AWS AKIA), and dynamic env var harvesting.
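
The hardcoded-token checks can be pictured as prefix-based regular expressions. The sketch below is an assumption about how such matching could look. The token lengths and character classes are illustrative guesses based on commonly seen key formats, and the names are hypothetical.

```python
import re

# Illustrative patterns only; lengths and character sets are assumptions.
TOKEN_PATTERNS = {
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_hardcoded_tokens(text: str) -> list[str]:
    """Return the names of token patterns found anywhere in the text."""
    return [name for name, pattern in TOKEN_PATTERNS.items() if pattern.search(text)]
```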

4. Hidden Unicode Detection

Tag characters (U+E0000-E007F), zero-width characters, BiDi control characters, and Cyrillic homoglyphs that disguise malicious instructions in tool descriptions.
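
A minimal sketch of this kind of check follows. The zero-width and BiDi code point sets are a reasonable selection chosen for this example, and the mixed-script heuristic (a word containing both ASCII and Cyrillic letters) is an assumption about how homoglyphs might be flagged. Neither is the scanner's exact list.

```python
import re

# Assumed sets of invisible or direction-changing code points.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
BIDI_CONTROLS = {0x061C, 0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}


def find_hidden_unicode(text: str) -> list[tuple[int, str]]:
    """Return (index, kind) for each hidden character found in the text."""
    findings = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if 0xE0000 <= cp <= 0xE007F:
            findings.append((index, "tag"))
        elif cp in ZERO_WIDTH:
            findings.append((index, "zero-width"))
        elif cp in BIDI_CONTROLS:
            findings.append((index, "bidi"))
    return findings


def mixed_script_words(text: str) -> list[str]:
    """Return words that mix ASCII letters with Cyrillic letters."""
    hits = []
    for word in re.findall(r"\w+", text):
        has_latin = any(c.isascii() and c.isalpha() for c in word)
        has_cyrillic = any(0x0400 <= ord(c) <= 0x04FF for c in word)
        if has_latin and has_cyrillic:
            hits.append(word)
    return hits
```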

5. SAST (Static Application Security Testing)

15 patterns: insecure crypto (Math.random for secrets, MD5/SHA1), hardcoded credentials, prototype pollution vectors, dangerous deserialization (new Function), SQL injection patterns (template literals in queries), and path traversal risks.

6. MCP Tool Poisoning Detection

11 patterns: tool shadowing (names mimicking system commands), rug pull patterns (behavior changes post-approval), toxic flows (read-then-exfiltrate chains), excessive permission requests, and hidden instructions in HTML comments.

7. Tool Schema Analysis

Flags suspicious parameter names (password, secret, token, api_key) and overly permissive schemas (additionalProperties: true).
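
The two checks can be sketched against a JSON Schema object as follows. This example assumes exact, case-insensitive name matching on top-level properties only. The function name and the finding messages are hypothetical.

```python
SUSPICIOUS_NAMES = {"password", "secret", "token", "api_key"}


def analyze_tool_schema(schema: dict) -> list[str]:
    """Flag suspicious parameter names and overly permissive schemas."""
    findings = []
    # Assumption: only top-level properties are inspected.
    for name in schema.get("properties", {}):
        if name.lower() in SUSPICIOUS_NAMES:
            findings.append(f"suspicious parameter: {name}")
    if schema.get("additionalProperties") is True:
        findings.append("overly permissive: additionalProperties is true")
    return findings
```

For example, a tool schema with a "token" property and additionalProperties set to true would produce two findings.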

8. External Vulnerability Databases

Dependencies are checked against OSV.dev (a vulnerability database covering npm, PyPI, Go, Rust, Maven, and NuGet), and repository alerts are pulled from GitHub Security Advisories. Each vendor requires an API key and runs independently. If a vendor is unavailable, that scan is skipped with no penalty.

Security Point Deductions

Internal scanner (prompt injection, SAST, MCP poisoning, malware detection, supply chain analysis): critical = -8 pts, high = -4 pts, medium = -2 pts (max -30)

OSV.dev: critical CVE x -5, high x -3, medium x -1 (max -18)

GitHub Advisories: critical x -3, high x -2 (max -12)

The deduction caps sum to 60 points (30 + 18 + 12), the full security allocation, so severe security issues alone can reduce the security score to zero.
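
The deduction rules above amount to a weighted count per source, each limited by its own cap. The sketch below shows that arithmetic under stated assumptions: findings arrive as severity-to-count dictionaries, a skipped vendor is passed as None, and all names are hypothetical.

```python
from typing import Optional

# Per-finding weights and per-source caps, taken from the rules above.
SOURCES = {
    "internal": ({"critical": 8, "high": 4, "medium": 2}, 30),
    "osv": ({"critical": 5, "high": 3, "medium": 1}, 18),
    "github": ({"critical": 3, "high": 2}, 12),
}


def source_deduction(source: str, counts: Optional[dict]) -> int:
    """Deduction for one source; a skipped scan (None) deducts nothing."""
    if counts is None:
        return 0
    weights, cap = SOURCES[source]
    raw = sum(weights.get(severity, 0) * n for severity, n in counts.items())
    return min(raw, cap)


def total_deduction(findings: dict) -> int:
    """Sum of capped deductions across all sources."""
    return sum(source_deduction(name, findings.get(name)) for name in SOURCES)
```

With one critical and two high internal findings (16 points), one high and two medium OSV.dev CVEs (5 points), and one critical GitHub advisory (3 points), the total deduction in this sketch is 24 points.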

Anti-Gaming Checks

Heuristic checks detect artificial inflation of quality signals.

Check	What it catches
Star/fork mismatch	100+ stars with 0 forks, or 500+ with fewer than 5 forks
New account, high stars	Account under 30 days old with 50+ stars
README padding	README over 5,000 characters with fewer than 100 lines of code
Issue gaming	Over 95% of issues closed in rapid succession
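
The first three checks in the table use fixed thresholds, so they can be sketched directly. The RepoStats fields and flag names below are hypothetical. The issue gaming check is left out because "rapid succession" is not defined precisely enough here to encode.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    stars: int
    forks: int
    account_age_days: int
    readme_chars: int
    lines_of_code: int


def gaming_flags(repo: RepoStats) -> list[str]:
    """Return the anti-gaming checks that a repository trips."""
    flags = []
    if (repo.stars >= 100 and repo.forks == 0) or (repo.stars >= 500 and repo.forks < 5):
        flags.append("star/fork mismatch")
    if repo.account_age_days < 30 and repo.stars >= 50:
        flags.append("new account, high stars")
    if repo.readme_chars > 5000 and repo.lines_of_code < 100:
        flags.append("README padding")
    return flags
```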

Grade Scale

Grade	Score
A+	90 - 100
A	80 - 89
B	70 - 79
B-	65 - 69
C	55 - 64
C-	50 - 54
D	40 - 49
D-	35 - 39
F	0 - 34
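
The grade bands above map to a simple lookup by lower bound. The following sketch uses hypothetical names and assumes the score has already been clamped to the 0 to 100 range.

```python
# Lower bound of each band, highest first.
GRADE_FLOORS = [
    (90, "A+"),
    (80, "A"),
    (70, "B"),
    (65, "B-"),
    (55, "C"),
    (50, "C-"),
    (40, "D"),
    (35, "D-"),
    (0, "F"),
]


def grade_for(score: float) -> str:
    """Return the letter grade for a composite score."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # Assumption: anything below 0 is also an F.
```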

Grade Caps

Certain security findings override the numeric score and cap the maximum grade.

Cap	Trigger
F	Critical malware patterns detected by the internal scanner. Automatic F regardless of other scores.
D	Any critical security finding from internal scanning or vendor analysis. Grade capped at D maximum.
C-	Three or more high-severity findings. Grade capped at C- maximum.
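
Applying a cap means taking the lower of the numeric grade and the strictest cap that is triggered. The sketch below illustrates this, assuming the three caps are checked from most to least severe. The function and parameter names are hypothetical.

```python
# Grades ordered from worst to best, matching the grade scale.
GRADE_ORDER = ["F", "D-", "D", "C-", "C", "B-", "B", "A", "A+"]


def apply_grade_caps(
    grade: str,
    critical_malware: bool,
    critical_findings: int,
    high_findings: int,
) -> str:
    """Return the grade after applying the strictest triggered cap."""
    if critical_malware:
        cap = "F"
    elif critical_findings > 0:
        cap = "D"
    elif high_findings >= 3:
        cap = "C-"
    else:
        return grade
    # Keep whichever of the grade and the cap ranks lower.
    return min(grade, cap, key=GRADE_ORDER.index)
```

A server that scores an A numerically but has three high-severity findings ends up at C-, while a server already at D- with a critical finding stays at D-.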

Known Limitations

We try to be accurate, but scoring has inherent trade-offs. Here's where the system falls short.

Node.js bias

Quality signals are weighted toward the Node.js ecosystem. Python MCP servers using the mcp pip package may miss MCP Schema detection (10 pts), TypeScript (4 pts), and npm CVE checks (3 pts). A well-built Python server could lose up to 17 points from signals that don't apply to it. We plan to add Python-specific detection in a future update.

Limited source analysis

Internal security scans (layers 1-7) currently analyze the package.json content as text. We do not download or analyze the full source tree. This means security patterns in application code outside of package.json are not detected by internal scans. Vendor scans (OSV.dev, GitHub Advisories) check known vulnerability databases.

Vendor availability

External vendor scans (OSV.dev, GitHub Advisories) require API keys and network access. If a vendor is unavailable or the API key isn't configured, that scan is skipped and no deduction is applied. A server that was never scanned by a vendor looks the same as one that was scanned and cleared. We're working on distinguishing "not scanned" from "scanned clean" in score reports.

Scoring frequency

Servers are scored during crawl runs, not continuously. A score reflects the state of the server at the time it was last scored. The "last_scored" timestamp in API responses indicates when the score was calculated.

Recency penalty

The Recent Activity signal applies a negative penalty (up to -3 points) to repos inactive for 90+ days. This can reduce the total quality score below what the other signals would produce. We treat extended inactivity as a risk signal, not just a neutral absence. Stable, finished projects may be penalized unfairly by this.

Dispute a Score

If you believe a score is incorrect, you can challenge it with evidence. We review disputes within 48 hours.

Open a GitHub issue or email hello@pulrix.dev.

Last updated: March 2026. Scoring methodology may change as we add new signals and data sources. Significant changes will be documented here.