name: evaluate-skill description: Evaluate a local Codex skill in engineer-friendly terms. Use when the user says "evaluate this skill", "give me an analysis of the game dev skill", "audit this skill", "why did this score that way", "what should I fix first", or asks for a skill-specific report before benchmarking it.
Evaluate Skill
Use this skill when the target is a local skill directory or SKILL.md file.
Workflow
- Treat "Evaluate this skill." as the default entrypoint.
- If the user names a skill instead of giving a path, resolve it locally first, preferring
~/.codex/skills/<skill-name>and then repo-localskills/<skill-name>. - If the user says the request in natural language first, use
plugin-eval start <skill-path> --request "<user request>" --format markdownto show the routed path clearly. - Run
plugin-eval analyze <skill-path> --format markdown. - Review
At a Glance,Why It Matters,Fix First, andRecommended Next Stepbefore drilling into details. - Explain which findings are structural, which are budget-related, and which are code-related.
- If the user asks for an "analysis" of the skill, do not stop at the report. Also run
plugin-eval init-benchmark <skill-path>and show the setup questions for refining the starter scenarios in.plugin-eval/benchmark.json. - If the user wants real usage numbers, switch to "Measure the real token usage of this skill." and run the benchmark flow.
- After observed usage is available, use
plugin-eval measurement-plan <skill-path> --observed-usage <usage.jsonl> --format markdownto recommend what to instrument or improve next. - If the user wants a rewrite plan, route to
../improve-skill/SKILL.md.
Skill-Specific Priorities
- frontmatter validity
nameanddescriptionquality- progressive disclosure and reference usage
- broken relative links
- oversized
SKILL.mdor descriptions - helper script quality for TypeScript and Python files
Chat Requests To Recognize
Evaluate this skill.Give me an analysis of the game dev skill.Audit this skill.Why did this skill score that way?What should I fix first?Measure the real token usage of this skill.
Commands
plugin-eval start <skill-path> --request "Evaluate this skill." --format markdown
plugin-eval analyze <skill-path> --format markdown
plugin-eval explain-budget <skill-path> --format markdown
plugin-eval measurement-plan <skill-path> --format markdown
plugin-eval init-benchmark <skill-path>
plugin-eval benchmark <skill-path> --dry-run
Reference
../../references/chat-first-workflows.md