---
name: design-vision
description: Transform design inspiration into implemented code through deep analysis, interactive vision discovery, planning, and checkpoint-based execution. Use when you have a screenshot/reference and want to build something similar while understanding and preserving the design intent.
---
# Design Vision
Transform design inspiration into implemented code by deeply understanding both the visual reference and your intent.
## The Problem This Solves
**Traditional approach:**
- Look at screenshot → Extract colors/fonts → Implement → Lose the original "feel"

**This skill:**
- Analyze screenshot deeply → Understand YOUR intent → Plan with vision → Implement with checkpoints → Verify against original
## Workflow Phases
### Phase 1: Deep Visual Analysis (Silent)
**Input:** Screenshot path, URL, or image file

**Process:**
- Use the `ai-multimodal` skill to analyze the image deeply
- Extract ALL visual elements:
  - Color palette (primary, secondary, accent, neutrals) with hex codes
  - Typography (font families, sizes, weights, line heights)
  - Spacing system (margins, padding patterns)
  - Layout structure (grid, flex patterns, breakpoints)
  - Component patterns (cards, buttons, forms, navigation)
  - Visual effects (shadows, borders, gradients, blur)
  - Micro-interactions implied by the design
- Interpret the design:
  - Design style classification (Minimalism, Glassmorphism, Neo-brutalism, etc.)
  - Emotional tone (calm, energetic, professional, playful)
  - What principles make this design effective
  - What creates the "feel" of the design
**Output:** Internal analysis object (NOT shown to the user yet; used to inform questions)
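The internal analysis object can be kept as a simple structured record. A minimal sketch in Python (field names and values are illustrative assumptions, not a fixed schema):

```python
# Illustrative shape for the Phase 1 analysis object (hypothetical field names).
analysis = {
    "palette": {
        "primary": "#6D28D9",
        "secondary": "#1E1B2E",
        "accent": "#A78BFA",
        "neutrals": ["#111113", "#2A2A2E"],
    },
    "typography": {"families": ["SF Pro"], "scale": {"body": "16px/1.5", "h1": "32px/1.2"}},
    "spacing": {"base_unit": 8, "section_gap": 64},
    "layout": {"pattern": "sidebar + 12-col grid", "breakpoints": [640, 1024, 1280]},
    "components": ["cards", "sidebar nav", "stat tiles"],
    "effects": ["soft shadows", "backdrop blur"],
    "style": "Glassmorphism",       # design style classification
    "tone": "calm, professional",   # emotional tone
}
```

Keeping the analysis as plain data makes it easy to feed into Phase 2's question generation.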
### Phase 2: Vision Discovery (Interactive)

**Purpose:** Understand what the USER wants, not just what's visible

**Process:**
- Generate contextual questions BASED on the Phase 1 analysis
- Use `AskUserQuestion` to gather intent
**Smart questions (informed by analysis):**
Context questions:
- "What are you building? (app type, purpose, audience)"
- "What drew you to this design? What specifically appeals to you?"
Analysis-informed questions (examples):
- If analysis found heavy whitespace: "This design uses generous spacing for a calm feel - is that breathing room important, or are you more interested in the layout structure?"
- If analysis found muted colors: "The color palette is quite subdued - do you want similar tones, or different energy?"
- If analysis found complex animations: "This appears to have sophisticated animations - how important is motion to your project?"
- If analysis found specific typography: "The typography uses [detected font] which creates [effect] - do you want to match this, or have different type preferences?"
Adaptation questions:
- "What should feel DIFFERENT from this reference?"
- "Any constraints? (tech stack, existing design system, accessibility requirements)"
**Output:** User intent data
### Phase 3: Vision Synthesis

**Purpose:** Create a source-of-truth document for all subsequent work

**Process:**
- Combine analysis + user answers into Vision Document
- Categorize elements:
  - **KEEP:** Elements to preserve exactly as in the reference
  - **ADAPT:** Elements to modify for the user's context
  - **IGNORE:** Elements not relevant to the user's needs
- Define goals:
  - Emotional goals (how it should feel)
  - Functional goals (what it needs to do)
  - Constraints (tech, accessibility, etc.)
**Present to the user for confirmation:**

> ## Vision Summary
>
> **Building:** [what they're building]
> **Inspired by:** [reference description]
> **Target feel:** [emotional goals]
>
> ### Keep from reference:
> - [element 1]
> - [element 2]
>
> ### Adapt for your context:
> - [element 1] → [how to adapt]
>
> ### Your additions:
> - [requirements not in reference]
>
> Does this capture your vision?
Then use `AskUserQuestion`:
- header: "Vision"
- options:
  - "Yes, this is right" - Proceed to planning
  - "Needs adjustment" - Let me clarify
  - "Start over" - Redo discovery
**Output:** Approved Vision Document
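The approved Vision Document can also be represented as plain data so later phases can query it. A sketch with hypothetical entries (the `element_disposition` helper is illustrative, not part of the skill's API):

```python
# Hypothetical Vision Document record; keys mirror the KEEP/ADAPT/IGNORE split.
vision_document = {
    "building": "SaaS analytics dashboard",
    "keep": ["dark theme", "glassmorphism cards", "sidebar nav"],
    "adapt": {"accent color": "purple -> blue", "metric density": "fewer, cleaner"},
    "ignore": ["marketing hero section"],
    "goals": {"emotional": ["professional", "modern"], "functional": ["easy to scan"]},
    "constraints": ["React + Tailwind", "WCAG AA contrast"],
}

def element_disposition(doc, element):
    """Return how an element from the reference should be treated."""
    if element in doc["keep"]:
        return "KEEP"
    if element in doc["adapt"]:
        return "ADAPT"
    return "IGNORE"
```

During implementation, any element from the reference can then be checked against the document instead of relying on memory.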
### Phase 4: Planning

**Purpose:** Convert the vision into a concrete implementation plan

**Process:**
- Invoke the `writing-plans` skill OR use `EnterPlanMode`
- The plan should include:
  - Component breakdown (what to build)
  - Implementation order (dependencies)
  - File structure
  - Technical decisions (libraries, patterns)
  - Checkpoints for vision verification
Present the plan for approval before writing any code.
**Output:** Approved implementation plan
### Phase 5: Implementation with Checkpoints

**Purpose:** Build while staying true to the vision

**Process:**
- Invoke the `executing-plans` skill with vision context
- Implement in stages:
  - **Stage 1: Structure** - Layout, routing, basic components
  - **Stage 2: Core UI** - Main components with styling
  - **Stage 3: Details** - Typography, colors, spacing refinement
  - **Stage 4: Polish** - Micro-interactions, animations, edge cases
- At each stage checkpoint, ask:

> Stage [N] complete. Here's what I built: [summary]
>
> Comparing to your vision:
> - [what matches]
> - [what might differ]
>
> Should I continue, adjust, or show you the current state?
**Output:** Implemented code
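The staged loop above can be sketched as plain control flow. The `build_stage` and `ask_user` callables are hypothetical stand-ins for the actual implementation work and the `AskUserQuestion` checkpoint:

```python
# Stage names from the list above.
STAGES = ["Structure", "Core UI", "Details", "Polish"]

def run_with_checkpoints(build_stage, ask_user):
    """Build each stage, then pause for a vision checkpoint before the next."""
    completed = []
    for n, stage in enumerate(STAGES, start=1):
        summary = build_stage(stage)
        answer = ask_user(
            f"Stage {n} ({stage}) complete: {summary}. Continue, adjust, or show?"
        )
        if answer == "adjust":
            summary = build_stage(stage)  # redo the stage with adjustments
        completed.append(stage)
    return completed
```

The key property is that no stage begins until the previous checkpoint has been answered, which is what prevents drift from the vision.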
### Phase 6: Vision Verification

**Purpose:** Ensure the final result matches the original intent

**Process:**
- Use `ai-multimodal` to compare:
  - The original reference screenshot
  - The current implementation (via a `chrome-devtools` screenshot)
- Evaluate alignment with Vision Document
- Present the comparison to the user:

> ## Vision Check
>
> **Original reference:** [key characteristics]
> **Your implementation:** [how it compares]
> **Vision alignment:** [what matches, what differs]
>
> Are you satisfied, or should we refine?
**Output:** User approval or refinement requests
## Modes
| Mode | Phases | When to Use |
|---|---|---|
| `full` | All 6 phases | Complete design-to-code with maximum alignment |
| `quick` | 1, 2 (brief), 4, 5 | Faster, fewer questions, still vision-aware |
| `analyze-only` | 1, 2, 3 | Just create the vision document, no implementation |
| `implement` | 4, 5, 6 | Already have a vision, just need to build |
To specify a mode, the user can say "quick mode", or the skill can ask:

> How thorough should the vision discovery be?
> - Full (deep analysis, multiple questions, checkpoints)
> - Quick (brief questions, faster to code)
> - Analyze only (just create the vision doc)
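The mode table maps directly to phase subsets. A minimal sketch (mapping taken from the table above; the `phases_for` helper name is illustrative):

```python
# Phase subsets per mode, from the Modes table.
MODES = {
    "full": [1, 2, 3, 4, 5, 6],
    "quick": [1, 2, 4, 5],        # Phase 2 is brief in this mode
    "analyze-only": [1, 2, 3],
    "implement": [4, 5, 6],
}

def phases_for(mode):
    """Return the phases to run, defaulting to full when the mode is unknown."""
    return MODES.get(mode, MODES["full"])
```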
## Skills Integration
This skill orchestrates:
| Phase | Skills Used |
|---|---|
| Analysis | `ai-multimodal` |
| Discovery | `AskUserQuestion` tool, brainstorming patterns |
| Synthesis | `AskUserQuestion` tool |
| Planning | `writing-plans` or `EnterPlanMode` |
| Implementation | `executing-plans`, `frontend-design`, `aesthetic` |
| Verification | `ai-multimodal`, `chrome-devtools` |
## Context Persistence
Maintain throughout execution:
    vision_context = {
        original_reference: [image path/description],
        analysis: [Phase 1 output],
        user_intent: [Phase 2 answers],
        vision_document: [Phase 3 output],
        plan: [Phase 4 output],
        checkpoints: [Phase 5 progress],
        current_stage: [where we are]
    }
Pass this context to each skill invocation so they all work toward the same vision.
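One way to keep that context well-formed between skill invocations is a small dataclass whose fields mirror the pseudocode above. This structure is an illustrative sketch, not a requirement of the skill:

```python
from dataclasses import dataclass, field

@dataclass
class VisionContext:
    """Shared state passed to every skill invocation (illustrative structure)."""
    original_reference: str                              # image path or description
    analysis: dict = field(default_factory=dict)         # Phase 1 output
    user_intent: dict = field(default_factory=dict)      # Phase 2 answers
    vision_document: dict = field(default_factory=dict)  # Phase 3 output
    plan: dict = field(default_factory=dict)             # Phase 4 output
    checkpoints: list = field(default_factory=list)      # Phase 5 progress
    current_stage: str = "analysis"                      # where we are

# Example: start a context from a reference image and record progress.
ctx = VisionContext(original_reference="dashboard.png")
ctx.checkpoints.append("Structure complete")
```

A single object like this makes it harder for a later phase to silently drop part of the vision.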
## Example Flow
User: "Build me a dashboard like this" [attaches screenshot]
Phase 1: [Silent analysis extracts: dark theme, glassmorphism cards, SF Pro font, purple accent, lots of data visualization, sidebar nav]
Phase 2:
- "What kind of dashboard is this for?" → "Analytics for my SaaS"
- "I notice the glassmorphism card style - is that frosted glass effect important to you?" → "Yes, love that"
- "The dark theme with purple accents - keep those colors or different?" → "Keep dark, but blue instead of purple"
- "What should feel different from this?" → "Less cluttered, I have fewer metrics"
Phase 3:
Vision Summary:
Building: SaaS analytics dashboard
Keep: Dark theme, glassmorphism cards, sidebar nav
Adapt: Purple → Blue accents, fewer metrics (cleaner)
Goals: Professional, modern, easy to scan
Phase 4: [Plan: 5 components, start with layout, then cards, then charts...]
Phase 5: [Build with checkpoints at each stage]
Phase 6: [Compare final to original, verify vision alignment]
## Rules
- **Never skip vision discovery** - Even in quick mode, ask at least 2 contextual questions
- **Analysis informs questions** - Don't ask generic questions; make them specific to what was detected
- **Vision document is the source of truth** - Reference it throughout implementation
- **Checkpoints prevent drift** - Verify alignment before moving to the next stage
- **User can always adjust** - Provide modification options at every phase
- **Pass context forward** - Every skill invocation should know the full vision context