---
name: context-engineering
description: Manage AI agent context effectively — what to include, what to exclude, compression strategies, and context hierarchy for optimal performance.
version: "1.0.0"
last-updated: "2026-04-17"
model_tested: "claude-sonnet-4-6"
category: meta
platforms: [claude-code, codex, gemini-cli, cursor, copilot, windsurf, cline]
language: en
geo_relevance: [global]
priority: high
dependencies:
  mcp: []
  skills: []
  apis: []
  data: []
update_sources:
  - url: "https://www.augmentcode.com/guides/how-to-build-agents-md"
    check_frequency: "quarterly"
    last_checked: "2026-04-17"
license: MIT
---
Context Engineering
Based on ETH Zurich research: overly detailed instructions reduce task success by 3%, increase token cost by 20%, and add 2-4 reasoning steps.
When to Use
- Writing SKILL.md, AGENTS.md, or system prompts
- Debugging poor agent performance
- Optimizing token costs
- Designing multi-agent workflows
- Reducing context window pressure
Context Hierarchy (5 Levels)
Most persistent → most transient:
| Level | Content | Persistence | Example |
|---|---|---|---|
| 1. Rules | Project-wide standards | Always loaded | CLAUDE.md, AGENTS.md |
| 2. Spec | Feature/session scope | Per feature | PRD, architecture docs |
| 3. Source | Code relevant to the current task | Per task | Relevant source files |
| 4. Errors | Feedback from the latest attempt | Per attempt | Test failures, stack traces |
| 5. History | Accumulated conversation turns | Session | Conversation history |
Principle: Levels 1-2 are curated (high leverage). Levels 3-5 are per-call (keep minimal).
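One way to picture the hierarchy is as an assembly step that orders context from most persistent to most transient and trims only the per-call levels. The sketch below is illustrative, not a prescribed implementation: `Level`, `ContextItem`, `assemble_context`, and the character budget are hypothetical names and simplifications chosen for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The five levels from the table above (1 = most persistent)."""
    RULES = 1
    SPEC = 2
    SOURCE = 3
    ERRORS = 4
    HISTORY = 5


@dataclass
class ContextItem:
    level: Level
    text: str


def assemble_context(items: list[ContextItem], budget_chars: int) -> str:
    """Join items in hierarchy order, trimming only per-call levels.

    Levels 1-2 (curated) are always kept. Levels 3-5 (per-call) are
    skipped when adding them would exceed the character budget.
    The budget in characters is an assumption made for simplicity.
    """
    ordered = sorted(items, key=lambda item: item.level)  # stable sort
    parts: list[str] = []
    used = 0
    for item in ordered:
        curated = item.level <= Level.SPEC
        if not curated and used + len(item.text) > budget_chars:
            continue  # per-call content that does not fit is dropped
        parts.append(item.text)
        used += len(item.text)
    return "\n\n".join(parts)
```

With this shape, a long conversation history is the first thing to fall out of the prompt, while project rules and the feature spec always survive.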
What to Include
Include ONLY what the agent cannot discover independently:
- Non-obvious conventions ("we use snake_case for DB columns")
- Project-specific constraints ("never modify the auth module")
- Architectural decisions not in code ("we chose Drizzle over Prisma because...")
- External dependencies not discoverable ("deploy via internal CI, not GitHub Actions")
What NOT to Include
The agent can discover these itself — including them wastes tokens:
- Tech stack (visible in package.json / requirements.txt)
- File structure (visible via ls / find)
- Key files (visible via search)
- Build commands (visible in scripts / Makefile)
- Standard patterns (the model already knows React, Express, etc.)
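The exclusion list above can be turned into a rough lint pass over a context file. The following is a minimal sketch under the assumption that discoverable content tends to sit under predictable headings; the heading names in `DISCOVERABLE_HEADINGS` and the function `flag_discoverable_sections` are hypothetical, and a real check would need project-specific tuning.

```python
import re

# Hypothetical heading names mapped to the reason from the list above.
DISCOVERABLE_HEADINGS = {
    "tech stack": "visible in package.json / requirements.txt",
    "file structure": "visible via ls / find",
    "key files": "visible via search",
    "build commands": "visible in scripts / Makefile",
}


def flag_discoverable_sections(markdown: str) -> list[tuple[str, str]]:
    """Return (heading, reason) pairs for sections the agent could discover itself."""
    findings: list[tuple[str, str]] = []
    for line in markdown.splitlines():
        # Match ATX-style markdown headings such as "## Tech Stack".
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if not match:
            continue
        title = match.group(1)
        reason = DISCOVERABLE_HEADINGS.get(title.lower())
        if reason:
            findings.append((title, reason))
    return findings
```

Sections that are flagged are candidates for deletion, not automatic removals: a "Build Commands" section that documents a non-obvious internal pipeline may still belong.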
Sizing Guidelines
| Context Type | Max Size | Rationale |
|---|---|---|
| AGENTS.md | 500-1000 tokens | ETH Zurich: more = worse |
| SKILL.md (core) | 1000-2500 tokens | Balance detail vs overhead |
| references/ per skill | 500-1000 tokens | Support data, not duplicate |
| System prompt total | < 5K tokens | Beyond this: diminishing returns |
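The size limits in the table can be checked mechanically before a file is committed. This sketch assumes a crude estimate of four characters per token, which is only a rule of thumb and varies by tokenizer and language; `MAX_TOKENS`, `estimate_tokens`, and `check_budget` are hypothetical names, and a real tokenizer should replace the estimate where accuracy matters.

```python
# Upper bounds taken from the sizing table above.
MAX_TOKENS = {
    "AGENTS.md": 1000,
    "SKILL.md": 2500,
    "reference": 1000,
    "system_prompt": 4999,  # the table says "< 5K"
}


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4  # ceiling division


def check_budget(kind: str, text: str) -> tuple[bool, int]:
    """Return (within_budget, estimated_tokens) for the given context type."""
    tokens = estimate_tokens(text)
    return tokens <= MAX_TOKENS[kind], tokens
```

A check like this fits naturally into a pre-commit hook or CI step, so that a context file growing past its budget is noticed when it happens rather than when agent performance degrades.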
Compression Strategies
- Remove examples when the pattern is clear — one example > three redundant ones
- Use tables over prose — 50% fewer tokens for structured info
- Remove "obvious" instructions — "write clean code" is noise
- Use references for static data — move schemas/checklists to files
- Lazy-load context — only load what's needed for current task
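Lazy loading, the last strategy in the list, can be sketched as a registry of loaders that run only when the current task mentions their topic. The class `LazyContext`, its methods, and the keyword-matching rule are all hypothetical and deliberately naive; a production setup might match on file paths or tool calls instead.

```python
from typing import Callable


class LazyContext:
    """Registry of reference loaders that run only when a task needs them."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], str]] = {}
        self._cache: dict[str, str] = {}

    def register(self, topic: str, loader: Callable[[], str]) -> None:
        """Associate a topic keyword with a function that returns its reference text."""
        self._loaders[topic] = loader

    def for_task(self, task: str) -> list[str]:
        """Return reference text only for topics mentioned in the task description."""
        task_lower = task.lower()
        loaded: list[str] = []
        for topic, loader in self._loaders.items():
            if topic.lower() in task_lower:
                if topic not in self._cache:
                    self._cache[topic] = loader()  # load once, on first use
                loaded.append(self._cache[topic])
        return loaded
```

In use, a loader would typically read a file under `references/`, so a schema or checklist costs tokens only on the tasks that touch it.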
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| "Always be thorough" | Forces effort=high, +35% tokens | Remove — model handles this |
| "Think step by step" | Redundant with adaptive thinking | Remove on modern models |
| Repeating the same rule 3x | Token waste, no benefit | State once, clearly |
| Including full API docs | Context overflow | Link to docs, summarize key parts |
| "You are a helpful assistant" | Generic, no value | Use specific task context |
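Several of the anti-patterns in the table are literal phrases or exact repetitions, so a simple linter can catch them. The sketch below is a rough heuristic, not a complete detector: `FILLER_PHRASES` and `lint_prompt` are hypothetical names, matching is case-insensitive substring search, and only verbatim repeated lines are counted as duplicates.

```python
from collections import Counter

# Phrases taken from the anti-pattern table above.
FILLER_PHRASES = (
    "always be thorough",
    "think step by step",
    "you are a helpful assistant",
)


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for filler phrases and verbatim repeated lines."""
    warnings: list[str] = []
    lowered = prompt.lower()
    for phrase in FILLER_PHRASES:
        if phrase in lowered:
            warnings.append(f"filler phrase: {phrase!r}")
    # Normalize lines so whitespace and case do not hide duplicates.
    lines = [line.strip().lower() for line in prompt.splitlines() if line.strip()]
    for line, count in Counter(lines).items():
        if count > 1:
            warnings.append(f"rule repeated {count}x: {line!r}")
    return warnings
```

Paraphrased repetitions and oversized API documentation are not detected here; those still need a human read or a size check.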
What This Skill Does NOT Do
- Does not manage conversation memory (different problem)
- Does not optimize the model itself (skill ≠ fine-tuning)
- Does not handle multi-agent coordination (orchestration concern)