Error Recovery Patterns Skill
This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw).
Purpose
Guide developers in implementing robust error recovery patterns to:
- Reduce retry loops in agent sessions (target: under 10%, down from the current 23%)
- Implement circuit breakers to prevent infinite retry loops
- Add proactive recovery for installation, dependency, and API failures
- Improve debug logging for recovery attempts
When to Use This Skill
Invoke this skill when:
- Implementing retry logic for network operations, installations, or API calls
- Debugging retry loop issues in workflows or agent sessions
- Adding error recovery patterns to new or existing code
- Understanding transient vs non-transient error classification
- Implementing circuit breakers or exponential backoff
- Adding debug logging for recovery attempts
Key Concepts Covered
1. Circuit Breaker Pattern
- Maximum retry limits (standard: 3 attempts)
- Exponential backoff strategies
- Fail-fast on non-transient errors
- Implementation in JavaScript, Shell, and Go
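The points above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the skill's actual `withRetry()`: the signature, the option names (`maxAttempts`, `baseDelayMs`, `isTransient`, `sleep`), and the 1000 ms base delay are all assumptions made for the example.

```javascript
// Hypothetical sketch: signature and option names are assumptions,
// not the withRetry() implementation shipped with gh-aw.
async function withRetry(operation, options = {}) {
  const {
    maxAttempts = 3,          // the standard limit named above
    baseDelayMs = 1000,       // assumed starting delay
    isTransient = () => true, // caller-supplied error classifier
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Fail fast: a non-transient error will not correct itself.
      if (!isTransient(error)) throw error;
      // Circuit breaker: stop once the attempt budget is spent.
      if (attempt === maxAttempts) break;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Injecting `sleep` keeps the helper testable without real delays; production callers would normally leave the default in place.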
2. Installation Failure Recovery
- NPM installation with cache clearing and registry fallbacks
- Python pip installation with mirror alternatives
- Docker image pull with retry and rate limit handling
- Copilot CLI installation with network retry
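As one illustration of the NPM case, the sketch below tries a plain install, then an install after clearing the cache, then an install against a fallback registry. The function name `installWithRecovery`, the injected `run` callback, and the fallback registry URL are hypothetical; only the `npm install`, `npm cache clean --force`, and `--registry` invocations are real npm CLI usage.

```javascript
// Hypothetical sketch: `run(cmd, args)` is an injected command runner that
// throws on failure; the fallback registry URL is supplied by the caller.
function installWithRecovery(run, fallbackRegistry) {
  const attempts = [
    { label: "plain install", commands: [["npm", ["install"]]] },
    {
      label: "after cache clean",
      commands: [["npm", ["cache", "clean", "--force"]], ["npm", ["install"]]],
    },
    {
      label: "fallback registry",
      commands: [["npm", ["install", "--registry", fallbackRegistry]]],
    },
  ];

  const failures = [];
  for (const { label, commands } of attempts) {
    try {
      for (const [cmd, args] of commands) run(cmd, args);
      return label; // report which strategy succeeded
    } catch (error) {
      failures.push(`${label}: ${error.message}`);
    }
  }
  // Bounded by the list above, so the loop cannot retry forever.
  throw new Error(
    `npm install failed after ${attempts.length} attempts\n${failures.join("\n")}`
  );
}
```

In a real script, `run` could wrap `child_process.execFileSync(cmd, args, { stdio: "inherit" })`; passing it in keeps the recovery order easy to test.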
3. API Timeout and Rate Limit Handling
- GitHub API rate limit detection and backoff
- Transient error detection patterns
- Custom retry configuration for different APIs
- Rate limit-specific retry strategies
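Rate limit detection for the GitHub API might look like the sketch below. It relies on the documented `retry-after`, `x-ratelimit-remaining`, and `x-ratelimit-reset` response headers; the function name `rateLimitDelayMs` and the assumption that headers arrive as a plain object with lowercase keys are illustrative only.

```javascript
// Hypothetical helper: returns how long to wait (in ms) before retrying,
// or null when the response is not a rate limit response.
function rateLimitDelayMs(status, headers, nowMs = Date.now()) {
  if (status !== 403 && status !== 429) return null;

  // Secondary rate limits may send Retry-After, expressed in seconds.
  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;

  // Primary rate limit: nothing remaining, reset time given in epoch seconds.
  if (headers["x-ratelimit-remaining"] === "0") {
    const resetMs = Number(headers["x-ratelimit-reset"]) * 1000;
    if (Number.isFinite(resetMs)) return Math.max(resetMs - nowMs, 0);
  }

  // A plain 403 (for example, missing permissions) should not be retried.
  return null;
}
```

Waiting for the server-provided interval instead of a generic exponential delay avoids burning retry attempts that are certain to fail.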
4. Debug Logging for Recovery
- Logger package usage for retry attempts
- Category naming conventions (pkg:filename)
- DEBUG environment variable patterns
- Zero-overhead logging when disabled
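The logging conventions above can be illustrated with a small stand-in. This is not the logger package the skill refers to: `isEnabled`, `createLogger`, and their options are hypothetical names, and the `DEBUG` matching shown (comma-separated patterns with `*` wildcards) is an assumption about how the patterns behave.

```javascript
// Hypothetical stand-in for a category-based debug logger.
function isEnabled(category, debugEnv = process.env.DEBUG || "") {
  return debugEnv
    .split(",")
    .map((pattern) => pattern.trim())
    .filter(Boolean)
    .some((pattern) => {
      // Escape regex metacharacters, then treat "*" as a wildcard.
      const escaped = pattern
        .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
        .replace(/\*/g, ".*");
      return new RegExp(`^${escaped}$`).test(category);
    });
}

function createLogger(category, options = {}) {
  const { debugEnv = process.env.DEBUG || "", sink = console.error } = options;
  // Disabled categories get a no-op, so lazy messages are never built.
  if (!isEnabled(category, debugEnv)) return () => {};
  return (message) => {
    const text = typeof message === "function" ? message() : message;
    sink(`${category} ${text}`);
  };
}

// Usage: category follows the pkg:filename convention.
const log = createLogger("workflow:retry");
log(() => "attempt 2/3 failed, retrying in 2000ms");
```

Passing the message as a function defers string construction until the category is known to be enabled, which is how this sketch approximates the zero-overhead goal.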
5. Error Categorization
- Transient vs non-transient errors
- Network errors, timeout patterns
- HTTP error codes (502, 503, 504)
- GitHub-specific errors (rate limits, abuse detection)
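A classifier for these categories could be sketched as follows. The name `isTransientError`, the error shape (`status`, `code`, `message`), and the exact message patterns are assumptions for illustration; the status codes come from the list above, and the `code` values are standard Node.js network error codes.

```javascript
// Hypothetical classifier: treats an error as transient only when it
// matches a known retryable status, network code, or message pattern.
const TRANSIENT_STATUS = new Set([502, 503, 504]);
const TRANSIENT_CODES = new Set([
  "ECONNRESET",
  "ETIMEDOUT",
  "ECONNREFUSED",
  "EAI_AGAIN",
]);
// Matches "timeout", "timed out", "rate limit", and "abuse detection".
const TRANSIENT_MESSAGE = /timed? ?out|rate limit|abuse detection/i;

function isTransientError(error) {
  if (TRANSIENT_STATUS.has(error.status)) return true;
  if (TRANSIENT_CODES.has(error.code)) return true;
  return TRANSIENT_MESSAGE.test(error.message || "");
}
```

Anything that does not match, such as a validation failure, is treated as non-transient and should fail immediately instead of being retried.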
Anti-Patterns to Avoid
The skill explicitly flags the following:
- ❌ Infinite retry loops without maximum limits
- ❌ Retrying validation errors that won't self-correct
- ❌ No backoff delay between attempts
- ❌ Silent retries without logging
- ❌ Retrying non-transient errors
Code Examples Provided
The skill includes production-ready examples for:
- JavaScript retry with the withRetry() function
- Shell script retry loops with exponential backoff
- Go retry patterns with context and timeouts
- NPM/pip/docker installation recovery
- GitHub API rate limit handling
- Debug logging for all recovery attempts
Related Skills
- error-messages - Error message formatting and style guide
- error-pattern-safety - Safety guidelines for error pattern regex
- developer - General development guidelines and conventions
Full Documentation
Complete documentation is available at: ../../scratchpad/error-recovery-patterns.md
That document also covers:
- Console formatting requirements
- Error wrapping patterns
- Common error scenarios with step-by-step resolution
- Error message templates
- Debugging runbook
- Error categorization decision trees
- Metrics and monitoring strategies