name: tdd
description: >
  Use when implementing any feature or fix outside code-forge workflow — enforces Red-Green-Refactor cycle with mandatory test-first discipline. Supports three modes: (1) Standalone — ad-hoc TDD for quick changes, (2) Auto-Analysis — runs the full spec-forge:test-cases analysis pipeline (project profile, four-layer deep scan, multi-dimensional coverage) then implements all cases via TDD, (3) Driven — reads a test-cases.md document and implements each case via TDD.
Code Forge — TDD
⚡ Execution Entry Point
@../shared/execution-entrypoint.md
For this skill: start at Step 0 (Determine Mode). If you catch yourself about to say "falling back to manual TDD", STOP and go to the indicated step.
Test-Driven Development enforcement for any code change, with built-in code analysis.
When to Use
- Writing code outside of code-forge:impl workflow (ad-hoc changes, quick fixes)
- Adding tests to existing code that lacks coverage
- Implementing test cases from a spec-forge:test-cases document
- Any new feature, bug fix, or behavior change that needs test discipline
Note: code-forge:impl already enforces TDD internally. This skill is for work outside that workflow.
Iron Law
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.
No exceptions. Not for "simple" changes. Not for "obvious" fixes. Not when under time pressure.
Design Discipline (Mandatory, Applies to All Modes)
Before any RED step in any mode (Driven, Auto-Analysis, Standalone), you MUST run the design-first pre-code checklist: read the relevant subsystem, consider the optimal interface-stable design, and decide whether to refactor existing code or add new code. The TDD cycle's REFACTOR step is the second enforcement point — every GREEN must be followed by a real consideration of whether the new code is the cleanest shape, not the most expedient one.
This discipline is the upstream defense against patch-first development. Read it once at the start of every session and again whenever you are tempted to add a new branch / wrapper / parallel module instead of refactoring:
@../shared/design-first.md
Step 0: Determine Mode
Examine the arguments to determine the operating mode:
| Argument | Mode | Behavior |
|---|---|---|
| @docs/.../test-cases.md | Driven Mode | Read test cases document, implement each case via TDD |
| @src/services/payment.ts or specific code path | Auto-Analysis Mode | Analyze specified code, design cases, implement via TDD |
| Feature name or description (e.g., "add validation to user signup") | Standalone Mode | Classic TDD — write tests for the described change |
| Empty (no arguments) | Auto-Analysis Mode | Scan project for coverage gaps, design cases, implement |
Driven Mode — Implementing from Test Cases Document
When a test-cases.md file is provided (generated by spec-forge:test-cases):
D.1 Read and Parse
- Read the test-cases document
- Extract all test cases (TC-MODULE-NNN entries)
- Identify which are already implemented (check existing test files for matching test names/IDs)
- Filter to unimplemented cases
- Sort by priority: P0 first, then P1, then P2
D.2 Confirm Scope
Present to user:
- "{N} test cases found, {X} already implemented, {Y} remaining"
- "Implement: (A) all remaining, (B) P0 only, (C) P0 + P1, (D) specific modules?"
D.3 Implement Loop
For each test case in scope:
- Read the case — extract preconditions, steps, expected result, not-expected, test infra
- Set up test infrastructure — if Test Infra is "Real DB", configure TestContainers or test database; if "Mock external", set up mock for the specified third-party service; if "Temp dir", create temp directory; if "N/A", no special setup needed
- RED — Write a failing test that matches the case specification (a worked sketch follows this list)
  - Test name should include the TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
  - Preconditions become test setup (seed data, auth context, config)
  - Steps become test actions
  - Expected result becomes assertions
  - "Not Expected" becomes negative assertions where applicable
- VERIFY RED — Run the test, confirm it fails correctly
- GREEN — Write minimal production code to make it pass (if the code already exists and passes, the case was already covered — note and move on)
- VERIFY GREEN — Run all tests, confirm clean pass
- REFACTOR — Clean up if needed
- Report — "TC-AUTH-001: DONE (test passes, implementation complete)"
D.4 Progress Tracking
After each case, display progress:
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Create user with duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Create user with invalid email format (P1) — next
[ ] TC-AUTH-030: Create user should NOT bypass email validation (P1)
Ask: "Continue with next case, skip, or pause?"
D.5 Completion
After all cases are implemented:
- Run full test suite
- Report: total cases implemented, all tests passing, coverage change
- Suggest: "Run
/code-forge:verifyto confirm completion"
Auto-Analysis Mode — Scan and Test
When the user points to code or says "help me write tests" without a test-cases document.
Iron Rule: Auto-Analysis uses the SAME full analysis as spec-forge:test-cases. The only difference is the output — auto-analysis produces code directly instead of a document. The analysis quality must be identical.
A.0 Full Test Case Analysis (same as spec-forge:test-cases Steps 1-5)
Execute the complete spec-forge:test-cases analysis pipeline. The full workflow is defined in the spec-forge test-cases-generation skill (spec-forge/skills/test-cases-generation/SKILL.md). The essential steps are inlined below — follow them exactly:
Step 1 — Determine Input Mode and Project Profile
- Determine input mode: Scan / Code / Spec (from user arguments)
- Detect project profile: Web API / CLI Tool / Frontend App / AI Agent / Data Pipeline / Function Library / SDK
- Detect: has database? has auth? has external APIs?
- Output explicit profile with rationale
Step 2 — Deep Scan and Extract (Four Layers)
- Use the language-specific deep extraction strategy (Python / TypeScript / Go / Rust / Java)
- Extract ALL testable units across four layers:
- Interface: public API surface, type contracts, trait/interface boundaries
- Logic: branch paths, error chains, state transitions, validation rules
- Architecture: module structure, layer boundaries, dependency direction
- Relationships: call graphs, data flow, event propagation, trait implementations
- Scan existing tests to determine current coverage
- Run scan verification (file coverage ≥ 90%, module tree completeness, re-export tracking)
- Produce structured Functional Inventory with all four layers per unit
Step 3 — Detect Dimensions
- Apply built-in dimensions: Coverage Depth (L1/L2/L3)
- Auto-detect project-specific dimensions (Auth Context, Trigger Mode, Input Source, etc.)
Step 4 — Confirm Scope with User
- Present Profile confirmation: "I detected this as {profile} ({rationale}). Correct?"
- Present scope: "{N} testable units, {X} have tests, {Y} don't. Cover: all / uncovered / specific modules?"
- Present detected dimensions for confirmation
- Ask for business rules the code can't reveal
Step 5 — Design Test Cases
- Per testable unit, generate at minimum:
- 1 × L1 (Happy Path)
- 2 × L2 (Boundary + Error)
- 1 × L3 (Negative — what should NOT happen)
- For interacting units: pairwise combination cases (L1 both succeed + L2 one fails + L3 should not combine)
- For auto-detected dimensions: cross with coverage depth using risk-based prioritization
- Apply conditional sections:
- Data Integrity cases (only if project has database)
- Security cases (only if project has auth or handles user input)
- Performance cases (only if project has latency/throughput requirements)
- Assign priorities: P0 (critical path) / P1 (important) / P2 (nice-to-have)
- Build coverage matrix internally: unit × depth, dimension coverage, combination coverage, gap analysis
Result: A complete set of structured test cases in memory — identical quality to what spec-forge:test-cases would produce as a document.
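For a single hypothetical unit such as parseAmount, the per-unit minimum above might yield case skeletons like these (IDs, names, and expected behaviors are illustrative, not prescribed):

```ts
declare function parseAmount(input: string): number; // assumed signature of the unit under test

test("TC-PAY-001: parseAmount parses a plain integer string (L1 happy path)", () => {
  expect(parseAmount("42")).toBe(42);
});

test("TC-PAY-002: parseAmount accepts the zero boundary (L2 boundary)", () => {
  expect(parseAmount("0")).toBe(0);
});

test("TC-PAY-003: parseAmount rejects non-numeric input (L2 error)", () => {
  expect(() => parseAmount("abc")).toThrow();
});

test("TC-PAY-004: parseAmount should NOT silently truncate decimals (L3 negative)", () => {
  expect(parseAmount("1.99")).not.toBe(1);
});
```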
A.1 Optional: Save Test Cases Document
Ask the user: "Save the test cases as docs/{feature}/test-cases.md for future reference? (Y/n)"
- If yes → write the document following the spec-forge:test-cases template, then continue to A.2
- If no → keep in memory, continue to A.2
A.2 Implement via TDD
For each test case (sorted by priority: P0 → P1 → P2), follow the same TDD cycle as Driven Mode:
- Read the case — extract preconditions, steps, expected result, not-expected, test infra
- Set up test infrastructure — if Test Infra is "Real DB", configure TestContainers; if "Mock external", set up a mock; if "Temp dir", create a temp directory; if "N/A", no setup (a setup sketch follows this list)
- RED — Write a failing test matching the case specification
  - Test name should include the TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
  - Preconditions → test setup; Steps → test actions; Expected result → assertions; Not Expected → negative assertions
- VERIFY RED — Run the test, confirm it fails correctly
- GREEN — Write minimal production code to make it pass
- VERIFY GREEN — Run all tests, confirm clean pass
- REFACTOR — Clean up if needed
- Report — "TC-AUTH-001: DONE"
A.3 Progress Tracking
After each case, display progress (same format as Driven Mode D.4):
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Invalid email format (P1) — next
Ask: "Continue with next case, skip, or pause?"
A.4 Completion
After all cases are implemented:
- Run full test suite
- Report: total cases implemented, all tests passing, coverage statistics
- If test cases were saved to file (A.1), report the file path
- Suggest: "Run
/code-forge:verifyto confirm completion"
Standalone Mode — Classic TDD
For ad-hoc changes where the user describes what to build or fix:
Workflow
RED (write failing test) → VERIFY RED → GREEN (minimal code) → VERIFY GREEN → REFACTOR → REPEAT
The Cycle
Complete each phase fully before moving to the next.
1. RED — Write a Failing Test
- One minimal test showing the desired behavior
- Clear, descriptive test name
- Use real code, not mocks (unless unavoidable: external APIs, time-dependent behavior)
- One behavior per test
2. VERIFY RED — Watch It Fail (MANDATORY)
Run the test. Confirm:
- It fails (an assertion failure, not an error)
- The failure message describes the missing behavior
- It fails because the feature is missing, not because of typos or setup issues
If the test passes: you're testing existing behavior. Rewrite the test. If the test errors: fix the error, re-run until it fails correctly.
3. GREEN — Write Minimal Code
- Simplest code that makes the test pass
- No extra features, no "while I'm here" improvements
- No premature abstractions — three similar lines beat a premature helper
4. VERIFY GREEN — Watch It Pass (MANDATORY)
Run the test. Confirm:
- The new test passes
- All other tests still pass
- Output is clean (no warnings, no errors)
If the new test fails: fix the code, not the test. If other tests fail: fix them now, before proceeding.
5. REFACTOR — Clean Up (After Green Only)
- Remove duplication, improve names, extract helpers
- Keep all tests green throughout
- Do NOT add new behavior during refactor
- Apply design-first here. Look at the GREEN code in the context of the surrounding subsystem. Did GREEN push you toward a patch (new branch, new wrapper, parallel path) when a small refactor of the existing structure would have been cleaner? If so, refactor now while the test is green. The REFACTOR step is not optional — it is the moment design-first is enforced inside the TDD cycle. See
@../shared/design-first.md for the discipline. A small patch-versus-refactor sketch follows the cycle below.
6. REPEAT
Go back to Step 1 for the next behavior.
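Following up on the REFACTOR step's design-first check, a small illustration of the patch-versus-refactor distinction, using hypothetical code; the point is the shape, not the domain:

```ts
// Patch shape: GREEN satisfied the new test by bolting a parallel branch onto the existing function.
function formatPrice(amount: number, currency: string): string {
  if (currency === "JPY") {
    return `¥${amount.toFixed(0)}`; // new special case added just for the new test
  }
  return `${currency} ${amount.toFixed(2)}`;
}

// Refactored shape: the existing structure absorbs the new behavior as data instead of branches.
const CURRENCY_RULES: Record<string, { symbol: string; decimals: number }> = {
  JPY: { symbol: "¥", decimals: 0 },
};

function formatPriceRefactored(amount: number, currency: string): string {
  const rule = CURRENCY_RULES[currency] ?? { symbol: `${currency} `, decimals: 2 };
  return `${rule.symbol}${amount.toFixed(rule.decimals)}`;
}
```

Both versions keep the tests green; the refactored shape just leaves a structure the next currency rule can extend without another branch.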
Decision Rules
| If you're about to... | Instead... | Why |
|---|---|---|
| Write production code without a test | STOP — write the failing test first | Tests written after implementation pass immediately and prove nothing |
| Skip testing because the change is "simple" | Write the test — it will be quick if it's truly simple | Simple code has the sneakiest bugs (off-by-one, null edge cases) |
| Apply a quick fix without a regression test | Write the test, then fix | Untested fixes become permanent regressions |
| Continue with code that wasn't test-driven | Consider rewriting test-first | Sunk cost — untested code is a liability regardless of time spent |
External Dependency Rules
Principle: test your own dependencies for real; only mock what you don't control.
| Your Dependency | Approach |
|---|---|
| Own database | Real DB (TestContainers, test instance, SQLite in-memory) |
| Own file system | Real temp directory |
| Own cache / message queue | Real (TestContainers, embedded) |
| External third-party API | Mock / stub acceptable |
| Non-deterministic input (time, random) | Inject controlled values |
- For projects without a database or external I/O: most tests are pure unit tests — no special infra needed
- For write operations: verify state after the operation (DB query / file check / store assertion)
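A minimal sketch of the last two rows combined: inject the clock instead of reading it, then verify stored state after the write. The session store and names are hypothetical:

```ts
// Hypothetical in-memory store standing in for your own database or file system.
type Session = { userId: string; expiresAt: Date };
const sessions = new Map<string, Session>();

// Non-deterministic input (time) is injected rather than read from the global clock.
function createSession(userId: string, now: () => Date): string {
  const id = `sess-${sessions.size + 1}`;
  sessions.set(id, { userId, expiresAt: new Date(now().getTime() + 60_000) });
  return id;
}

test("createSession stores a session that expires one minute later", () => {
  const fixedNow = () => new Date("2024-01-01T00:00:00Z"); // controlled value instead of Date.now()
  const id = createSession("user-1", fixedNow);

  // Write operation: verify state after the operation (store assertion).
  const stored = sessions.get(id);
  expect(stored?.userId).toBe("user-1");
  expect(stored?.expiresAt.toISOString()).toBe("2024-01-01T00:01:00.000Z");
});
```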
Example
Task: Add isPalindrome(str) function
1. RED — Write test:
test("isPalindrome returns true for 'racecar'", () => {
expect(isPalindrome("racecar")).toBe(true);
});
2. VERIFY RED — Run: npm test
✗ ReferenceError: isPalindrome is not defined ← fails correctly
3. GREEN — Minimal code:
function isPalindrome(str) {
return str === str.split("").reverse().join("");
}
4. VERIFY GREEN — Run: npm test
✓ isPalindrome returns true for 'racecar' ← passes
42 passed, 0 failed
5. REFACTOR — (no changes needed)
6. REPEAT — next test: edge case with empty string
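A possible next RED step for that edge case; whether an empty string counts as a palindrome is a design decision the test forces into the open:

```ts
test("isPalindrome returns true for empty string", () => {
  expect(isPalindrome("")).toBe(true);
});
```

If this passes immediately against the current implementation, the behavior is already covered: note it and pick a case that genuinely fails next (for example mixed case like "Racecar", if case-insensitive matching is wanted).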
Test runner detection: Check package.json scripts, pytest.ini, Cargo.toml, go.mod, or Makefile for the project's test command before starting the cycle. Use the same runner consistently.
Verification Checklist
Before claiming work is complete:
- Every new function/method has at least one test
- Watched each test fail before implementing
- Each test failed for the expected reason (not errors)
- Wrote minimal code per test (no gold-plating)
- All tests pass with clean output
- Edge cases and error paths covered
- Mocks used only when unavoidable
- Database-touching tests use real database
When Stuck
- Test too complicated to write → design is too complicated, simplify first
- Must mock everything → code is too coupled, extract interfaces
- Test setup is huge → extract test helpers or fixtures
- No test-cases document and unsure what to test → run
/spec-forge:test-cases first to generate a structured case set