name: tdd
description: >
  Use when implementing any feature or fix outside code-forge workflow — enforces Red-Green-Refactor cycle with mandatory test-first discipline. Supports three modes: (1) Standalone — ad-hoc TDD for quick changes, (2) Auto-Analysis — runs the full spec-forge:test-cases analysis pipeline (project profile, four-layer deep scan, multi-dimensional coverage) then implements all cases via TDD, (3) Driven — reads a test-cases.md document and implements each case via TDD.
Code Forge — TDD
⚡ Execution Entry Point
@../shared/execution-entrypoint.md
For this skill: start at Step 0 (Determine Mode). If you catch yourself about to say "falling back to manual TDD", STOP and go to the indicated step.
Test-Driven Development enforcement for any code change, with built-in code analysis.
When to Use
- Writing code outside of code-forge:impl workflow (ad-hoc changes, quick fixes)
- Adding tests to existing code that lacks coverage
- Implementing test cases from a spec-forge:test-cases document
- Any new feature, bug fix, or behavior change that needs test discipline
Note: code-forge:impl already enforces TDD internally. This skill is for work outside that workflow.
Iron Law
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.
No exceptions. Not for "simple" changes. Not for "obvious" fixes. Not when under time pressure.
Design Discipline (Mandatory, Applies to All Modes)
Before any RED step in any mode (Driven, Auto-Analysis, Standalone), you MUST run the design-first pre-code checklist: read the relevant subsystem, consider the optimal interface-stable design, and decide whether to refactor existing code or add new code. The TDD cycle's REFACTOR step is the second enforcement point — every GREEN must be followed by a real consideration of whether the new code is the cleanest shape, not the most expedient one.
This discipline is the upstream defense against patch-first development. Read it once at the start of every session and again whenever you are tempted to add a new branch / wrapper / parallel module instead of refactoring:
@../shared/design-first.md
Step 0: Determine Mode
Examine the arguments to determine the operating mode:
| Argument | Mode | Behavior |
|---|---|---|
| @docs/.../test-cases.md | Driven Mode | Read test cases document, implement each case via TDD |
| @src/services/payment.ts or specific code path | Auto-Analysis Mode | Analyze specified code, design cases, implement via TDD |
| Feature name or description (e.g., "add validation to user signup") | Standalone Mode | Classic TDD — write tests for the described change |
| Empty (no arguments) | Auto-Analysis Mode | Scan project for coverage gaps, design cases, implement |
Driven Mode — Implementing from Test Cases Document
When a test-cases.md file is provided (generated by spec-forge:test-cases):
D.1 Read and Parse
- Read the test-cases document
- Extract all test cases (TC-MODULE-NNN entries)
- Identify which are already implemented (check existing test files for matching test names/IDs)
- Filter to unimplemented cases
- Sort by priority: P0 first, then P1, then P2
D.2 Confirm Scope
Present to user:
- "{N} test cases found, {X} already implemented, {Y} remaining"
- "Implement: (A) all remaining, (B) P0 only, (C) P0 + P1, (D) specific modules?"
D.3 Implement Loop
For each test case in scope:
- Read the case — extract preconditions, steps, expected result, not-expected, test infra
- Set up test infrastructure — if Test Infra is "Real DB", configure TestContainers or test database; if "Mock external", set up mock for the specified third-party service; if "Temp dir", create temp directory; if "N/A", no special setup needed
- RED — Write a failing test that matches the case specification (a worked sketch follows this list)
  - Test name should include the TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
  - Preconditions become test setup (seed data, auth context, config)
  - Steps become test actions
  - Expected result becomes assertions
  - "Not Expected" becomes negative assertions where applicable
- VERIFY RED — Run the test, confirm it fails correctly
- GREEN — Write minimal production code to make it pass (if the code already exists and passes, the case was already covered — note and move on)
- VERIFY GREEN — Run all tests, confirm clean pass
- REFACTOR — Clean up if needed
- Report — "TC-AUTH-001: DONE (test passes, implementation complete)"
D.4 Progress Tracking
After each case, display progress:
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Create user with duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Create user with invalid email format (P1) — next
[ ] TC-AUTH-030: Create user should NOT bypass email validation (P1)
Ask: "Continue with next case, skip, or pause?"
D.5 Completion
After all cases are implemented:
- Run full test suite
- Report: total cases implemented, all tests passing, coverage change
- Suggest: "Run
/code-forge:verifyto confirm completion"
Auto-Analysis Mode — Scan and Test
When the user points to code or says "help me write tests" without a test-cases document.
Iron Rule: Auto-Analysis uses the SAME full analysis as spec-forge:test-cases. The only difference is the output — auto-analysis produces code directly instead of a document. The analysis quality must be identical.
A.0 Full Test Case Analysis (same as spec-forge:test-cases Steps 1-5)
Execute the complete spec-forge:test-cases analysis pipeline. The full workflow is defined in the spec-forge test-cases-generation skill (spec-forge/skills/test-cases-generation/SKILL.md). The essential steps are inlined below — follow them exactly:
Step 1 — Determine Input Mode and Project Profile
- Determine input mode: Scan / Code / Spec (from user arguments)
- Detect project profile: Web API / CLI Tool / Frontend App / AI Agent / Data Pipeline / Function Library / SDK
- Detect: has database? has auth? has external APIs?
- Output explicit profile with rationale
Step 2 — Deep Scan and Extract (Four Layers)
- Use the language-specific deep extraction strategy (Python / TypeScript / Go / Rust / Java)
- Extract ALL testable units across four layers:
- Interface: public API surface, type contracts, trait/interface boundaries
- Logic: branch paths, error chains, state transitions, validation rules
- Architecture: module structure, layer boundaries, dependency direction
- Relationships: call graphs, data flow, event propagation, trait implementations
- Scan existing tests to determine current coverage
- Run scan verification (file coverage ≥ 90%, module tree completeness, re-export tracking)
- Produce structured Functional Inventory with all four layers per unit
Step 3 — Detect Dimensions
- Apply built-in dimensions: Coverage Depth (L1/L2/L3)
- Auto-detect project-specific dimensions (Auth Context, Trigger Mode, Input Source, etc.)
Step 4 — Confirm Scope with User
- Present Profile confirmation: "I detected this as {profile} ({rationale}). Correct?"
- Present scope: "{N} testable units, {X} have tests, {Y} don't. Cover: all / uncovered / specific modules?"
- Present detected dimensions for confirmation
- Ask for business rules the code can't reveal
Step 5 — Design Test Cases
- Per testable unit, generate at minimum:
- 1 × L1 (Happy Path)
- 2 × L2 (Boundary + Error)
- 1 × L3 (Negative — what should NOT happen)
- For interacting units: pairwise combination cases (L1 both succeed + L2 one fails + L3 should not combine)
- For auto-detected dimensions: cross with coverage depth using risk-based prioritization
- Apply conditional sections:
- Data Integrity cases (only if project has database)
- Security cases (only if project has auth or handles user input)
- Performance cases (only if project has latency/throughput requirements)
- Assign priorities: P0 (critical path) / P1 (important) / P2 (nice-to-have)
- Build coverage matrix internally: unit × depth, dimension coverage, combination coverage, gap analysis
Result: A complete set of structured test cases in memory — identical quality to what spec-forge:test-cases would produce as a document.
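For a single hypothetical unit such as parseAmount, the per-unit minimum above might yield case skeletons like these (IDs, names, and expected behaviors are illustrative, not prescribed):

```ts
declare function parseAmount(input: string): number; // assumed signature of the unit under test

test("TC-PAY-001: parseAmount parses a plain integer string (L1 happy path)", () => {
  expect(parseAmount("42")).toBe(42);
});

test("TC-PAY-002: parseAmount accepts the zero boundary (L2 boundary)", () => {
  expect(parseAmount("0")).toBe(0);
});

test("TC-PAY-003: parseAmount rejects non-numeric input (L2 error)", () => {
  expect(() => parseAmount("abc")).toThrow();
});

test("TC-PAY-004: parseAmount should NOT silently truncate decimals (L3 negative)", () => {
  expect(parseAmount("1.99")).not.toBe(1);
});
```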
A.1 Optional: Save Test Cases Document
Ask the user: "Save the test cases as docs/{feature}/test-cases.md for future reference? (Y/n)"
- If yes → write the document following the spec-forge:test-cases template, then continue to A.2
- If no → keep in memory, continue to A.2
A.2 Implement via TDD
For each test case (sorted by priority: P0 → P1 → P2), follow the same TDD cycle as Driven Mode:
- Read the case — extract preconditions, steps, expected result, not-expected, test infra
- Set up test infrastructure — if Test Infra is "Real DB", configure TestContainers; if "Mock external", set up a mock; if "Temp dir", create a temp directory; if "N/A", no setup (a setup sketch follows this list)
- RED — Write a failing test matching the case specification
  - Test name should include the TC ID: test("TC-AUTH-001: create user with valid email returns 201", ...)
  - Preconditions → test setup; Steps → test actions; Expected result → assertions; Not Expected → negative assertions
- VERIFY RED — Run the test, confirm it fails correctly
- GREEN — Write minimal production code to make it pass
- VERIFY GREEN — Run all tests, confirm clean pass
- REFACTOR — Clean up if needed
- Report — "TC-AUTH-001: DONE"
A.3 Progress Tracking
After each case, display progress (same format as Driven Mode D.4):
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Invalid email format (P1) — next
Ask: "Continue with next case, skip, or pause?"
A.4 Completion
After all cases are implemented:
- Run full test suite
- Report: total cases implemented, all tests passing, coverage statistics
- If test cases were saved to file (A.1), report the file path
- Suggest: "Run
/code-forge:verifyto confirm completion"
Standalone Mode — Classic TDD
For ad-hoc changes where the user describes what to build or fix:
Workflow
RED (write failing test) → VERIFY RED → GREEN (minimal code) → VERIFY GREEN → REFACTOR → REPEAT
The Cycle
Complete each phase fully before moving to the next.
1. RED — Write a Failing Test
- One minimal test showing the desired behavior
- Clear, descriptive test name
- Use real code, not mocks (unless unavoidable: external APIs, time-dependent behavior)
- One behavior per test
2. VERIFY RED — Watch It Fail (MANDATORY)
Run the test. Confirm:
- It fails (an assertion failure, not an error)
- The failure message describes the missing behavior
- It fails because the feature is missing, not because of typos or setup issues
If the test passes: you're testing existing behavior. Rewrite the test. If the test errors: fix the error, re-run until it fails correctly.
3. GREEN — Write Minimal Code
- Simplest code that makes the test pass
- No extra features, no "while I'm here" improvements
- No premature abstractions — three similar lines beat a premature helper
4. VERIFY GREEN — Watch It Pass (MANDATORY)
Run the test. Confirm:
- The new test passes
- All other tests still pass
- Output is clean (no warnings, no errors)
If the new test fails: fix the code, not the test. If other tests fail: fix them now, before proceeding.
5. REFACTOR — Clean Up (After Green Only)
- Remove duplication, improve names, extract helpers
- Keep all tests green throughout
- Do NOT add new behavior during refactor
- Apply design-first here. Look at the GREEN code in the context of the surrounding subsystem. Did GREEN push you toward a patch (new branch, new wrapper, parallel path) when a small refactor of the existing structure would have been cleaner? If so, refactor now while the test is green. The REFACTOR step is not optional — it is the moment design-first is enforced inside the TDD cycle. See
@../shared/design-first.md for the discipline. A small patch-versus-refactor sketch follows the cycle below.
6. REPEAT
Go back to Step 1 for the next behavior.
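Following up on the REFACTOR step's design-first check, a small illustration of the patch-versus-refactor distinction, using hypothetical code; the point is the shape, not the domain:

```ts
// Patch shape: GREEN satisfied the new test by bolting a parallel branch onto the existing function.
function formatPrice(amount: number, currency: string): string {
  if (currency === "JPY") {
    return `¥${amount.toFixed(0)}`; // new special case added just for the new test
  }
  return `${currency} ${amount.toFixed(2)}`;
}

// Refactored shape: the existing structure absorbs the new behavior as data instead of branches.
const CURRENCY_RULES: Record<string, { symbol: string; decimals: number }> = {
  JPY: { symbol: "¥", decimals: 0 },
};

function formatPriceRefactored(amount: number, currency: string): string {
  const rule = CURRENCY_RULES[currency] ?? { symbol: `${currency} `, decimals: 2 };
  return `${rule.symbol}${amount.toFixed(rule.decimals)}`;
}
```

Both versions keep the tests green; the refactored shape just leaves a structure the next currency rule can extend without another branch.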
Decision Rules
| If you're about to... | Instead... | Why |
|---|---|---|
| Write production code without a test | STOP — write the failing test first | Tests written after implementation pass immediately and prove nothing |
| Skip testing because the change is "simple" | Write the test — it will be quick if it's truly simple | Simple code has the sneakiest bugs (off-by-one, null edge cases) |
| Apply a quick fix without a regression test | Write the test, then fix | Untested fixes become permanent regressions |
| Continue with code that wasn't test-driven | Consider rewriting test-first | Sunk cost — untested code is a liability regardless of time spent |
External Dependency Rules
Principle: test your own dependencies for real; only mock what you don't control.
| Your Dependency | Approach |
|---|---|
| Own database | Real DB (TestContainers, test instance, SQLite in-memory) |
| Own file system | Real temp directory |
| Own cache / message queue | Real (TestContainers, embedded) |
| External third-party API | Mock / stub acceptable |
| Non-deterministic input (time, random) | Inject controlled values |
- For projects without a database or external I/O: most tests are pure unit tests — no special infra needed
- For write operations: verify state after the operation (DB query / file check / store assertion)
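A minimal sketch of the last two rows combined: inject the clock instead of reading it, then verify stored state after the write. The session store and names are hypothetical:

```ts
// Hypothetical in-memory store standing in for your own database or file system.
type Session = { userId: string; expiresAt: Date };
const sessions = new Map<string, Session>();

// Non-deterministic input (time) is injected rather than read from the global clock.
function createSession(userId: string, now: () => Date): string {
  const id = `sess-${sessions.size + 1}`;
  sessions.set(id, { userId, expiresAt: new Date(now().getTime() + 60_000) });
  return id;
}

test("createSession stores a session that expires one minute later", () => {
  const fixedNow = () => new Date("2024-01-01T00:00:00Z"); // controlled value instead of Date.now()
  const id = createSession("user-1", fixedNow);

  // Write operation: verify state after the operation (store assertion).
  const stored = sessions.get(id);
  expect(stored?.userId).toBe("user-1");
  expect(stored?.expiresAt.toISOString()).toBe("2024-01-01T00:01:00.000Z");
});
```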
Example
Task: Add isPalindrome(str) function
1. RED — Write test:
test("isPalindrome returns true for 'racecar'", () => {
expect(isPalindrome("racecar")).toBe(true);
});
2. VERIFY RED — Run: npm test
✗ ReferenceError: isPalindrome is not defined ← fails correctly
3. GREEN — Minimal code:
function isPalindrome(str) {
return str === str.split("").reverse().join("");
}
4. VERIFY GREEN — Run: npm test
✓ isPalindrome returns true for 'racecar' ← passes
42 passed, 0 failed
5. REFACTOR — (no changes needed)
6. REPEAT — next test: edge case with empty string
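A possible next RED step for that edge case; whether an empty string counts as a palindrome is a design decision the test forces into the open:

```ts
test("isPalindrome returns true for empty string", () => {
  expect(isPalindrome("")).toBe(true);
});
```

If this passes immediately against the current implementation, the behavior is already covered: note it and pick a case that genuinely fails next (for example mixed case like "Racecar", if case-insensitive matching is wanted).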
Test runner detection: Check package.json scripts, pytest.ini, Cargo.toml, go.mod, or Makefile for the project's test command before starting the cycle. Use the same runner consistently.
Verification Checklist
Before claiming work is complete:
- Every new function/method has at least one test
- Watched each test fail before implementing
- Each test failed for the expected reason (not errors)
- Wrote minimal code per test (no gold-plating)
- All tests pass with clean output
- Edge cases and error paths covered
- Mocks used only when unavoidable
- Database-touching tests use real database
When Stuck
- Test too complicated to write → design is too complicated, simplify first
- Must mock everything → code is too coupled, extract interfaces
- Test setup is huge → extract test helpers or fixtures
- No test-cases document and unsure what to test → run
/spec-forge:test-cases first to generate a structured case set