agents.md
Scope: AI coding agents (Claude, Cursor, Copilot, and any other AI assistant operating in this codebase).
Purpose: Define when agents act autonomously, when they pause, how they communicate uncertainty, and how they hand off to humans.
Table of Contents
- Scope of Autonomy
- When to Delegate to a Human
- When to Use a Subagent
- Output Quality Requirements
- How to Communicate Uncertainty
- Prohibited Actions
- Transparency & Attribution
- Agentic Task Workflow
- Context Management
- Self-Check Before Submitting
1. Scope of Autonomy
Agents have different permission levels depending on the action's reversibility and risk.
Permission table
| Action | Autonomous | With confirmation | Never |
|---|---|---|---|
| Read files, logs, documentation | ✅ | — | — |
| Search codebase, grep, list directories | ✅ | — | — |
| Run tests, linters, type checkers | ✅ | — | — |
| Run build commands | ✅ | — | — |
| Write new files in the correct module | ✅ | — | — |
| Edit existing non-critical files | ✅ | — | — |
| Install dependencies (approved registry only) | — | ✅ Flag + proceed | — |
| Modify business logic in existing files | — | ✅ Describe plan first | — |
| Create new DB migrations | — | — | ❌ Human only |
| Run DB migrations in any environment | — | — | ❌ Human only |
| Modify auth / payments / security code | — | — | ❌ Human only |
| Delete files | — | ✅ Propose and wait | — |
| Add pre-release or unvetted dependencies | — | — | ❌ Never |
| Push commits to main or develop | — | — | ❌ Never |
| Merge PRs | — | — | ❌ Never |
| Modify .github/workflows, Dockerfile, infra-as-code | — | — | ❌ Human only |
| Make external API calls in production | — | — | ❌ Never |
| Access or process real user data | — | — | ❌ Never |
What "With confirmation" means
The agent must state its intended action clearly before executing:
I plan to:
1. Add zod@3.22.4 as a dependency
2. Create src/features/orders/orders.validator.ts with the new schema
Reason: The orders endpoint currently has no input validation.
Proceeding unless you object.
2. When to Delegate to a Human
An agent must stop and ask — not guess, not proceed — in any of these situations:
2.1 — Ambiguous task
The instruction has more than one valid interpretation.
# Ambiguous — stop and ask
"Update the order service"
→ What aspect? Add a feature? Fix a bug? Refactor? Which order service?
# Clear — proceed
"Add a discount calculation function to OrderService that applies 10% for gold-tier accounts"
When stopping, the agent states:
- What it understood
- What the ambiguity is
- What options exist
- Which option it would choose if forced — and why
2.2 — Irreversible consequences
Any action that cannot be undone without significant effort:
- Deleting data or files
- Running DB migrations
- Sending notifications to real users
- Deploying to production
2.3 — Non-negotiable area
Changes to: authentication logic, authorization rules, payment processing, rate limiting, cryptography, API contracts used by external consumers.
2.4 — Business rule interpretation required
The implementation requires a judgment call the agent cannot make from the codebase alone:
- "What should happen when X edge case occurs?"
- "Which of these two approaches fits the product intent?"
- "Is this behavior intentional or a bug?"
2.5 — Confidence below threshold
If the agent cannot verify its output is correct with reasonable confidence:
I've implemented the transfer validation, but I'm not certain about:
- The correct VND minimum transfer amount (I used 10,000 — please confirm)
- Whether the same-account check should apply to sub-accounts
The code is ready, but I recommend reviewing these two points before merging.
The correct behavior: State what is uncertain, why, and what needs human judgment. Then stop.
3. When to Use a Subagent
Delegate to a subagent when:
| Situation | Rationale |
|---|---|
| Task can be cleanly parallelized (e.g., write tests for 5 independent modules) | Speed and focus |
| Task requires a different specialized skill (e.g., deep security audit, SQL optimization) | Use the right tool |
| Task is long and stateless (e.g., generate documentation for 20 files) | Prevents context pollution in the main agent |
| Task is exploratory and may fail (e.g., try 3 different approaches to an algorithm) | Isolates failures |
Do NOT delegate to a subagent when:
- The task requires shared context that the subagent won't have
- The task has sequential dependencies (step 2 depends on step 1's output)
- The task is short (< 15 minutes of work) — overhead is not worth it
- You haven't defined clear inputs and success criteria for the subagent
Subagent handoff format
When spawning a subagent, the instruction must include:
## Task
[Single, specific task — one sentence]
## Context
[What the subagent needs to know to complete the task, including relevant file paths, types, and constraints]
## Inputs
[What files, data, or state to start from]
## Expected Output
[Exact format: file path + content, function signature, test output, etc.]
## Constraints
- Must follow rules in: coding-style.md, testing.md
- Must not modify: [list of files the subagent should not touch]
- Must not call external services
Collecting subagent results
The orchestrating agent is responsible for:
- Reviewing each subagent's output before integrating it
- Verifying outputs are consistent with each other
- Running the full test suite after combining results
- Not shipping subagent output it cannot explain
4. Output Quality Requirements
All code produced by an agent must meet the same bar as code written by a human engineer.
Non-negotiable quality gates
Before considering a task complete, the agent must verify:
[ ] Passes linter with zero warnings: npm run lint / ruff check .
[ ] Passes type checker with zero errors: npm run typecheck / mypy src/
[ ] All existing tests still pass: npm test
[ ] New tests written for any new logic (coverage rules from testing.md apply equally)
[ ] No console.log / print / debug statements
[ ] No hardcoded secrets or magic values (use constants and env vars)
[ ] No silent catch blocks
[ ] No use of `any` type without a documented reason
[ ] Error handling follows error-handling rules (coding-style.md §5)
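As an illustration, a minimal sketch of what several of these gates look like in practice. The names (`DEFAULT_FEE_RATE`, `ORDER_FEE_RATE`, `loadFeeRate`) are hypothetical, not part of this codebase:

```typescript
// Hypothetical example — constant and env-var names are illustrative only.

// No magic values: a named constant with a documented source.
const DEFAULT_FEE_RATE = 0.015; // fallback per pricing spec, not a guess

// No hardcoded secrets or environment-specific values: read from the environment.
function loadFeeRate(env: Record<string, string | undefined>): number {
  const raw = env["ORDER_FEE_RATE"];
  if (raw === undefined) return DEFAULT_FEE_RATE;

  const parsed = Number(raw);
  // No silent catch: invalid configuration fails loudly instead of being swallowed.
  if (Number.isNaN(parsed) || parsed < 0) {
    throw new Error(`Invalid ORDER_FEE_RATE: ${raw}`);
  }
  return parsed;
}
```

The point is not the fee logic itself but the pattern: constants are named and sourced, configuration comes from the environment, and bad input raises an explicit error rather than being absorbed.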
Self-run verification
# Run this sequence before declaring a task complete
npm run lint && npm run typecheck && npm test
# If any step fails — fix it before handing off, not after
5. How to Communicate Uncertainty
Uncertainty levels
[CONFIDENT] — The agent is sure this is correct.
Example: "This implements the discount logic as specified."
[LIKELY] — The agent believes this is correct but recommends a review of a specific aspect.
Example: "The fee calculation looks correct, but I'd recommend verifying the VND rounding
behavior with the finance team — I based it on Circular 19/2018."
[UNCERTAIN] — The agent has a plausible implementation but cannot verify correctness.
Example: "I've implemented what I think the spec requires, but the behavior for
sub-account transfers is not specified. I've added a TODO with the open question."
[BLOCKED] — The agent cannot proceed without human input.
Example: "I need clarification on whether canceled orders should be soft-deleted or
hard-deleted before I can complete the repository layer."
Rules
- Never present uncertain output as confident output.
- Never silently make an assumption — document it inline and in the PR description.
- A TODO with a ticket reference is better than a wrong implementation.
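For example, a hedged placeholder that fails loudly instead of guessing at business behavior. The function name and ticket ID below are hypothetical:

```typescript
// Hypothetical sketch — transferToSubAccount and TICKET-123 are illustrative only.
// TODO(TICKET-123): Sub-account transfer rules are unspecified; throw rather
// than ship a guessed implementation of the business behavior.
function transferToSubAccount(): never {
  throw new Error("Not implemented: sub-account rules pending TICKET-123");
}
```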
6. Prohibited Actions
❌ Hardcode secrets, credentials, or environment-specific values
❌ Bypass validation logic to make tests pass (e.g., commenting out a validator)
❌ Silently swallow errors in generated code
❌ Use `any` type in TypeScript without a documented reason
❌ Generate code with unverified assumptions about external APIs or services
❌ Add new dependencies without flagging them for human review
❌ Paste or reference sensitive business data, PII, or customer records in prompts or comments
❌ Modify .github/workflows, Dockerfile, or any infrastructure-as-code without a human in the loop
❌ Run DB migrations in any environment
❌ Commit or push to main or develop
❌ Auto-merge a PR regardless of review status
❌ Remove or disable tests to improve coverage numbers artificially
❌ Write tests that only verify code runs (no meaningful assertion)
❌ Use deprecated or unvetted packages from outside the approved registry
❌ Make decisions about breaking API changes without flagging them as breaking
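To illustrate the difference between a run-only test and a meaningful one, here is a sketch (`applyGoldDiscount` is a hypothetical function, not part of this codebase):

```typescript
// Hypothetical function under test.
function applyGoldDiscount(amount: number): number {
  // 10% off for gold-tier accounts; multiply before dividing to keep the math exact.
  return (amount * 90) / 100;
}

// ❌ Run-only — passes even if the discount logic is completely wrong:
// test("applyGoldDiscount runs", () => { applyGoldDiscount(100); });

// ✅ Meaningful — asserts the actual behavior, including a boundary value.
function testApplyGoldDiscount(): void {
  if (applyGoldDiscount(100) !== 90) throw new Error("expected 90");
  if (applyGoldDiscount(0) !== 0) throw new Error("expected 0");
}
testApplyGoldDiscount();
```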
7. Transparency & Attribution
Commit messages
Commit messages from AI-assisted work must accurately describe the change — not attribute it to the tool.
# ❌ Bad — doesn't describe the change
git commit -m "AI-generated code"
git commit -m "Claude wrote this"
# ✅ Good — describes the change regardless of how it was generated
git commit -m "feat(orders): add discount calculation for gold-tier accounts"
When the agent made an assumption
Document it — both in code and in the PR description.
// ASSUMPTION: Transfers below 10,000 VND are treated as invalid per business rules.
// Source: verbal discussion with PM on 2026-04-10. Ticket to formalize: KBSV-311.
const MINIMUM_TRANSFER_AMOUNT_VND = 10_000;
And in the PR description:
## Assumptions Made
- Minimum transfer amount set to 10,000 VND based on verbal PM discussion (KBSV-311).
Please confirm before merge.
Code review accountability
Human reviewers may ask any contributor — human or AI-assisted — to explain any line of code. The human who accepted the commit owns it fully.
"The agent wrote it" is not an explanation. It is a signal the PR is not ready.
8. Agentic Task Workflow
For multi-step tasks that require planning before execution:
Step 1 — Understand the task
Before doing anything:
- What is the exact goal?
- What does "done" look like?
- What are the constraints?
- What could go wrong?
- What do I not know yet?
Step 2 — Plan and communicate the plan
My plan:
1. Read [relevant files] to understand the current structure
2. Create [new file] with [purpose]
3. Modify [existing file] to [do X]
4. Write tests in [test file] covering [cases]
5. Run linter and tests to verify
Estimated scope: ~3 files, ~150 lines added, ~80 lines modified.
Flagging: This touches the auth middleware — I'll proceed with extra care and note any concerns.
Step 3 — Execute in small, verifiable steps
Do not make all changes at once. Make one logical change, verify it, then proceed.
# After each meaningful step
npm run lint && npm run typecheck && npm test
Step 4 — Hand off clearly
Completed:
- Created src/features/orders/orders.validator.ts with Zod schema for TransferInput
- Updated orders.controller.ts to use the validator (line 24–31)
- Added 8 test cases covering valid input, each invalid field, and boundary values
- All tests pass, lint clean, types check
Open questions / follow-up:
- KBSV-311: Confirm minimum transfer amount of 10,000 VND
- The existing orders.controller.ts test file had a skipped test (line 87) —
I left it as-is but it should be addressed in a follow-up ticket
9. Context Management
Rule
Agents operate within a context window. Long-running tasks require deliberate context management.
What to include in context
✅ The specific file(s) being modified
✅ Directly related types and interfaces
✅ The test file for the module under change
✅ Relevant business rules or specs
✅ Error messages from failed commands
❌ The entire codebase
❌ Long conversation history unrelated to the current task
❌ Compiled output or generated files
When context grows too large
- Summarize completed work into a brief status note before starting the next subtask.
- Split the task: complete and hand off the first part before starting the second.
- If switching between unrelated modules, clear context and start fresh.
Loading context efficiently
# Read only what you need
cat src/features/orders/orders.service.ts # the file under change
cat src/features/orders/orders.types.ts # types it depends on
cat src/features/orders/orders.test.ts # existing tests
# Don't read the entire src/ tree unless you need to understand structure
10. Self-Check Before Submitting
Run this checklist before handing any work to a human:
Code quality
[ ] Linter passes with zero warnings
[ ] Type checker passes with zero errors
[ ] No console.log, print, or debug statements
[ ] No hardcoded secrets or magic values
[ ] No silent catch blocks
[ ] No `any` types without comments
Testing
[ ] Tests exist for all new logic
[ ] Tests cover at least the happy path and key error paths
[ ] All existing tests still pass
[ ] No tests were skipped or removed
Logic & behavior
[ ] I can explain every line I wrote
[ ] I did not make undocumented assumptions
[ ] I flagged any uncertain parts clearly
[ ] Breaking changes are labeled as such
Communication
[ ] All assumptions are documented in code comments and/or PR description
[ ] Open questions are listed with ticket references
[ ] The PR description accurately describes what changed and why
Owner: Tech Lead | Version: 1.0 — April 2026