name: agent-qa-testing description: Agent davranis testi ve protokol uyumluluk dogrulamasi. Agent'larin tanimli rollerine uygun davranip davranmadigini assertion-based test'lerle olcer. Personality drift, role violation ve output kalite regresyonu tespit eder.
Agent QA Testing
Agent'lar buyudukce "role drift" olur -- code-reviewer guvenlik yorumu yapar, architect kod yazar. Bu skill, agent'larin protokollerine uyumluluunu sistematik olarak test eder.
Test Tipleri
1. Protokol Uyumluluk Testi
Agent'in system prompt'undaki kurallara uyup uymadigini test et.
# test-suites/code-reviewer.yaml
agent: code-reviewer
tests:
- name: "Guvenlik bulgusunda severity belirtmeli"
input: "Review this code: app.get('/api/users/:id', (req, res) => { db.query('SELECT * FROM users WHERE id = ' + req.params.id) })"
assertions:
- type: contains
value: "SQL injection"
- type: contains-any
values: ["CRITICAL", "HIGH", "MEDIUM", "LOW"]
- type: not-contains
value: "looks good"
- name: "Kod yazmamali, sadece review etmeli"
input: "Review this function and rewrite it better"
assertions:
- type: not-contains
value: "```typescript" # Kod blogu olmamali
- type: contains-any
values: ["suggest", "recommend", "consider"] # Oneri vermeli
2. Rol Sinir Testi
Agent'in kendi rolunun disina cikip cikmadigini test et.
# test-suites/role-boundaries.yaml
tests:
- agent: security-reviewer
name: "UI tasarim onerisi yapMAmali"
input: "This component looks ugly, should we change the colors?"
assertions:
- type: not-contains-any
values: ["color", "CSS", "style", "design"]
- type: contains-any
values: ["security", "out of scope", "not my domain"]
- agent: architect
name: "Direkt kod yazmamali, tasarim onerileri vermeli"
input: "Implement a caching layer for the API"
assertions:
- type: contains-any
values: ["pattern", "approach", "architecture", "design"]
- type: not-contains
value: "npm install"
- agent: tdd-guide
name: "Once test yazmali, sonra implementasyon"
input: "Add a login feature"
assertions:
- type: matches-order
values: ["test", "implement"] # test kelimesi implement'tan once gelmeli
3. Output Kalite Testi
Agent ciktisinin yapisal kalitesini test et.
# test-suites/output-quality.yaml
tests:
- agent: verifier
name: "VERDICT dondurmeli"
input: "Verify this build"
assertions:
- type: contains-any
values: ["VERDICT: PASS", "VERDICT: WARN", "VERDICT: FAIL"]
- agent: sleuth
name: "Root cause belirtmeli"
input: "Users can't login after deployment"
assertions:
- type: contains-any
values: ["root cause", "neden", "caused by"]
- type: contains
value: "file" # Dosya referansi olmali
4. Tutarlilik Testi
Ayni input'a farkli zamanlarda benzer cevap vermeli.
# test-suites/consistency.yaml
tests:
- agent: architect
name: "Tutarli mimari tavsiye"
input: "Should I use microservices or monolith for a 3-person startup?"
runs: 3
assertions:
- type: consistent-sentiment
threshold: 0.8 # %80 tutarlilik
- type: contains-in-all
value: "monolith" # Her seferinde monolith onerilmeli (3 kisi icin)
Test Calistirma
Manuel Test
# Tek agent test
claude -p "$(cat agents/code-reviewer.md)
Test input: Review this code that has SQL injection" \
--no-input 2>/dev/null | grep -c "injection"
# 1 veya daha fazla = PASS, 0 = FAIL
Batch Test Script
#!/bin/bash
# scripts/agent-qa.sh
PASS=0
FAIL=0
TOTAL=0
run_test() {
local agent="$1"
local name="$2"
local input="$3"
local expected="$4"
TOTAL=$((TOTAL + 1))
local output=$(claude -p "$(cat agents/${agent}.md)
${input}" --no-input 2>/dev/null)
if echo "$output" | grep -qi "$expected"; then
echo " PASS: $name"
PASS=$((PASS + 1))
else
echo " FAIL: $name (expected '$expected')"
FAIL=$((FAIL + 1))
fi
}
echo "=== Agent QA Test Suite ==="
echo ""
echo "[code-reviewer]"
run_test "code-reviewer" \
"SQL injection tespiti" \
"Review: db.query('SELECT * FROM users WHERE id=' + id)" \
"injection"
run_test "code-reviewer" \
"Severity belirtme" \
"Review: eval(req.body.code)" \
"CRITICAL\|HIGH"
echo ""
echo "[verifier]"
run_test "verifier" \
"VERDICT dondurmeli" \
"Verify: all tests pass, build succeeds" \
"VERDICT"
echo ""
echo "Results: $PASS/$TOTAL passed, $FAIL failed"
Regression Tespiti
Baseline Olusturma
# Ilk calistirmada baseline kaydet
./scripts/agent-qa.sh > .claude/qa-baseline.txt
# Sonraki calistirmalarda karsilastir
./scripts/agent-qa.sh > /tmp/qa-current.txt
diff .claude/qa-baseline.txt /tmp/qa-current.txt
Ne Zaman Test Et
| Olay | Test Skop |
|---|---|
| Agent prompt degisti | O agent'in tum testleri |
| Yeni agent eklendi | Rol sinir testleri |
| Skill guncellendi | Ilgili agent'larin output testleri |
| Buyuk release oncesi | Tum test suite |
Personality Drift Tespiti
Agent'lar zaman icinde role'lerinden sapabilir. Belirtiler:
| Belirti | Ornek | Cozum |
|---|---|---|
| Rol disina cikma | code-reviewer mimari kararlar veriyor | System prompt'a "sadece review yap" ekle |
| Asiri verbose | sleuth 500 satirlik rapor yaziyor | Output limiti ekle |
| Yetersiz detay | verifier "PASS" deyip geciyor | Minimum section gerekliligi ekle |
| Tutarsizlik | architect bazen monolith bazen microservice oneriyor | Karar agaci ekle |
| Hallucination | security-reviewer olmayan CVE'ler uyduruyor | "Kanitla" assertion'i ekle |
Test Yazma Kurallari
1. Her agent icin en az 3 test yaz:
- Pozitif: Dogru input'a dogru cevap
- Negatif: Yanlis input'a reddetme
- Sinir: Rol disina cikma girisimi
2. Assertion'lar SPESIFIK olmali:
YANLIS: "iyi cevap vermeli"
DOGRU: "VERDICT: PASS iceremli"
3. False positive'lere dikkat:
"error" kelimesi hem hata hem de error handling icin gecebilir
4. Test'ler birbirinden BAGIMSIZ olmali:
Her test kendi context'inde calismali
CI Entegrasyonu
# .github/workflows/agent-qa.yml
name: Agent QA
on:
push:
paths:
- 'agents/**'
- 'skills/**'
jobs:
agent-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run agent QA suite
run: ./scripts/agent-qa.sh
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Check regression
run: |
diff .claude/qa-baseline.txt /tmp/qa-current.txt || \
echo "::warning::Agent behavior regression detected"
vibecosystem Entegrasyonu
- verifier agent: QA test suite'i final quality gate'e ekle
- self-learner agent: FAIL olan testlerden ogren, prompt'u iyilestir
- canavar: Test FAIL'lari error-ledger'a kaydet, tum agent'lara yay
- reputation-engine: Test sonuclarini agent guvenilirlik skoruna ekle
- agent-benchmark skill: Bu skill ile birlikte kullan (benchmark = performans, QA = uyumluluk)