/test-plan
Use when preparing a PR for QA review. Generates a manual testing checklist from the git diff. Covers test plan, QA checklist, testing before merge. NOT for: automated tests (write those separately), code reviews (use coderabbit).
$ golems-cli skills install test-plan
Updated 2 weeks ago
Analyze changes in the current Git branch and generate a manual testing checklist organized by page/feature.
Quick Start
The skill auto-runs on load. Override the base branch:
./scripts/generate.sh --base main
./scripts/generate.sh --base dev
./scripts/generate.sh --base origin/staging
What It Does
- Gets the diff against the base branch (default: main)
- Categorizes changed files by type (UI, API, DB, Config, etc.)
- Generates a Markdown checklist grouped by feature/component
- Includes regression test suggestions for related areas
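The first two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the contents of `generate.sh` are not shown here, so the `CATEGORY_RULES` heuristics, the `Other` fallback, and the function names `changed_files` and `categorize` are all hypothetical. The only real tool invoked is `git diff --name-only`.

```python
import subprocess
from collections import defaultdict

# Hypothetical path heuristics; the skill's real rules are not shown in this document.
# Rules are checked in order, so the first match wins.
CATEGORY_RULES = [
    ("DB", ("migrations/", "schema", ".sql")),
    ("API", ("api/", "routes/", "controllers/")),
    ("UI", (".tsx", ".jsx", ".css", ".html", "components/")),
    ("Config", (".env", ".yml", ".yaml", ".json", ".toml")),
]


def changed_files(base: str = "main") -> list[str]:
    """List files changed on the current branch relative to `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def categorize(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by the first matching rule; unmatched paths go to 'Other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        lowered = path.lower()
        for category, markers in CATEGORY_RULES:
            if any(marker in lowered for marker in markers):
                groups[category].append(path)
                break
        else:
            groups["Other"].append(path)
    return dict(groups)
```

Inside a Git repository, `categorize(changed_files("dev"))` would group the branch's changed files relative to `dev`. Substring matching is deliberately crude; a real implementation would likely need project-specific rules.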
Output Format
## Test Plan
### [Feature/Component Name]
- [ ] Test: Description of what to verify
- [ ] Test: Another thing to check
### API Changes
- [ ] Test: Verify endpoint returns expected shape
- [ ] Test: Error responses have correct status codes
### Database/Schema
- [ ] Test: Verify migrations run cleanly
- [ ] Test: Data integrity after changes
### Configuration
- [ ] Test: Verify env vars are documented
- [ ] Test: Config changes don't break existing deploys
### General
- [ ] No console errors during testing
- [ ] No TypeScript/build errors
- [ ] Mobile responsive (if UI changes)
Guidelines
- Be specific: "Verify user can submit form" not "Test form"
- Include edge cases: Empty states, error states, loading states
- Consider permissions: Test as different user roles if auth-related
- Note regressions: If touching shared code, note areas that could regress
- Prioritize: Put most critical tests first within each section
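As a rough illustration of the output format and the "Prioritize" guideline, the sketch below renders checks as a checkable Markdown list with the most critical items first in each section. The `Check` dataclass, its `priority` scale, and `render_test_plan` are hypothetical names introduced for this example; they are not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Check:
    description: str
    priority: int = 1  # hypothetical scale: 0 = most critical


def render_test_plan(sections: dict[str, list[Check]]) -> str:
    """Render sections as a Markdown checklist, most critical checks first."""
    lines = ["## Test Plan"]
    for title, checks in sections.items():
        lines.append(f"### {title}")
        # sorted() is stable, so equal-priority checks keep their input order.
        for check in sorted(checks, key=lambda c: c.priority):
            lines.append(f"- [ ] Test: {check.description}")
    return "\n".join(lines)


plan = render_test_plan({
    "Login Form": [
        Check("Verify empty-state message appears with no saved accounts", priority=2),
        Check("Verify user can submit the form with valid credentials", priority=0),
    ],
})
print(plan)
```

The descriptions follow the "Be specific" guideline by naming the behavior to verify, and the empty-state check stands in for the edge cases the guidelines ask for.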
Usage
Run this skill before creating a PR to generate the test plan section for your PR description.
- Best Pass Rate: 100% (Opus 4.6)
- Assertions: 13 (3 models tested)
- Avg Cost / Run: $0.2263 (across models)
- Fastest (p50): 1.9s (Haiku 4.5)
Behavior Evals
Phase 2 baseline: skill quality on Claude
Behavior Baseline
| Assertion | Opus 4.6 | Sonnet 4.6 | Haiku 4.5 | Consensus |
|---|---|---|---|---|
| generates-from-git-diff | | | | 2/3 |
| grouped-by-feature-component | | | | 3/3 |
| checkable-markdown-format | | | | 3/3 |
| includes-edge-cases | | | | 2/3 |
| includes-regression-suggestions | | | | 2/3 |
| specific-not-generic | | | | 2/3 |
| runs-diff-despite-user-description | | | | 1/3 |
| executes-generate-script | | | | 2/3 |
| may-find-additional-changes | | | | 2/3 |
| respects-base-branch-override | | | | 3/3 |
| categorizes-all-file-types | | | | 3/3 |
| prioritizes-critical-tests | | | | 2/3 |
| includes-setup-steps | | | | 3/3 |
Token Usage
Cost per Run
| Model | Input Tokens | Output Tokens | Cost / Run | Cost / 1K Runs |
|---|---|---|---|---|
| Opus 4.6 | 6,537 | 6,391 | $0.5774 | $577.40 |
| Sonnet 4.6 | 4,642 | 5,686 | $0.0992 | $99.20 |
| Haiku 4.5 | 1,940 | 1,375 | $0.0022 | $2.20 |
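The arithmetic behind the cost figures can be checked with a short sketch. It assumes that Cost / 1K Runs is the per-run cost multiplied by 1,000 and that the reported average cost per run is an unweighted mean across the three models; neither definition is stated on this page, though both reproduce the published numbers.

```python
# Per-run costs copied from the table above (USD).
cost_per_run = {"Opus 4.6": 0.5774, "Sonnet 4.6": 0.0992, "Haiku 4.5": 0.0022}

# Assumed definition: Cost / 1K Runs is the per-run cost scaled by 1,000.
cost_per_1k = {model: round(cost * 1000, 2) for model, cost in cost_per_run.items()}

# Assumed definition: unweighted mean across the three models.
average_cost = round(sum(cost_per_run.values()) / len(cost_per_run), 4)

print(cost_per_1k)
print(average_cost)
```

Under these assumptions the mean comes out to 0.2263, matching the Avg Cost / Run figure in the summary.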
Response Time (p50 and p95)
| Model | p50 | p95 | Overhead |
|---|---|---|---|
| Opus 4.6 | 9.1s | 13.7s | +50% |
| Sonnet 4.6 | 5.1s | 8.7s | +71% |
| Haiku 4.5 | 1.9s | 2.9s | +56% |
Last evaluated: 2026-03-12 · Data is generated from skill assertions (real cross-model benchmarks coming soon)
Changelog entries are derived from eval runs and skill version updates. Full cascading changelog (Phase 4D) coming soon.
- Best Pass Rate: 100%
- Assertions: 13
- Models Tested: 3
- Evals Run: 3
- +Initial release to Golems skill library
- +13 assertions across 3 eval scenarios