Kairos: Convention Over Configuration for an AI Coding Agent

Most spec-driven tools for AI coding agents assume a greenfield. You start a new project, you write the specs first, the agent builds from them. That is a clean story. It is also not my story, and probably not yours either.

My reality is the opposite. I have code. I have conventions. I have tests, a compose.yml, a branching habit, and a backlog that lives mostly in my head. The cost of adopting a heavy method on a live project is not the method itself — it is the migration. Bending an existing repository to fit a tool is work, and that work has a negative return when you ship features every week.

So I built Kairos: a small set of Claude Code slash commands that turn an idea into a PRD, slice it into stories, implement them, and ship — without leaving the editor. It is open source, MIT licensed, and this article is the honest version of how it works. Including the parts that did not work the first time.

Where it comes from

Two projects mapped this space before me, and both are worth your time.

- BMAD-METHOD: a richer agentic method, strongest on greenfield. It showed me how far you can push agent roles and structured handoffs.
- OpenSpec: rigorous spec-driven development, also greenfield-leaning. It convinced me that a written spec is the right contract between a human and an agent.

I borrowed the idea of a spec as the contract, and dropped the greenfield assumption. Kairos’s niche is narrow on purpose: existing projects, a QA layer before a human looks at the code, and staying small. Four core commands do most of the work — /create-prd → /create-story → /implement-story → /close-story — with a few helpers around them.

This is not a finished product. It is a setup I use daily and keep hardening. Let me walk you through the ideas I find smart, and then the ones that bit me.

One file as the source of truth

Everything Kairos knows about your project lives in a single markdown file: spec.md. There is no .kairos/ folder, no cache, no hidden state. If a fact is not in spec.md, Kairos does not know it.

The file is plain markdown with a simple marker syntax — no YAML front-matter, no parser:

- **project_name**: acme-saas
- **git_host**: github
- **default_branch**: main
- **push_mode**: manual
- **worktree_mode**: off

I chose plain markdown for two reasons. First, it stays grep-friendly, so a command can read one field with a one-line shell snippet instead of a parsing step. Second, a human can edit it without a tool. You read it, you change a value, you move on.
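
To make the grep-friendly claim concrete, here is a minimal sketch of such a one-line read. The function name spec_field is illustrative and not part of Kairos; it assumes the exact "- **field**: value" marker syntax shown above and returns the first match.

```shell
# spec_field FIELD [FILE]: print the value of a "- **field**: value" line.
# Hypothetical helper; FILE defaults to ./spec.md.
spec_field() {
  sed -n "s/^- \*\*$1\*\*: *//p" "${2:-spec.md}" | head -n 1
}
```

Called as spec_field default_branch, it prints the branch name, and it prints nothing when the field is absent, which a caller can treat as "Kairos does not know it."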

Commands never hardcode a path or a host. They reference spec fields with a placeholder, like {spec.default_branch} or {spec.project_management_dir}, and the runtime resolves them before any tool call. This is the part I find smart, and I did not invent it — it is the lesson every templating system teaches. The same /close-story command serves a GitHub mono-repo and a GitLab multi-repo, because the host, the paths, and the push policy are read from the spec at runtime, not baked into the command.
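
The resolution step can be pictured with a small sketch. In Kairos the agent runtime does this work, so the function below is only an illustration under assumptions: the name resolve_placeholders is hypothetical, values are assumed to contain no "|", "&", or backslash, and placeholders with no matching field are left untouched.

```shell
# resolve_placeholders TEMPLATE SPEC_FILE
# Replace every {spec.key} in TEMPLATE with the value found in SPEC_FILE.
# Illustration only: values must not contain "|", "&" or backslashes.
resolve_placeholders() {
  text=$1
  spec=$2
  while IFS= read -r line; do
    case $line in
      '- **'*'**: '*)
        key=${line#- \*\*}       # drop the leading "- **"
        key=${key%%\*\*:*}       # drop everything from "**:" onward
        value=${line#*\*\*: }    # keep what follows "**: "
        text=$(printf '%s' "$text" | sed "s|{spec\.$key}|$value|g")
        ;;
    esac
  done < "$spec"
  printf '%s\n' "$text"
}
```

The same template then yields a different command line per repository, because only the spec changes.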

There are actually two specs, and the split matters:

| Spec | Where | Holds |
| --- | --- | --- |
| Root | `./spec.md` | Services and their paths, VCS host, default branch, push mode, worktree mode, PM directory |
| Per-service | `./{service}/spec.md` | Language, test command, review command, and observable behavior: endpoints, events, database tables |

The per-service spec is the interesting one. It does not describe how the service is built. It describes what the service does, from the outside: which routes it exposes, which tables it writes, which environment variables it needs. That outside view is exactly what an agent needs to write a test, and exactly what it should not have to guess. The full reference is in docs/spec-format.md.

Convention over configuration

If you wrote Ruby on Rails fifteen years ago, this section will feel familiar. Rails won me over with a simple promise: sensible defaults, and you only configure what you want to change. Kairos copies that idea.

When you run /init in an existing project, it detects what it can and falls back to conservative defaults for the rest:

- push_mode: manual (print the git push line and wait, rather than push for you).
- worktree_mode: off (work in the current tree, no branching magic).
- project_management_dir: project-management (where PRDs, stories, and the archive live).

Every default is overridable. Do not like project-management/? Point project_management_dir at the folder you already use. The convention gives you a working setup in minutes; the configuration is there when your project disagrees.

The part I am most careful about is /init itself. It is read-only detection. It scans your repo — services, test commands, VCS, branch, whether a CHANGELOG.md exists — and writes the spec.md you would have written by hand. It never edits package.json, your compose.yml, your .gitignore, or anything else. On a re-run it diffs its detection against your existing spec and asks field by field before changing a value. This is what makes adoption a five-minute step instead of a migration. You can read commands/init.md to see how conservative it is — the cardinal rule at the top is “write only spec files.”

A QA layer between unit tests and humans

I wrote a whole article about using Claude Code as a small QA team, so I will keep this short: unit tests can pass and still not be enough. They verify isolated functions with mocked dependencies. They do not catch the bug where a function produces duplicates because the deduplication logic depends on a database state that no fixture replicates.

Kairos puts that idea into two commands.

/create-test-plan turns a free-form prompt (“smoke test for the import pipeline”) into a runnable test plan. The key constraint: it reads the per-service spec and only references real artifacts. If it writes SELECT ... FROM invoices, the invoices table must exist in that service’s spec. It does not invent endpoints or table names. Every step ends with an observable checkbox — HTTP 200, 0 rows returned, status = completed — never a vague “verify it works.”

/qa executes the plan. Two design choices make it reliable:

- The plan is the contract, not a dry-run. The command runs every step as written, in order, including the mutating ones. It treats the run as a release rehearsal. It never silently skips a step.
- No hardcoded infrastructure. The command never bakes in a docker exec <container> or a connection string. It derives the execution context (the HTTP base URL, the SQL client) from the service’s spec. If the spec does not say how to reach the database, it asks once and reuses the answer for the whole run.

A closing “Critical points” table in each plan names which phases are gating. A failing checkbox in a gating phase stops the run; a failure elsewhere is recorded but execution continues. And /close-story calls /qa automatically for any impacted service that has a plan — so the QA layer runs before a human ever looks at the diff.
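
The gating rule can be sketched in a few lines. The input format here ("name gating command" per line) is an assumption made for the example, not the layout of a real Kairos plan, and run_plan is a hypothetical name.

```shell
# run_plan: read "name gating command..." lines from stdin and run them in order.
# gating is "yes" or "no". A failure in a gating phase stops the run;
# a failure elsewhere is counted and execution continues.
run_plan() {
  failures=0
  while read -r name gating cmd; do
    if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
      echo "PASS $name"
    elif [ "$gating" = yes ]; then
      echo "FAIL $name (gating, run stopped)"
      return 1
    else
      echo "FAIL $name (recorded, continuing)"
      failures=$((failures + 1))
    fi
  done
  echo "done, $failures non-gating failure(s)"
}
```

The non-zero return on a gating failure is what lets a caller such as a close step refuse to go further.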

The hard part: testing inside a worktree

Now the part that took me three tries.

For a multi-story epic, I wanted every story to share one git worktree and one branch, so Claude Code’s context — including its memory — persists across the whole epic. A git worktree is a separate working directory attached to the same repository. It sounds simple. Then you try to run the tests inside it, and two things break in a way that is easy to miss.

Problem one: a worktree only carries committed content. git worktree add materializes what is committed. Your .env file is gitignored, so it is not there. Any container or test that needs it fails — or worse, runs against the wrong defaults. My first fix was a per-service spec field, worktree_seed_files: list the gitignored runtime files, and the command copies them from the main checkout into the fresh worktree at creation time.
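
A minimal sketch of that seeding step, assuming the list from worktree_seed_files is passed as arguments. The name seed_worktree is hypothetical, and stopping on a missing file is a choice made for this example, not documented Kairos behavior.

```shell
# seed_worktree MAIN_DIR WORKTREE_DIR FILE...
# Copy gitignored runtime files (paths relative to the repo root)
# from the main checkout into a fresh worktree.
seed_worktree() {
  src=$1
  dst=$2
  shift 2
  for rel in "$@"; do
    if [ ! -f "$src/$rel" ]; then
      echo "seed file missing in main checkout: $rel" >&2
      return 1
    fi
    mkdir -p "$(dirname "$dst/$rel")"
    cp -p "$src/$rel" "$dst/$rel"
  done
}
```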

Problem two, the subtle one. My test command was docker exec api pytest. That attaches to a long-running container. But that container was started from the main checkout (the stack I call prod below), not from the worktree. So the tests ran against the original code, reported a confident “pass,” and told me nothing about the worktree. Worse, a mutating test could touch the prod container’s state.

The fix was a second field, worktree_test_command, that runs the suite in an isolated, ephemeral container instead:

cd {worktree}/api && CONTAINER_ENV_PREFIX={worktree_id}- \
  docker compose -p {worktree_id} run --rm --build api pytest tests/ -v

run --rm publishes no ports by default (Compose only maps them if you add --service-ports), so there is no clash with the live service. -p {worktree_id} gives the Compose project its own namespace. That was version one of worktree isolation. I shipped it, and then I found the holes.

What broke, and the three hardening passes

Pass one: the env var did not reach the image. For the isolated container to get its own image name, the image: and container_name: keys in the Compose file have to be namespaced. I prefixed them with ${CONTAINER_ENV_PREFIX}. The trick that took me a while: you have to pass that variable on the shell, not through the Compose env_file. A service-level env_file only sets variables inside the container; it does not feed ${...} interpolation in the Compose file itself. Once the prefix is in place, it is safe by construction: when CONTAINER_ENV_PREFIX is unset, which is the prod default, ${CONTAINER_ENV_PREFIX} collapses to an empty string, and behavior is identical to before.
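
For readers who want to see the shape of a namespaced Compose file, here is a hedged sketch that writes one to disk. The service name api comes from the test command above; the image name acme-api and the function name write_compose_sketch are invented for the example.

```shell
# write_compose_sketch FILE: write a hypothetical namespaced Compose fragment.
write_compose_sketch() {
  cat > "$1" <<'YAML'
services:
  api:
    build: .
    # Both names pick up the prefix; with the variable unset they are unchanged.
    image: ${CONTAINER_ENV_PREFIX}acme-api
    container_name: ${CONTAINER_ENV_PREFIX}api
YAML
}

# To inspect the interpolated result, set the variable on the shell:
#   CONTAINER_ENV_PREFIX=wt1- docker compose -f FILE config
```

Running docker compose config with and without the variable is a cheap way to confirm that the unset case renders the original names.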

But who rewrites the Compose file, and when? My instinct was to have the command edit it on the fly inside the worktree. That is wrong: an uncommitted infra change inside a worktree is scope creep, and the prefix would still be missing from prod. So I split it out into a dedicated command, /setup-worktree-isolation. It runs once, on the main branch, is idempotent, shows you the diff, and hands you the commit. The prefix has to live in committed content so every future worktree inherits it. Detection and mutation are now separate, explicit, reviewable steps.

Pass two — the silent fallback was a trap. And it was a trap I had left for myself. worktree_test_command was opt-in. If a service declared a docker exec test command but no isolated variant, the gate silently fell back to running it — against prod. A meaningless pass, with a chance of mutating the live system. The fix was to turn the trap into a gate: in a worktree, if the test command attaches to a fixed container and no worktree_test_command is declared, the command stops and reports BLOCKED instead of running it. A loud stop beats a silent wrong answer. The whole worktree story is, in hindsight, a long lesson in that one sentence.
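
The gate itself is small. The sketch below is one way to express it, assuming a simple text match is enough to spot a command that attaches to a fixed container; the patterns and the name worktree_test_gate are assumptions, not the actual Kairos check.

```shell
# worktree_test_gate TEST_COMMAND [WORKTREE_TEST_COMMAND]
# Print the command that is safe to run inside a worktree, or stop with BLOCKED.
worktree_test_gate() {
  test_cmd=$1
  wt_cmd=${2:-}
  # An isolated variant always wins when it is declared.
  if [ -n "$wt_cmd" ]; then
    printf '%s\n' "$wt_cmd"
    return 0
  fi
  # Heuristic (assumption): these forms attach to an already running container.
  case $test_cmd in
    *'docker exec '*|*'docker compose exec '*|*'docker-compose exec '*)
      echo "BLOCKED: test command attaches to a fixed container and no worktree_test_command is declared" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$test_cmd"
}
```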

Pass three — the images piled up. Each isolated test build produced an image named with the {worktree_id}- prefix. They accumulated, one per epic. So teardown now prunes the isolated Compose project and removes only the images carrying that prefix. Prod images are never matched — the prefix is exactly what makes the cleanup safe.
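
A possible shape for that teardown, as a sketch rather than the real command: teardown_isolated is a hypothetical name, and the DOCKER variable exists only so the function can be dry-run against a stub.

```shell
DOCKER=${DOCKER:-docker}

# teardown_isolated WORKTREE_ID
# Remove the isolated Compose project, then only the images named "<id>-...".
teardown_isolated() {
  id=$1
  # Refuse an empty id: an empty prefix would match far too much.
  if [ -z "$id" ]; then
    echo "BLOCKED: empty worktree id" >&2
    return 1
  fi
  # Containers, networks and volumes of the isolated project.
  "$DOCKER" compose -p "$id" down --volumes --remove-orphans
  # Images whose repository starts with the prefix, any tag.
  "$DOCKER" image ls --filter "reference=$id-*:*" --format '{{.Repository}}:{{.Tag}}' |
    while IFS= read -r image; do
      "$DOCKER" image rm "$image"
    done
}
```

The empty-id guard is the mirror image of the safety argument in the text: the prefix only protects prod images as long as it is never blank.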

Three commits, all titled some variation of “harden worktree-isolated testing.” None of this was in my head when I started. It came out of running the thing and watching it do something that looked sensible and was wrong.

One small trick that makes the shared worktree pay off

There is a detail I am quietly proud of. Claude Code keeps its per-project memory and history in a folder keyed by the project’s path — the slashes in the path are turned into dashes to make a folder name. A worktree lives at a different path (../prefix-epic-slug), so by default Claude Code treats it as a brand-new project, with no memory at all. The whole point of sharing one worktree across an epic was to keep context alive; losing the memory would defeat that.

The fix is a single symbolic link. When the command creates the worktree, it points the worktree’s project folder at the main project’s:

ln -sfn "$CLAUDE_PROJECTS/-$MAIN_PROJECT" "$CLAUDE_PROJECTS/-$WT_PROJECT"

Now the memory and the conversation history follow you into the worktree. It is one line, and it is the small thing that makes the “context persists across the whole epic” promise actually true rather than just a nice sentence. If the main project has no memory yet — a first run — the link is skipped and nothing breaks.
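
Spelled out with the first-run guard, the link step could look like the sketch below. The function names are hypothetical, the default location ~/.claude/projects is an assumption, and only the slash-to-dash rule described above is modeled.

```shell
# project_dir_name PATH: folder name derived from a path, slashes turned into dashes.
project_dir_name() {
  printf '%s' "$1" | tr '/' '-'
}

# link_worktree_memory MAIN_PATH WORKTREE_PATH
# Point the worktree's project folder at the main project's folder.
link_worktree_memory() {
  projects=${CLAUDE_PROJECTS:-$HOME/.claude/projects}
  main_dir=$projects/$(project_dir_name "$1")
  wt_dir=$projects/$(project_dir_name "$2")
  # First run: the main project has no memory yet, so there is nothing to link.
  [ -d "$main_dir" ] || return 0
  ln -sfn "$main_dir" "$wt_dir"
}
```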

Tying it together: `/implement-epic`

All of the above comes together in one orchestrator, /implement-epic. It runs a sequence of stories that share an epic, through a single shared worktree. For each story it spawns one fresh subagent that implements the story and then closes it — tests, QA, review, commit — before moving to the next. The subagent works in an isolated context, so the orchestrator’s own context grows only by the short structured report each subagent returns. Push, PR, and worktree teardown are deferred until the epic’s last story closes.

The rule I care about most here: safety gates are sacred. A subagent auto-approves routine prompts — the implementation plan, the commit — because it cannot ask the user. But it never works around a blocking gate. A failing test, a critical review finding, scope creep, an unmet dependency, an ambiguous story — any of those makes the subagent return BLOCKED without committing, and the whole run stops. The agent is allowed to be autonomous on the boring decisions and forbidden from being clever on the dangerous ones.
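
The stop-on-BLOCKED rule reduces to a short loop. In the sketch below, run_story stands in for spawning a subagent and is purely hypothetical, as are the DONE and BLOCKED report prefixes; the real orchestration is driven by the model, not by a shell script.

```shell
# run_story STORY: hypothetical stand-in for one subagent run.
# It prints a one-line report starting with DONE or BLOCKED.
run_story() {
  echo "DONE $1"
}

# run_epic STORY...: run stories in order and stop at the first report
# that is not DONE. Push, PR and teardown are only reached at the end.
run_epic() {
  for story in "$@"; do
    report=$(run_story "$story")
    case $report in
      DONE*)
        echo "$story: $report"
        ;;
      *)
        echo "$story: ${report:-no report}; stopping the epic" >&2
        return 1
        ;;
    esac
  done
  echo "epic complete: push, PR and worktree teardown can run now"
}
```

Anything that is not an explicit DONE, including an empty report, is treated as a stop, which keeps the loop on the loud side of the silent-failure line.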

What is weak, and what bit me

I would rather be honest about the soft spots than sell you a clean story.

The single source of truth has a maintenance cost. Putting everything in spec.md is great until the per-service specs drift. /close-story appends a line to a service’s spec every time it touches it, and those appends inflate the file over time. I had to write a separate command, /spec, just to compact a spec back under a line budget and to backfill one from code when it is thin. So the “one file, no hidden state” promise is real, but it is not free — the file needs grooming.

Silent failures are the enemy, and I shipped one. The command that archives a closed story used a glob, STORY-{NNN}-*.md, that required a slug suffix. A story file named STORY-007.md, with no suffix, was silently not moved. I only noticed because stories were not landing in the archive. The fix resolves the file across both name forms and asserts a single match — but the lesson is the same as the worktree one: a command that fails quietly is worse than one that stops and complains.
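
As an illustration of "both name forms, exactly one match", here is a small sketch. The function name resolve_story_file is hypothetical and the real command may resolve files differently.

```shell
# resolve_story_file DIR NNN
# Print the single file for story NNN, accepting STORY-NNN.md and
# STORY-NNN-slug.md, and fail loudly on zero or several matches.
resolve_story_file() {
  dir=$1
  nnn=$2
  matches=0
  found=
  for f in "$dir/STORY-$nnn.md" "$dir/STORY-$nnn"-*.md; do
    [ -f "$f" ] || continue   # also skips the unexpanded glob when nothing matches
    matches=$((matches + 1))
    found=$f
  done
  if [ "$matches" -ne 1 ]; then
    echo "expected exactly one file for STORY-$nnn, found $matches" >&2
    return 1
  fi
  printf '%s\n' "$found"
}
```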

“It worked on my machine.” My first marketplace manifest pointed at a local absolute path on my disk. It installed fine for me and could not be installed by anyone else. Embarrassing, and exactly the kind of thing you only catch when someone else tries it. It now points at the GitHub repo.

It is markdown telling an AI what to do. None of this is elegant. The commands are long markdown files full of rules, gates, and “stop and ask” instructions. The orchestration is model-driven, which means non-determinism: the same step can occasionally be read slightly differently. The structured prompts and explicit return formats reduce that, but do not erase it. And every run costs tokens. For a solo developer shipping fast, the trade is worth it. At a larger scale, with a real QA team, you would want something firmer.

It is open — I would love your feedback

I use Kairos with my clients and on AFK, a side project of mine. It is MIT licensed and lives on GitHub. Installing it is two slash commands and an /init:

/plugin marketplace add sylvain-artois/kairos-claude-code-workflow
/plugin install kairos@kairos
/init

What I am genuinely unsure about: whether the spec-as-source-of-truth approach scales past a handful of services without the grooming cost becoming annoying, and whether the worktree isolation is solid on Compose setups very different from mine. If you try it on your own project — especially an existing one with its own conventions — I would love to hear what broke. That is the part of this work that gets better only with the mistakes other people hit, not just my own.