Your harness is where your team's accumulated knowledge lives. Own it, invest in it, keep it portable across whichever model lands next.
Seven parts
What goes into a harness
A harness is the durable layer around a model: instructions, tools, permissions, context, and verification. Claude Code and Codex are themselves harnesses. Your team provides a second one on top of them.
The seven parts are context, the context graph that links it, workflow, restraint, empowerment, verification, and a visual interface.
CLAUDE.md, AGENTS.md, path-scoped rules, reusable skills, examples and recipes, your data model, and your past decisions.
Each session starts with the team's accumulated decisions already in scope, so the agent does not have to re-derive them from the prompt.
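One way to picture path-scoped rules is a small lookup that decides which rule files apply to the file the agent is about to touch. The sketch below is only an illustration: the glob patterns, the rule file names under `rules/`, and the `rules_for` helper are hypothetical, and this is not the format Claude Code, Codex, or Nimbalyst actually use.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from path globs to rule files (names are illustrative).
PATH_RULES = [
    ("*", "CLAUDE.md"),                      # project-wide conventions
    ("src/db/*", "rules/migrations.md"),     # extra care around schema changes
    ("src/ui/*", "rules/accessibility.md"),  # UI-specific conventions
]


def rules_for(path: str) -> list[str]:
    """Return every rule file whose glob matches the path, in declaration order."""
    return [rule for pattern, rule in PATH_RULES if fnmatchcase(path, pattern)]


if __name__ == "__main__":
    print(rules_for("src/db/migrations/001.sql"))
    print(rules_for("README.md"))
```

The point of the design is that general rules always load, while narrower rules load only when the agent works in the area they protect.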
02
How the context connects
Context Graph
Typed links between tracker items, plans, specs, diagrams, mockups, sessions, diffs, files, commits, and decisions.
Instead of copy-pasting six links into a prompt, the agent can follow the same chain you would.
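As a rough mental model, a context graph can be sketched as a directed graph whose edges carry a link type. Everything here is an assumption made for illustration: the `ContextGraph` class, the link type names, and the artifact IDs are hypothetical and do not describe how any particular workspace stores its graph.

```python
from collections import defaultdict, deque


class ContextGraph:
    """Directed graph whose edges carry a link type (illustrative sketch)."""

    def __init__(self) -> None:
        self._out = defaultdict(list)  # node -> [(link_type, target), ...]

    def link(self, source: str, link_type: str, target: str) -> None:
        """Record a typed link from one artifact to another."""
        self._out[source].append((link_type, target))

    def reachable(self, start: str) -> list[str]:
        """Breadth-first walk: every artifact reachable from the start node."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for _link_type, target in self._out.get(node, []):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Hypothetical artifact IDs, chained the way the text describes.
graph = ContextGraph()
graph.link("tracker:142", "planned_by", "plan:checkout-retry")
graph.link("plan:checkout-retry", "specified_by", "spec:retry-policy")
graph.link("spec:retry-policy", "worked_in", "session:7")
graph.link("session:7", "produced", "diff:abc")
graph.link("diff:abc", "touches", "file:src/checkout/retry.py")

if __name__ == "__main__":
    print(graph.reachable("tracker:142"))
```

Starting from a single tracker item, the walk returns the plan, spec, session, diff, and file in order, which is the chain a person would otherwise paste into the prompt by hand.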
03
The shape of a coding session
Workflow
Slash commands, plan-then-execute arcs, subagents, reusable skills, and worktrees so multiple agents can work in parallel without colliding.
A workflow layer keeps each session from reinventing its process from scratch.
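To make the worktree idea concrete, here is a minimal sketch that builds one `git worktree add` command per task, so each agent gets its own checkout and branch. The `agent/<task>` branch naming, the `../worktrees` location, and the `worktree_command` helper are assumptions for illustration, not a prescribed layout.

```python
from pathlib import PurePosixPath


def worktree_command(task: str, root: str = "../worktrees") -> list[str]:
    """Build the git command that creates an isolated worktree for one task."""
    branch = f"agent/{task}"                 # hypothetical naming convention
    path = str(PurePosixPath(root) / task)   # one directory per task
    return ["git", "worktree", "add", "-b", branch, path]


if __name__ == "__main__":
    for task in ("fix-login", "add-export"):
        # Pass each list to subprocess.run(cmd, check=True) inside a repository
        # to actually create the worktree; here we only print the commands.
        print(" ".join(worktree_command(task)))
```

Because each task lands in a separate directory on a separate branch, two agents can edit the same file without overwriting each other's working copy.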
04
What your agent must not do
Restraint
Hard rules, approval boundaries, permission scopes, tool allowlists, and an audit trail.
A capable agent without restraint eventually does something expensive, destructive, or embarrassing faster than you expected.
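The restraint pieces fit together roughly like this: an allowlist for tools that may run freely, an approval boundary for risky ones, a default deny for everything else, and an audit trail of every decision. The tool names and the `Gate` class below are hypothetical; this is a sketch of the idea, not the permission system of any specific agent.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, chosen only to illustrate the three outcomes.
ALLOWED_TOOLS = {"read_file", "run_tests", "git_diff"}  # may run freely
NEEDS_APPROVAL = {"git_push", "run_migration"}          # a human must approve


@dataclass
class Gate:
    """Decides whether a tool call may proceed and records every decision."""

    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def check(self, tool: str, approved: bool = False) -> str:
        if tool in ALLOWED_TOOLS:
            decision = "allow"
        elif tool in NEEDS_APPROVAL:
            decision = "allow" if approved else "ask"
        else:
            decision = "deny"  # anything not listed is refused by default
        self.audit_log.append((tool, decision))
        return decision


if __name__ == "__main__":
    gate = Gate()
    for tool in ("run_tests", "git_push", "drop_database"):
        print(tool, gate.check(tool))
```

Denying unknown tools by default means a new capability has to be added to a list on purpose before the agent can use it.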
05
What your agent can actually do
Empowerment
Tools that read logs, query the running database, drive the UI, take screenshots, and run end-to-end test loops.
An agent that can inspect the actual result can often close its own loop without a human in the middle.
06
How the agent proves a change works
Verification
Unit tests, end-to-end tests, fail-first reproductions, type checks, and a simulator for AI tool calls.
If the agent cannot show the change works end-to-end, it is not done.
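A fail-first reproduction can be stated as a small rule: the test must fail against the old code and pass against the new code. The sketch below assumes a hypothetical `verify_fail_first` helper and a toy bug invented for the example; a real harness would run an actual test suite before and after the change.

```python
from typing import Callable


def verify_fail_first(
    test: Callable[[Callable], bool], before: Callable, after: Callable
) -> str:
    """Accept a fix only if the test fails on the old code and passes on the new."""
    if test(before):
        return "invalid: the test passes before the change, so it does not reproduce the bug"
    if not test(after):
        return "not done: the test still fails after the change"
    return "verified"


# Toy example: the old implementation drops the first item.
def buggy_total(prices: list[int]) -> int:
    return sum(prices[1:])


def fixed_total(prices: list[int]) -> int:
    return sum(prices)


def reproduces_bug(total: Callable[[list[int]], int]) -> bool:
    """Return True when the implementation under test gives the right total."""
    return total([2, 3]) == 5


if __name__ == "__main__":
    print(verify_fail_first(reproduces_bug, buggy_total, fixed_total))
```

Requiring the failure first guards against a test that passes for reasons unrelated to the fix.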
07
How you and your agent share the work
Visual Interface
Markdown, mockups, diagrams, data models, red and green diffs, screenshots, and threaded discussions tied to the artifacts.
A visual workspace keeps decisions attached to artifacts instead of burying them in chat.
A worked example
A harness in action
Here is what those seven parts look like filled in for a single concrete prompt, from the request all the way through to the resulting outcome.
How to think about your harness
Prioritizing your harness
Own your harness
If you cannot read it, edit it, take it with you, and run it under any agent you choose, it is not yours.
Invest in your harness
Spend a meaningful share of your AI effort on better rules, tools, recorded decisions, and tighter verification loops. Treat the harness as a product your team ships to itself.
Keep it portable across models
Same files, same rules, same tools, same graph, whatever model lands next. If switching agents means rebuilding the harness, you do not really have optionality.
An example you can adopt
Nimbalyst is an open-source workspace built around these seven parts
Visual interface, context graph, workflow scaffolding, empowerment tools, verification loops, and cross-model CLAUDE.md and skills, all in one workspace. Claude Code and Codex run as first-class agents. The agent layer is pluggable for whatever lands next.
The desktop and iOS apps are MIT licensed. Study how they are wired, copy what is useful, or run Nimbalyst as your workspace.
What is an agent harness?
An agent harness is the system around the AI model that helps it do real work on your project. We think about ours in seven parts: context (what the agent knows about your code and conventions), a context graph (how that knowledge connects across tracker items, plans, diagrams, sessions, and files), workflow (slash commands, plan-then-execute, subagents, skills, worktrees), restraint (rules, permissions, allowlists), empowerment (tools that touch live state), verification (tests, type checks, fail-first reproductions, AI tool simulators), and a visual interface. The model is interchangeable. The harness is where your durable investment lives.
How is a harness different from Claude Code or Codex?
Claude Code and Codex are themselves harnesses. They wrap a frontier model with a system prompt, a tool set, a permission system, and an execution loop. Your team provides a second harness on top of that: the workspace, the linked context, the workflow, the rules, the verification loop, and the tools that are specific to your project.
Why does the harness matter more than the model?
Frontier models reshuffle the leaderboard every few weeks. Recent studies from Stanford and Tsinghua report that the orchestration code around the model drives more performance variation than the model itself: the same model can show a sixfold gap in result quality depending on the harness it runs in. Investment in your harness compounds and survives model churn. Investment in tuning prompts for last quarter's model does not.
How do I start building a harness for Claude Code and Codex?
Start with a CLAUDE.md or AGENTS.md at the root of your project that captures your real conventions and hard rules. Add path-scoped rule files for areas with special concerns. Wire up at least one tool that lets the agent verify its own work, like a test loop or a screenshot tool. Adopt a workspace like Nimbalyst that gives you a linked context graph, workflow scaffolding, and visual editors out of the box, so the agent and the human can work from the same artifacts.
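A first self-verification tool can be very small. The sketch below wraps any check command and hands back a pass or fail flag plus the tail of its output, which is the kind of result an agent can act on. The `run_check` name, the returned dictionary shape, and the 2000-character cap are assumptions for illustration; swap in whatever test command your project already uses.

```python
import subprocess
import sys


def run_check(command: list[str], timeout: float = 300) -> dict:
    """Run a check command and return whether it passed plus its recent output."""
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    combined = result.stdout + result.stderr
    return {
        "passed": result.returncode == 0,
        "output": combined[-2000:],  # keep only the tail so the context stays small
    }


if __name__ == "__main__":
    # Stand-in command; a project would pass its own test runner here.
    report = run_check([sys.executable, "-c", "print('all checks passed')"])
    print(report["passed"], report["output"].strip())
```

Returning structured data rather than raw terminal text makes it easier for the agent to decide whether to stop or try again.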
What is the context graph in a harness?
The context graph records persistent, typed links between the artifacts that matter: tracker item to plan, plan to spec, spec to diagram, diagram to session, session to diff, diff to files, and decision to the work that forced it. Without it, the connections between pieces of work live only in human heads, and an agent cannot traverse them. With it, both human and agent can pick up where the last session left off in a single traversal.
Is Nimbalyst the only way to build a harness?
No. Many of the pieces of a good harness, like CLAUDE.md, path-scoped rules, and tool definitions, can be built up inside any project. Nimbalyst is one open-source example of a workspace that already includes the context graph, workflow, verification, and visual interface parts. Adopt it whole, copy ideas from it, or use it as a reference while building your own.