Harness

Invest in your harness

Your harness is where your team's accumulated knowledge lives. Own it, invest in it, and keep it portable across whichever model lands next.

Seven parts

What goes into a harness

A harness is the durable layer around a model: instructions, tools, permissions, context, and verification. Claude Code and Codex are themselves harnesses. Your team provides a second one on top of them.

The seven parts are context, the context graph that links it, workflow, restraint, empowerment, verification, and a visual interface.

Read the full essay: What we learned building the harness around our coding agents

Figure: Invest in a harness that you own. Layers shown: prompt, seven-part harness, agent harness, and the underlying model.
01

What your agent needs to know

Context

CLAUDE.md, AGENTS.md, path-scoped rules, reusable skills, examples and recipes, your data model, and your past decisions.

Each session starts with the team's accumulated decisions already in scope, instead of being re-derived from the prompt.
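One way to picture path-scoped rules is as a lookup from file paths to the rule files that apply to them. The sketch below is only an illustration: the patterns, the rule file names under rules/, and the rules_for helper are all hypothetical, and it is not how Claude Code or Codex resolve their own rule files.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern-to-rule-file table. Every name here is illustrative.
SCOPED_RULES = [
    ("*", "AGENTS.md"),                         # project-wide conventions
    ("services/billing/*", "rules/billing.md"),  # area with special concerns
    ("web/*.tsx", "rules/frontend.md"),
]


def rules_for(path: str) -> list[str]:
    """Return every rule file whose pattern matches the path, in table order."""
    return [rule for pattern, rule in SCOPED_RULES if fnmatchcase(path, pattern)]


if __name__ == "__main__":
    print(rules_for("services/billing/invoice.py"))
    print(rules_for("README.md"))
```

Listing the broad rule first and the narrow ones after it keeps the project-wide conventions in scope for every file while letting sensitive areas add their own constraints.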

02

How the context connects

Context Graph

Typed links between tracker items, plans, specs, diagrams, mockups, sessions, diffs, files, commits, and decisions.

Instead of copy-pasting six links into a prompt, the agent can follow the same chain you would.
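To make "typed links" concrete, here is a minimal sketch of a graph that stores (source, link type, target) edges and walks them breadth-first. The ContextGraph class, the link type names, and the node identifiers are assumptions made for illustration. This is not Nimbalyst's actual data model.

```python
from collections import defaultdict, deque


class ContextGraph:
    """Toy store of typed, directed links between artifacts (illustrative only)."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(link_type, target), ...]

    def link(self, source: str, link_type: str, target: str) -> None:
        self._out[source].append((link_type, target))

    def chain(self, start: str) -> list[tuple[str, str, str]]:
        """Breadth-first walk: every (source, link_type, target) reachable from start."""
        seen, queue, edges = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            for link_type, target in self._out.get(node, []):
                edges.append((node, link_type, target))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return edges


if __name__ == "__main__":
    graph = ContextGraph()
    # Hypothetical artifact ids and link types.
    graph.link("tracker:retry-bug", "has_plan", "plan:retry")
    graph.link("plan:retry", "has_spec", "spec:retry")
    graph.link("spec:retry", "implemented_in", "session:a1")
    graph.link("session:a1", "produced", "diff:a1")
    for edge in graph.chain("tracker:retry-bug"):
        print(edge)
```

Starting from the tracker item, a single call returns the whole chain down to the diff, which is the traversal an agent would otherwise need six pasted links to reconstruct.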

03

The shape of a coding session

Workflow

Slash commands, plan-then-execute arcs, subagents, reusable skills, and worktrees so multiple agents can work in parallel without colliding.

A workflow layer keeps sessions from reinventing their process every time they start.
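As a rough sketch of the worktree idea, the helper below builds one `git worktree add` command per agent so each gets its own branch and directory. The agent names, the agent/ branch prefix, and the ../worktrees location are hypothetical. The function only constructs the commands and leaves running them to you.

```python
def worktree_commands(agents, base="main", root="../worktrees"):
    """Build one `git worktree add -b <branch> <path> <base>` command per agent."""
    return [
        ["git", "worktree", "add", "-b", f"agent/{name}", f"{root}/{name}", base]
        for name in agents
    ]


if __name__ == "__main__":
    # Hypothetical agent names; print the commands instead of executing them.
    for command in worktree_commands(["planner", "implementer"]):
        print(" ".join(command))
```

Because each agent works in a separate checkout on a separate branch, two sessions can edit the same file without overwriting each other. Conflicts surface later at merge time, where they are easier to review.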

04

What your agent must not do

Restraint

Hard rules, approval boundaries, permission scopes, tool allowlists, and an audit trail.

A capable agent without restraint eventually does something expensive, destructive, or embarrassing faster than you expected.
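A tool allowlist, an approval boundary, and an audit trail can be combined in a few lines. The following is a minimal sketch under stated assumptions: the Policy class, the "deny" / "ask" / "allow" decision strings, and the tool names are invented for illustration and do not mirror any particular agent's permission system.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    allowed_tools: set[str]      # tool allowlist
    needs_approval: set[str]     # allowed, but only after a human says yes
    audit_log: list[dict] = field(default_factory=list)

    def decide(self, tool: str, approved: bool = False) -> str:
        """Return 'deny', 'ask', or 'allow', and record the decision."""
        if tool not in self.allowed_tools:
            decision = "deny"
        elif tool in self.needs_approval and not approved:
            decision = "ask"
        else:
            decision = "allow"
        self.audit_log.append({"tool": tool, "decision": decision})
        return decision


if __name__ == "__main__":
    # Hypothetical tool names.
    policy = Policy(
        allowed_tools={"read_file", "run_tests", "deploy"},
        needs_approval={"deploy"},
    )
    for tool in ("run_tests", "deploy", "drop_database"):
        print(tool, policy.decide(tool))
```

Anything not on the allowlist is denied by default, so a newly added tool stays unavailable until someone deliberately grants it.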

05

What your agent can actually do

Empowerment

Tools that read logs, query the running database, drive the UI, take screenshots, and run end-to-end test loops.

An agent that can inspect the actual result can often close its own loop without a human in the middle.

06

How the agent proves a change works

Verification

Unit tests, end-to-end tests, fail-first reproductions, type checks, and a simulator for AI tool calls.

If the agent cannot show the change works end-to-end, it is not done.
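A fail-first reproduction means the check must fail against the old behavior before it is allowed to count as proof that the new behavior works. Here is a small sketch of that gate. The fail_first helper and the slug functions are hypothetical examples, not part of any test framework.

```python
from typing import Callable


def fail_first(check: Callable[[Callable], bool], before: Callable, after: Callable) -> bool:
    """True only if the check fails on the old code and passes on the new code."""
    reproduces_bug = not check(before)
    fix_works = check(after)
    return reproduces_bug and fix_works


# Hypothetical bug: the old slug function ignores case and surrounding spaces.
def old_slug(title: str) -> str:
    return title.replace(" ", "-")


def new_slug(title: str) -> str:
    return title.strip().lower().replace(" ", "-")


def slug_check(slug: Callable[[str], str]) -> bool:
    return slug(" Hello World ") == "hello-world"


if __name__ == "__main__":
    print(fail_first(slug_check, old_slug, new_slug))
```

A check that already passes on the old code proves nothing about the fix, which is why the gate rejects it even when the new code is correct.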

07

How you and your agent share the work

Visual Interface

Markdown, mockups, diagrams, data models, red and green diffs, screenshots, and threaded discussions tied to the artifacts.

A visual workspace keeps decisions attached to artifacts instead of burying them in chat.


A worked example

A harness in action

Here is what those seven parts look like filled in for a single concrete prompt, all the way through to the resulting outcome.

Figure: A harness in action. The same seven-part structure, filled in with what each cell looks like for a real piece of work: a prompt, the harness pillars populated with project-specific entries, the agent harness, the model, and the resulting outcome.

How to think about your harness

Prioritizing your harness

Own your harness

If you cannot read it, edit it, take it with you, and run it under any agent you choose, it is not yours.

Invest in your harness

Spend a meaningful share of your AI effort on better rules, tools, recorded decisions, and tighter verification loops. Treat the harness as a product your team ships to itself.

Keep it portable across models

Same files, same rules, same tools, same graph, whatever model lands next. If switching agents means rebuilding the harness, you do not really have optionality.

An example you can adopt

Nimbalyst is an open-source workspace built around these seven parts

Visual interface, context graph, workflow scaffolding, empowerment tools, verification loops, and cross-model CLAUDE.md and skills, all in one workspace. Claude Code and Codex run as first-class agents. The agent layer is pluggable for whatever lands next.

The desktop and iOS apps are MIT licensed. Study how they are wired, copy what is useful, or run Nimbalyst as your workspace.

Read about the context graph

FAQ

Questions about agent harnesses

What is an agent harness?
An agent harness is the system around the AI model that helps it do real work on your project. We think about ours in seven parts: context (what the agent knows about your code and conventions), a context graph (how that knowledge connects across tracker items, plans, diagrams, sessions, and files), workflow (slash commands, plan-then-execute, subagents, skills, worktrees), restraint (rules, permissions, allowlists), empowerment (tools that touch live state), verification (tests, type checks, fail-first reproductions, AI tool simulators), and a visual interface. The model is interchangeable. The harness is where your durable investment lives.
How is a harness different from Claude Code or Codex?
Claude Code and Codex are themselves harnesses. They wrap a frontier model with a system prompt, a tool set, a permission system, and an execution loop. Your team provides a second harness on top of that: the workspace, the linked context, the workflow, the rules, the verification loop, and the tools that are specific to your project.
Why does the harness matter more than the model?
Frontier models flip the leaderboard every few weeks. Recent studies from Stanford and Tsinghua show that the orchestration code around the model drives more performance variation than the model itself: the same model can produce a sixfold gap in result quality depending on the harness it runs in. Investment in your harness compounds and survives model churn. Investment in tuning prompts for last quarter's model does not.
How do I start building a harness for Claude Code and Codex?
Start with a CLAUDE.md or AGENTS.md at the root of your project that captures your real conventions and hard rules. Add path-scoped rule files for areas with special concerns. Wire up at least one tool that lets the agent verify its own work, like a test loop or a screenshot tool. Adopt a workspace like Nimbalyst that gives you a linked context graph, workflow scaffolding, and visual editors out of the box, so the agent and the human can work from the same artifacts.
What is the context graph in a harness?
The context graph records persistent, typed links between the artifacts that matter: tracker item to plan, plan to spec, spec to diagram, diagram to session, session to diff, diff to files, and decision to the work that forced it. Without it, the connections between pieces of work live only in human heads, and an agent cannot traverse them. With it, both human and agent can pick up where the last session left off in a single traversal.
Is Nimbalyst the only way to build a harness?
No. Many of the pieces of a good harness, like CLAUDE.md, path-scoped rules, and tool definitions, can be built up inside any project. Nimbalyst is one open-source example of a workspace that already includes the context graph, workflow, verification, and visual interface parts. Adopt it whole, copy ideas from it, or use it as a reference while building your own.

Nimbalyst: the open-source visual workspace for building with Codex, Claude Code, and more