Correction Audit · inbound only

Give us one agent workflow.
We’ll find what humans keep correcting.

A Correction Audit is the low-friction way to work with Calx. Two weeks, async, one workflow. You get a behavioral control report that names every recurring correction your agents ignore, classifies what compiles into runtime enforcement, and quantifies the baseline.

Start an audit →

See a sample report

format

Async + read-out

duration

Two weeks

scope

One workflow

next step

Design partnership

How the audit runs

Four phases. One behavioral control report.

We instrument the workflow you give us, cluster recurring corrections, classify them, and walk you through the enforcement plan.

01

Scope + instrument

You pick one supervised agent workflow. We set up read-only correction capture, scope-locked to that surface. No prompts or completions leave your environment.

artifact · scope + data-handling memo

02

Capture corrections

Two weeks of normal work. We observe where humans correct the agent. Recurring corrections are clustered across sessions and scoped to operator identity.

artifact · correction stream + cluster map
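
As a rough illustration of clustering recurrence across sessions, here is a minimal sketch. It is not Calx's implementation: the `Correction` fields, the `rule` label, and the `min_sessions` threshold are all assumptions made for the example, as is reading "scoped to operator identity" as grouping per operator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    operator: str  # who made the correction (hypothetical field)
    session: str   # session in which it happened (hypothetical field)
    rule: str      # normalized label for what was corrected (hypothetical field)


def recurrence_clusters(events, min_sessions=2):
    """Group corrections by (operator, rule) and keep the groups
    that show up in at least `min_sessions` distinct sessions."""
    sessions = defaultdict(set)
    counts = defaultdict(int)
    for e in events:
        key = (e.operator, e.rule)
        sessions[key].add(e.session)
        counts[key] += 1
    return {k: counts[k] for k in counts if len(sessions[k]) >= min_sessions}


events = [
    Correction("op-a", "s1", "no-force-push"),
    Correction("op-a", "s2", "no-force-push"),
    Correction("op-a", "s2", "mock-patch-target"),
    Correction("op-b", "s3", "no-force-push"),
]
clusters = recurrence_clusters(events)
print(clusters)  # {('op-a', 'no-force-push'): 2}
```

Only the correction that repeats across two sessions for the same operator survives; one-off corrections are dropped.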

03

Classify + enforce

Every cluster goes through the Paper 2 codebook: architectural (structurally enforceable) or process (text rule that will keep failing). Each gets an enforcement plan.

artifact · classification + enforcement plan
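
One way to picture a classified cluster is as a small record carrying its class and enforcement plan. The sketch below is an assumption about shape only: the `RuleClass` and `Cluster` names are hypothetical, and the rows are borrowed from the synthetic sample report further down this page.

```python
from dataclasses import dataclass
from enum import Enum


class RuleClass(Enum):
    ARCHITECTURAL = "architectural"  # structurally enforceable
    PROCESS = "process"              # text rule; needs review gates to hold


@dataclass
class Cluster:
    name: str
    events: int
    rule_class: RuleClass
    enforcement_plan: str


def events_by_class(clusters):
    """Sum correction events per class."""
    totals = {c: 0 for c in RuleClass}
    for cl in clusters:
        totals[cl.rule_class] += cl.events
    return totals


clusters = [
    Cluster("Destructive git operations", 18, RuleClass.ARCHITECTURAL,
            "tool-veto; require approval"),
    Cluster("Wrong mock patch target in tests", 12, RuleClass.PROCESS,
            "response-review gate + CI check"),
    Cluster("Env-var exposure in logs", 6, RuleClass.ARCHITECTURAL,
            "pattern-match before delivery"),
]
totals = events_by_class(clusters)
print(totals[RuleClass.ARCHITECTURAL], totals[RuleClass.PROCESS])  # 24 12
```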

04

Deliver + walk through

You get a PDF behavioral control report and a 30-minute read-out. Next step: a scoped pilot with Bench + Tether to enforce the architectural rules we found.

artifact · behavioral control report

What the report looks like

The deliverable, redacted.

Every audit produces a report in this shape. Names and workflow details are redacted. The classification, recurrence counts, and enforcement plan are always there.

Calx · Behavioral Control Report

Correction Audit

Prepared for CONFIDENTIAL CLIENT NAME

Workflow: INTERNAL AGENT PLATFORM

Period: 14 days · 2026-04-07 → 2026-04-21

Executive summary. Over 14 days on one workflow (6 operators, 5 supervised agents), Calx captured 87 distinct corrections across 23 recurrence clusters. Of these, 11 are architectural and can compile into Tether middleware hooks with zero-recurrence guarantees in the covered classes. 12 are process: text rules that will continue to fail under load without structural enforcement or review gates. The top 3 clusters account for 41 percent of total correction cost.

87

Corrections captured

23

Recurrence clusters

41%

Cost concentrated in top 3

Top recurrence clusters

Where humans pay twice.

Cluster	Events	Class	Enforcement plan
Destructive git operations despite RULE NAME	18	ARCHITECTURAL	Tether tool-veto on `--force`, `--no-verify`; require approval
Wrong mock patch target in tests	12	PROCESS	Response-review gate + CI check; text rule alone will not hold
Model edits skipping schema co-update	11	ARCHITECTURAL	Pre-commit import-guard + response review on co-modification
API endpoint shape drift from SPEC NAME	9	ARCHITECTURAL	Schema-contract hook at response delivery
Spec-before-build omission	7	PROCESS	Review-gate checkpoint in Bench; cannot be architecturally enforced
Async/sync blocking in request handlers	6	ARCHITECTURAL	Runtime assertion hook; model-agnostic via Tether
Env-var exposure in logs	6	ARCHITECTURAL	Response-review pattern-match before delivery
Comment drift / stale inline comments	5	PROCESS	Low-impact; deferred to process rule, not compiled
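
To make the first row concrete, a tool-veto check on `--force` and `--no-verify` could look roughly like the sketch below. This is not the Tether API: `needs_approval` and `BLOCKED_FLAGS` are hypothetical names, and the check only does exact token matching on a command string.

```python
import shlex

# Flags named in the first table row; anything else is an assumption.
BLOCKED_FLAGS = {"--force", "--no-verify"}


def needs_approval(command: str) -> bool:
    """Return True when a git command carries a flag that should be
    held for human approval instead of running directly."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "git":
        return False
    return any(tok in BLOCKED_FLAGS for tok in tokens[1:])


print(needs_approval("git push --force origin main"))     # True
print(needs_approval("git commit -m 'fix' --no-verify"))  # True
print(needs_approval("git push origin main"))             # False
```

Exact matching means short forms such as `-f` and variants such as `--force-with-lease` are not caught here; a real hook would need an explicit decision on each.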

Next-step recommendations

Scoped pilot plan

Phase 1 compiles the four highest-frequency ARCHITECTURAL clusters into Tether hooks and ships them to ENGINEERING SUB-TEAM (6 operators). The baseline is measured first; a second capture window at day 14 measures recurrence reduction in the covered classes.

Phase 2 extends to cross-agent patterns and adds the three process clusters to Bench review gates. Full pilot report at 6 weeks.
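
The recurrence-reduction measurement in Phase 1 comes down to simple arithmetic against the baseline. A minimal sketch, assuming reduction is defined as the fractional drop in events for a covered cluster (the function name and the follow-up count of 3 are made up for illustration; only the baseline of 18 comes from the table above):

```python
def recurrence_reduction(baseline_events: int, followup_events: int) -> float:
    """Fractional drop in correction events between the baseline
    window and the second capture window."""
    if baseline_events <= 0:
        raise ValueError("baseline must contain at least one event")
    return (baseline_events - followup_events) / baseline_events


# Baseline of 18 is the destructive-git cluster; 3 is a hypothetical follow-up count.
reduction = recurrence_reduction(18, 3)
print(f"{reduction:.0%}")  # 83%
```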

calx.sh / audit · page 1 / 12 · v0.3 · synthetic sample

What we need from you

Low lift. One workflow. One buyer.

01 · One workflow

A supervised agent surface where humans correct regularly.

Cursor, Claude Code, Codex, Devin, or your own harness. If there is a place where a human keeps fixing the agent's output, that is the workflow.

02 · One buyer

Someone who owns the rollout.

AI Platform Lead, DevEx Lead, Technical Founder, or the CTO. We do not run audits without a single person responsible for the enforcement decisions.

03 · Read access

Correction stream, scoped to the workflow.

We instrument the correction surface. We do not read prompts, completions, or source code. Scope is negotiated and documented in the data-handling memo before we start.

Who the audit is for

Correction density qualifies you.

Not job title. If your agents keep getting corrected in ways a human has to remember, the audit will find it.

AI Platform Lead

Rolling out agents across teams.

Audit for platform leads →

DevEx / Tooling Lead

Owns CLAUDE.md across engineers.

Audit for DevEx →

Technical Founder

Dogfoods agents daily.

Audit for founders →

Frequently asked

Questions before you book.

How much does an audit cost?

Pricing is scoped per workflow. We run a short scoping call first, agree on the scope and access, and price from there. The intake below gives us what we need to quote.

Do you need to see our prompts or code?

No. Calx instruments the correction surface. We see where humans correct the agent; we do not read prompts, completions, or source code. Scope and data handling are documented before the audit begins.

What if nothing recurs?

That is a result. If a workflow produces fewer than three recurrence clusters in 14 days, we report that. You keep the data-handling memo and the capture methodology. In practice, every audit to date has surfaced 20+ clusters.

Can this run against a harness we built ourselves?

Yes. Internal harnesses are one of the clearest wedges. Keep your harness. Calx runs underneath as the correction compiler. The audit will show you the parts of the correction lifecycle you have not built yet.

What's the next step after the audit?

A scoped pilot. Ship the architectural enforcement plan with Tether + Bench across a small team. Measure recurrence reduction against the audit baseline. If you want to engage deeper, design partnership is the expansion path.

Is this a sales call dressed up as a diagnostic?

No. The report has value on its own. It is the thing you use to decide whether Calx is worth piloting. If the classifications do not map to enforcement you want, the audit is the last thing we ship together.

Book a Correction Audit.

Tell us one workflow where your team keeps correcting agents. We come back within 48 hours with a scoping call and a quote.

Pick a scoping time →

Email Spencer directly