Give us one agent workflow. We’ll find what humans keep correcting.
A Correction Audit is the lowest-friction way to start working with Calx. Two weeks, async, one workflow. You get a behavioral control report that names every recurring correction your agents ignore, classifies which of those corrections can compile into runtime enforcement, and quantifies the baseline recurrence you can measure enforcement against.
We instrument the workflow you give us, cluster recurring corrections, classify them, and walk you through the enforcement plan.
01 · Scope + instrument
You pick one supervised agent workflow. We set up read-only correction capture, scope-locked to that surface. No prompts or completions leave your environment.
artifact · scope + data-handling memo
02 · Capture corrections
Two weeks of normal work. We observe where humans correct the agent. Recurring corrections are clustered across sessions and scoped to operator identity.
artifact · correction stream + cluster map
03 · Classify + enforce
Every cluster is classified with the Paper 2 codebook as either architectural (structurally enforceable) or process (a text rule that will keep failing). Each gets an enforcement plan.
artifact · classification + enforcement plan
04 · Deliver + walk through
You get a behavioral control report as a PDF, plus a 30-minute read-out. Next step: a scoped pilot with Bench + Tether to enforce the architectural rules we found.
artifact · behavioral control report
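Steps 02 and 03 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Calx's implementation: the `Correction` record, its `signature` field (a normalized label for what the human corrected), the two-session recurrence threshold, and the `ARCHITECTURAL_SIGNATURES` set standing in for the Paper 2 codebook are all hypothetical. The record carries only correction metadata, in keeping with the no-prompts, no-completions scope.

```python
from dataclasses import dataclass, field


# Hypothetical correction event. It carries only metadata about the correction
# (which session, which operator, what kind), never prompt, completion, or source text.
@dataclass(frozen=True)
class Correction:
    session_id: str
    operator_id: str
    signature: str  # hypothetical normalized label for what was corrected


@dataclass
class Cluster:
    signature: str
    events: int = 0
    sessions: set[str] = field(default_factory=set)
    operators: set[str] = field(default_factory=set)


def cluster_recurrences(corrections, min_sessions=2):
    """Group corrections by signature; keep those seen in >= min_sessions sessions."""
    clusters: dict[str, Cluster] = {}
    for c in corrections:
        cl = clusters.setdefault(c.signature, Cluster(c.signature))
        cl.events += 1
        cl.sessions.add(c.session_id)
        cl.operators.add(c.operator_id)  # which operators hit this cluster
    recurring = [cl for cl in clusters.values() if len(cl.sessions) >= min_sessions]
    # Highest event count first, mirroring the report's cluster table.
    return sorted(recurring, key=lambda cl: cl.events, reverse=True)


# Hypothetical stand-in for a codebook: signatures a runtime hook could enforce.
ARCHITECTURAL_SIGNATURES = {"destructive-git", "env-var-in-logs"}


def classify(cluster: Cluster) -> str:
    return "ARCHITECTURAL" if cluster.signature in ARCHITECTURAL_SIGNATURES else "PROCESS"


events = [
    Correction("s1", "op1", "destructive-git"),
    Correction("s2", "op2", "destructive-git"),
    Correction("s3", "op1", "destructive-git"),
    Correction("s1", "op1", "mock-patch-target"),
    Correction("s2", "op3", "mock-patch-target"),
    Correction("s2", "op2", "stale-comment"),  # one session only: not recurring
]
for cl in cluster_recurrences(events):
    print(cl.signature, cl.events, len(cl.sessions), classify(cl))
```

With these sample events, the single-session `stale-comment` correction is dropped, `destructive-git` comes first as an ARCHITECTURAL cluster with three events, and `mock-patch-target` follows as a PROCESS cluster.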
What the report looks like
The deliverable, redacted.
Every audit produces a report in this shape. Names and workflow details are redacted. The classification, recurrence counts, and enforcement plan are always there.
Calx · Behavioral Control Report
Correction Audit
Prepared for CONFIDENTIAL CLIENT NAME
Workflow: INTERNAL AGENT PLATFORM
Period: 14 days · 2026-04-07 → 2026-04-21
Executive summary. Over 14 days on one workflow (6 operators, 5 supervised agents), Calx captured 87 distinct corrections across 23 recurrence clusters. Of these, 11 are architectural and can compile into Tether middleware hooks with zero-recurrence guarantees in the covered classes. 12 are process: text rules that will continue to fail under load without structural enforcement or review gates. The top 3 clusters account for 41 percent of total correction cost.
87 · Corrections captured
23 · Recurrence clusters
41% · Cost concentrated in top 3
Top recurrence clusters
Where humans pay twice.
Cluster · Events · Class · Enforcement plan
Destructive git operations despite RULE NAME · 18 · ARCHITECTURAL · Tether tool-veto on --force, --no-verify; require approval
Wrong mock patch target in tests · 12 · PROCESS · Response-review gate + CI check; text rule alone will not hold
Model edits skipping schema co-update · 11 · ARCHITECTURAL · Pre-commit import-guard + response review on co-modification
API endpoint shape drift from SPEC NAME · 9 · ARCHITECTURAL · Schema-contract hook at response delivery
Spec-before-build omission · 7 · PROCESS · Review-gate checkpoint in Bench; cannot be architecturally enforced
Async/sync blocking in request handlers · 6 · ARCHITECTURAL · Runtime assertion hook; model-agnostic via Tether
Env-var exposure in logs · 6 · ARCHITECTURAL · Response-review pattern-match before delivery
Comment drift / stale inline comments · 5 · PROCESS · Low-impact; deferred to process rule, not compiled
Next-step recommendations
Scoped pilot plan
Phase 1 compiles the four highest-frequency ARCHITECTURAL clusters into Tether hooks and ships them to ENGINEERING SUB-TEAM (6 operators). Baseline measured. Second capture window at day 14 measures recurrence reduction in the covered classes.
Phase 2 extends to cross-agent patterns and adds the three process clusters to Bench review gates. Full pilot report at 6 weeks.
calx.sh / audit · page 1 / 12 · v0.3 · synthetic sample
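The first row of the sample report's cluster table (a Tether tool-veto on `--force` and `--no-verify` that requires approval) can be illustrated as a small pre-execution check. This is a hypothetical sketch, not Tether's interface: the `check_tool_call` name, its string return values, and the fail-closed handling of unparseable commands are all assumptions.

```python
import os
import shlex

# The two flags named in the sample report's destructive-git cluster.
VETOED_FLAGS = {"--force", "--no-verify"}


def check_tool_call(command: str) -> str:
    """Decide whether an agent's shell command may run without a human.

    Hypothetical pre-execution hook: returns "allow" or "require_approval".
    """
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fail closed and route the command to a human.
        return "require_approval"
    if not argv or os.path.basename(argv[0]) != "git":
        return "allow"
    # Compare whole tokens, so a quoted commit message that mentions
    # "--force" is a single argument and does not match.
    if any(arg in VETOED_FLAGS for arg in argv[1:]):
        return "require_approval"
    return "allow"


for cmd in ["git status", "git push --force origin main", "git commit -m '--force is bad'"]:
    print(f"{cmd!r}: {check_tool_call(cmd)}")
```

Matching on parsed tokens rather than substrings is the design choice that keeps false vetoes down. A production hook would also need to cover equivalent forms the sketch ignores, such as `git push -f`, `git commit -n`, and force-push refspecs like `+main`.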
What we need from you
Low lift. One workflow. One buyer.
01 · One workflow
A supervised agent surface where humans correct regularly.
Cursor, Claude Code, Codex, Devin, or your own harness. If there is a place where a human keeps fixing the agent's output, that is the workflow.
02 · One buyer
Someone who owns the rollout.
AI Platform Lead, DevEx Lead, Technical Founder, or the CTO. We do not run audits without a single person responsible for the enforcement decisions.
03 · Read access
Correction stream, scoped to the workflow.
We instrument the correction surface. We do not read prompts, completions, or source code. Scope is negotiated and documented in the data-handling memo before we start.
Who the audit is for
Correction density qualifies you.
Not job title. If your agents keep getting corrected in ways a human has to remember, the audit will find it.
Pricing is per workflow. We start with a short scoping call, agree on scope and access, and price from there. The intake below gives us what we need to quote.
Do you need to see our prompts or code?
No. Calx instruments the correction surface. We see where humans correct the agent; we do not read prompts, completions, or source code. Scope and data handling are documented before the audit begins.
What if nothing recurs?
That is a result. If a workflow produces fewer than three recurrence clusters in 14 days, we report that. You keep the data-handling memo and the capture methodology. In practice, every audit to date has surfaced 20+ clusters.
Can this run against a harness we built ourselves?
Yes. Internal harnesses are one of the clearest places to start. Keep your harness. Calx runs underneath as the correction compiler. The audit will show you the parts of the correction lifecycle you have not built yet.
What's the next step after the audit?
A scoped pilot. Ship the architectural enforcement plan with Tether + Bench across a small team and measure recurrence reduction against the audit baseline. If you want a deeper engagement, design partnership is the expansion path.
Is this a sales call dressed up as a diagnostic?
No. The report has value on its own. It is the thing you use to decide whether Calx is worth piloting. If the classifications do not map to enforcement you want, the audit is the last thing we ship together.
Book a Correction Audit.
Tell us one workflow where your team keeps correcting agents. We get back to you within 48 hours to set up a scoping call, and the quote follows from that call.