01 / the consensus · v2026.04

Behavioral infrastructure
for the supervised-agent era.
The field is building it. We’re naming it.

Seven AI labs, a dozen research groups, the EU, OWASP, and Colorado are each building a piece of the same layer. Nobody has named the whole thing. Calx is calling it behavioral infrastructure for the supervised-agent era and building the piece every cited source is missing: your corrections, compiled into runtime enforcement. Every citation below is independent. Every working model is public.

Book a Correction Audit → · Read the research ↓

why now

The harness layer is consolidating.
The enforcement primitive isn’t.

OpenAI

Codex App Server + Codex 0.114.0 Guardian. Lifecycle hooks, skill governance. March 2026.

Anthropic

Claude Managed Agents public beta, April 8, 2026. Hosted runtime, session-metered at $0.08/hour.

Microsoft

Agent Governance Toolkit, MIT-licensed, April 2, 2026. All 10 OWASP agentic risks, inside the runtime.

Cited work and working models

Independent research, academic papers, open-source releases, and standards bodies shaping the category Calx is naming. We cite each one below.

What each one built. What Calx adds.

Eighteen entities, each building a piece of the same layer. No claim of endorsement. Every row is a public artifact: a shipped system, a published paper, or a regulatory instrument.

Tier 01 · Industry

What they built · What Calx adds

OpenAI

Harness engineering is a named architectural pattern. The Codex App Server decouples agent core logic from client surfaces (CLI, VS Code, web, desktop) through a bidirectional protocol.

Calx builds the behavioral governance layer inside the same harness pattern, cross-runtime and model-agnostic by construction. OpenAI is building its harness for Codex. Calx is building the one everyone else needs.

Microsoft

Enterprise agent governance shipped in April 2026. The Agent Governance Toolkit (MIT-licensed, April 2, 2026) covers all 10 OWASP agentic risks with deterministic policy enforcement at sub-millisecond latency, inside the agent runtime. Microsoft has named the category by shipping it.

Microsoft’s toolkit enforces policies that admins write in advance. Calx compiles the policies nobody knew to write: the corrections your team makes every day, captured automatically and promoted to enforcement. Same runtime posture, inverse origination.

AWS

AgentCore policy and evaluation governance (Q1 2026) defines which apps, APIs, and MCP servers agents can access. AWS is the first hyperscaler to ship an agent governance surface as a cloud primitive rather than a library.

AWS gates capability at the account and IAM boundary. Calx runs inside that boundary and compiles the behavioral rules the IAM layer cannot express: patterns of recurrence captured from how humans correct agents at runtime. Policy plus behavior.

LangChain

The middleware chain is the composition pattern for governance primitives. LangChain 1.0 shipped SummarizationMiddleware, PIIMiddleware, HumanInTheLoopMiddleware, and four more hooks as composable, single-responsibility interceptors.

Calx treats correction capture, recurrence detection, and rule enforcement as middleware in this pattern. The composition is portable across frameworks and testable in isolation.
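A minimal sketch of that composition pattern, written in plain Python rather than against LangChain’s actual middleware API. Every name here (Request, run_chain, Tagger, RuleEnforcer, agent) is hypothetical, the “compiled rule” is a toy substring check, and none of it is Calx’s implementation. It only illustrates the two properties claimed above: each interceptor has one job, and a rule can stop a request before the agent handler runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Handler = Callable[["Request"], str]


@dataclass
class Request:
    """Hypothetical agent request passed down the chain."""
    prompt: str
    notes: List[str] = field(default_factory=list)


# A middleware takes the request plus "the rest of the chain" and returns a response.
Middleware = Callable[[Request, Handler], str]


def run_chain(middlewares: List[Middleware], handler: Handler, req: Request) -> str:
    """Run middlewares in order (first one outermost), then the agent handler."""
    def dispatch(i: int, r: Request) -> str:
        if i == len(middlewares):
            return handler(r)
        return middlewares[i](r, lambda nxt_req: dispatch(i + 1, nxt_req))
    return dispatch(0, req)


class Tagger:
    """Single job: record that the request passed through (stand-in for correction capture)."""
    def __call__(self, req: Request, nxt: Handler) -> str:
        req.notes.append("seen")
        return nxt(req)


class RuleEnforcer:
    """Single job: block a request that violates a compiled rule before the agent runs."""
    def __init__(self, banned: str) -> None:
        self.banned = banned

    def __call__(self, req: Request, nxt: Handler) -> str:
        if self.banned in req.prompt:
            return f"blocked: contains {self.banned!r}"
        return nxt(req)


def agent(req: Request) -> str:
    """Stand-in for the model call; a real harness would run the agent loop here."""
    return f"ran: {req.prompt}"


if __name__ == "__main__":
    chain = [Tagger(), RuleEnforcer("rm -rf")]
    print(run_chain(chain, agent, Request("list files")))  # ran: list files
    print(run_chain(chain, agent, Request("rm -rf /")))    # blocked: contains 'rm -rf'
```

Because each interceptor sees only the request and a callable for the rest of the chain, any one of them can be tested in isolation by passing a stub in place of nxt.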

Anthropic

Claude Managed Agents public beta, April 8, 2026. Anthropic shipped a hosted agent runtime bundling agent loop, tool execution, sandbox, and state persistence. Session-metered at $0.08/hour. Anthropic is productizing the harness layer for Claude.

Calx is the cross-runtime version of what Anthropic built for Claude. Behavioral enforcement that survives across providers, not tied to a specific vendor’s hosted runtime. The rule you compile once enforces wherever your agents run.

Calx’s own published research

Three peer-reviewable papers on Zenodo, all CC-BY-4.0, covering the behavioral plane, the stickiness failure mode, and the compiler gap. First-party evidence for the category this page is naming.

Paper I

The Behavioral Plane

237 rules transferred from one agent to another. The receiving agent still produced 44 novel failures in categories the rules explicitly addressed. Behavioral knowledge does not transfer through text.

DOI: 10.5281/zenodo.19159223 · Read the paper →

Paper II

Stickiness Without Resistance

Without human friction in the correction loop, agents accept instructions but fail to modify behavior. Compliance is performed, not enacted.

DOI: 10.5281/zenodo.19382717 · Read the paper →

Paper III

The Compiler Gap

Nine formatting rules tested across three context lengths. Text instructions: 0/9 enforced. Structural enforcement: 9/9. The variable was the delivery mechanism.

DOI: 10.5281/zenodo.19384855 · Read the paper →

06 / the framework

Two planes. Calx builds one.

Behavioral infrastructure for the supervised-agent era runs on two planes. The information plane is what the agent knows. The behavioral plane is what the agent does. Calx builds the behavioral plane and integrates cleanly with the information plane.

Interaction · Where the user works

Bench // Cursor · Claude Desktop · ChatGPT · IDE plugins

Information plane · What the agent knows

Memory: Mem0 · Letta · Zep

Retrieval: RAG · Pinecone · Weaviate · pgvector

Context: CLAUDE.md · system prompts · context loaders

Calx integrates here. We do not build here. Memory and retrieval are a different category.

Behavioral plane · What the agent does

Calx native

Governance: correction lifecycle · recurrence detection · rule promotion

Compiler: Serve // correction-to-rule compilation · proprietary pipeline

Harness: Tether // OpenAI Codex App Server · Anthropic Managed Agents · LangGraph · Microsoft AGT · AWS AgentCore · Manus

Corrections captured at the source, compiled into rules inside the harness, and enforced structurally before the agent runs.

Model: OpenAI · Anthropic · Google · Meta · open-source

Model-agnostic by construction

Thesis · Why Calx owns the harness

Calx builds a model-agnostic harness on top of LangChain, LangGraph, and Deep Agents, assembled from a curated middleware library and a proprietary correction compilation engine.

Owning the harness is the mechanism. We build the behavioral layer inside the harness itself, capture corrections at the point of execution, and compile them into rules within the harness. Compiled rules produce zero recurrence in the covered architectural classes because they are enforced before the agent runs, at the layer where the agent runs.
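A minimal sketch of the capture, recurrence-detection, and promotion loop described above, assuming a correction can be keyed by a category string and expressed as a predicate over a planned action. The threshold of three recurrences, the class CorrectionLedger, and the example rule are all hypothetical; Calx’s compilation pipeline is proprietary and is not what is shown here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Check = Callable[[str], bool]  # returns True when a planned action complies


@dataclass
class CorrectionLedger:
    """Hypothetical correction lifecycle: capture -> recurrence detection -> promotion -> gate."""
    threshold: int = 3  # assumed number of recurrences before a correction becomes a rule
    counts: Counter = field(default_factory=Counter)
    rules: Dict[str, Check] = field(default_factory=dict)

    def capture(self, key: str, check: Check) -> bool:
        """Record one human correction under `key`.

        Returns True only on the capture that promotes the correction to an enforced rule.
        """
        self.counts[key] += 1
        if key not in self.rules and self.counts[key] >= self.threshold:
            self.rules[key] = check
            return True
        return False

    def gate(self, planned_action: str) -> List[str]:
        """Check a planned action against every promoted rule before it executes.

        Returns the keys of violated rules; an empty list means the action may proceed.
        """
        return [key for key, check in self.rules.items() if not check(planned_action)]


if __name__ == "__main__":
    ledger = CorrectionLedger()
    no_force_push: Check = lambda action: "--force" not in action
    for _ in range(3):  # the same correction, made three times by a human reviewer
        promoted = ledger.capture("no-force-push", no_force_push)
    print(promoted)                                      # True
    print(ledger.gate("git push --force origin main"))   # ['no-force-push']
    print(ledger.gate("git push origin main"))           # []
```

The design choice the sketch highlights: a correction stays a logged event until it recurs, and only then becomes a check that runs before the action, so a one-off correction never hardens into a rule by accident.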

Model-agnostic BYOK routing. Any model, any deployment. The behavioral layer is portable by construction. Bench and the Calx harness are the first-party experience; the same behavioral governance layer also plugs into Cursor, Claude Desktop, OpenAI Codex App Server, Anthropic Managed Agents, LangGraph, AWS AgentCore, and any other interaction or harness in the ecosystem.

Calx is a behavioral-plane system. The information plane (memory, retrieval, context injection) is a different category and a different problem. Systems like Mem0, Letta, and Zep live there and solve real problems we explicitly do not try to solve. Calx integrates cleanly with information-plane systems. We do not compete with them. Paper III (“The Compiler Gap”) frames this as the missing layer: storing rules is not the same thing as governing behavior. We are the layer that governs behavior.

08 / answers

Questions a procurement skeptic asks first.

Short answers to the questions that decide whether this conversation is worth your team’s time.

What is behavioral infrastructure for the supervised-agent era?

The system layer that captures corrections, compiles them into structural rules, and enforces those rules inside the harness, before the agent runs. It is distinct from prompt engineering (information plane) and from memory systems (also information plane). Behavioral infrastructure is the enforcement layer. Calx is naming the category because seven labs converged on the harness in a single quarter, and nobody has named the compiler piece that compounds human corrections into runtime behavior.

How is Calx different from prompt engineering or rules files?

Prompt engineering writes text and hopes the agent follows it. Calx compiles corrections into structural enforcement that runs before the agent. We measured the difference in a controlled study: 0 of 9 for text rules, 9 of 9 for compiled rules. Same rules, same runtime, three context lengths. The variable was the delivery mechanism.
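A minimal sketch of the two delivery mechanisms applied to formatting rules. The two rules and the names RULES, as_text_instruction, and enforce are hypothetical; they are not the nine rules or the harness from the Compiler Gap study. Text delivery only adds prose to the prompt, while structural delivery checks every output in code and repairs violations before the output is accepted, so compliance no longer depends on the model heeding the prompt. A real gate might reject and retry instead of repairing.

```python
import re
from typing import Callable, List, Tuple

# Each hypothetical rule is compiled into (name, check, fix).
Rule = Tuple[str, Callable[[str], bool], Callable[[str], str]]

RULES: List[Rule] = [
    ("no-trailing-whitespace",
     lambda s: re.search(r"[ \t]+$", s, flags=re.MULTILINE) is None,
     lambda s: re.sub(r"[ \t]+$", "", s, flags=re.MULTILINE)),
    ("no-em-dash",
     lambda s: "\u2014" not in s,
     lambda s: re.sub(r"\s*\u2014\s*", ", ", s)),
]


def as_text_instruction(prompt: str) -> str:
    """Text delivery: the rules become prose that the model may or may not follow."""
    bullets = "\n".join(f"- Always follow the rule: {name}" for name, _, _ in RULES)
    return f"{prompt}\n\nFormatting rules:\n{bullets}"


def enforce(output: str) -> Tuple[str, List[str]]:
    """Structural delivery: check every rule in code and repair violations before acceptance.

    Returns the repaired output and the names of the rules that had to be applied.
    """
    applied: List[str] = []
    for name, check, fix in RULES:
        if not check(output):
            output = fix(output)
            applied.append(name)
    return output, applied


if __name__ == "__main__":
    print(enforce("Done \u2014 shipped  \nok"))
    # ('Done, shipped\nok', ['no-trailing-whitespace', 'no-em-dash'])
```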

Is Calx competing with OpenAI Codex, Anthropic Managed Agents, or Microsoft Agent Governance Toolkit?

No. Calx is built on LangChain and LangGraph and runs cross-runtime, including alongside OpenAI Codex, Anthropic Managed Agents, AWS AgentCore, and Microsoft AGT. OpenAI and Anthropic each ship a hosted harness for their own models. Microsoft enforces admin-written policies. AWS gates capability at the IAM boundary. Calx builds the behavioral governance layer inside the same harness pattern, compiling the policies nobody knew to write, cross-runtime. Same category, different scope.

How does Calx relate to Mem0, Letta, Zep, and other memory systems?

Calx is a behavioral-plane system. Memory and retrieval (information plane) are a different category solving a different problem. Calx integrates cleanly with information-plane systems. We do not build them and we do not compete with them. Paper III (“The Compiler Gap”) frames the distinction explicitly: storing rules is not the same thing as governing behavior.

What models and runtimes does Calx support?

Model-agnostic by construction, via LiteLLM. BYOK for any model: Claude, GPT, Gemini, Llama, open-source. The behavioral layer is portable and lives inside the harness, not the model. The native first-party experience is Bench plus the Calx harness (Tether). The same behavioral governance layer also plugs into Cursor, Claude Desktop, OpenAI Codex App Server, Anthropic Managed Agents, LangGraph, AWS AgentCore, and any other interaction or harness in the ecosystem.

What does Calx cite as evidence for the category?

Eighteen entities on the page above: every one of them has shipped a working system, published a peer-reviewable paper, or enacted a regulatory instrument that addresses a piece of behavioral infrastructure. Industry: OpenAI, Microsoft, LangChain, Anthropic, Meta, Manus, Cursor, HumanLayer. Academic: Stanford and SambaNova on ACE; UCL and Huawei Noah’s Ark Lab on Memento-Skills; Tsinghua on via negativa alignment; TU Eindhoven on runtime governance; Singapore Management University on AgentSpec; Phil Schmid (Google DeepMind) on the harness-as-OS analogy; Varun Pratap Bhardwaj on Agent Behavioral Contracts. Standards and regulatory: OWASP Top 10 for Agentic Applications, EU AI Act, Colorado AI Act. Calx publishes three of its own peer-reviewable papers on Zenodo (CC-BY-4.0).

Is Calx production-ready? When can I use it?

The Calx harness (Tether), the behavioral compilation engine (Serve), and the underlying runtime are running in production with the current design-partner cohort today. Bench (the first-party desktop experience) is in use with the cohort and ships on macOS first. Public release follows the design-partner cohort. If you are running agents across teams and the corrections are slipping, the fastest way in is a Correction Audit: book one at calx.sh/audit.

07 / the ratio

The product is the ratio.

Every primitive in behavioral infrastructure runs on one of two planes. The information plane is where rules live as text: prompts, CLAUDE.md files, system prompts, retrieval indices. Scaffolding.

The behavioral plane is where actions execute: gates, hooks, tests, compiled rules enforced before the agent runs. Enforcement.

Every AI system ships some ratio of the two. The product is the ratio. Calx is the company whose entire product optimizes for it.

OpenAI built the harness. Microsoft shipped the governance toolkit. LangChain shipped the middleware pattern. Stanford proved the delta-update lifecycle. UCL proved the convergence. Tsinghua proved the theoretical foundation. Meta proved the dual-channel requirement. Chris Argyris showed in 1977 that organizations that only patch symptoms never change the system. The same is true of AI agents. Every citation on this page is building a piece of this layer.

We are naming it behavioral infrastructure for the supervised-agent era. We are building the piece that turns your corrections into rules your agents cannot violate. We are not inventing this. We are finishing it.

Book a Correction Audit → · Apply for design partnership · The cohort is currently curated.

OWASP logo used under CC BY-SA 4.0. All other marks are the property of their respective owners and appear here for editorial citation of publicly available research and open-source work. Nothing on this page implies endorsement.