Research

The Compiler Gap.

Text rules don't compile into agent behavior. The evidence is in the wild. We named the pattern and built the fix.

Information Plane

What your agent knows. System prompts, CLAUDE.md, .cursorrules, memory stores.

The Compiler Gap

↑ ↓

Compilation: hooks, tests, gates, structural enforcement

↑ ↓

Behavioral Plane

What your agent actually does. Compliance, error avoidance, learned corrections.

You've seen this

× .cursorrules says "never use var." Agent uses var. Corrected. Tomorrow, var again.
× CLAUDE.md says "always use the v2 API." Agent calls v1. Added in bold. v1 again.
× System prompt says "formal tone only." Agent opens with "Hey!" Rewritten. "Hey!" again.

These aren't prompting failures. They're compilation failures.

OpenAI has begun calling this pattern harness engineering. Calx builds the behavioral governance layer inside that pattern, cross-runtime, for everyone else.
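As a minimal sketch of what compiling a rule could look like, the "never use var" example above can be turned from a sentence the agent reads into a check the change has to pass. This is an illustration only, not Calx's implementation: the names `find_var_violations` and `gate` are hypothetical, and the pattern matching is deliberately naive (it would misfire on `var` inside a string literal).

```python
import re
from typing import List

# Hypothetical compiled form of the text rule "never use var".
# Matches `var` as a whole word followed by an identifier start.
VAR_DECLARATION = re.compile(r"\bvar\s+[A-Za-z_$]")


def find_var_violations(source: str) -> List[int]:
    """Return the 1-based line numbers that contain a `var` declaration."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Naive assumption: everything after `//` is a comment and is ignored.
        code = line.split("//", 1)[0]
        if VAR_DECLARATION.search(code):
            violations.append(number)
    return violations


def gate(source: str) -> bool:
    """Return True if the change may proceed, False if it must be rejected."""
    return not find_var_violations(source)


if __name__ == "__main__":
    sample = "const a = 1;\nvar b = 2;\nlet variable = 3; // var c\n"
    print(find_var_violations(sample))
    print(gate(sample))
```

Wired into a pre-commit hook or CI step, a check like this does not depend on the agent remembering the rule: a change containing `var` simply does not get through.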

Independent evidence

Not our test. A practitioner on the Cursor forums ran a controlled test across three context lengths and nine formatting rules, with three runs each. Same rules. Same agent. The only variable was the delivery mechanism.

0/9: Text rules (.cursorrules in agent mode)

9/9: Compiled rules (structural enforcement)

We did not run this test. We found it, and we cite it. It is exactly the pattern our own longitudinal data keeps producing, in a tighter setup than we could have built from the outside.

Our field study

Separate from the controlled test above. A 43-day longitudinal observation of real human-AI collaboration across a working codebase. First-party data. Ours.

151 corrections tracked

43 days observed

AI agents

82K lines of code

Why it happens

Information vs. behavior. Two different systems.

Positional decay

Attention fades with distance

Recency-priority encoding (RPE) means instructions at the top of context lose weight as the conversation grows. Rules written first are forgotten first.

Density collapse

More rules, worse behavior

At 500 instructions, accuracy drops to 68%. The more rules you add, the less likely any single rule is followed. Instruction density works against you.
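To make that figure concrete, here is a small back-of-the-envelope calculation. It assumes "accuracy" means the fraction of instructions followed, which the text above does not state explicitly; the function name `expected_violations` is hypothetical.

```python
def expected_violations(rule_count: int, accuracy: float) -> float:
    """Expected number of rules broken, assuming accuracy is the per-rule follow rate."""
    if rule_count < 0:
        raise ValueError("rule_count must be non-negative")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return rule_count * (1.0 - accuracy)


if __name__ == "__main__":
    # 500 instructions at 68% accuracy: roughly 160 rules not followed.
    print(round(expected_violations(500, 0.68)))
```

Under that assumption, a 500-rule prompt leaves on the order of 160 rules unfollowed in a given response, and nothing in the prompt tells you which ones.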

Threshold collapse

Rules don't fade. They fall off a cliff.

Compliance does not degrade linearly. It holds until roughly 40-50% of the context window is filled, then collapses. A cliff, not a slope.
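One way to act on a cliff like this is to track how full the context window is and treat text-delivered rules as unreliable past the threshold. The sketch below is an assumption-laden illustration: `context_fill` and `past_cliff` are hypothetical names, and the default of 0.40 is simply the lower end of the 40-50% range quoted above.

```python
def context_fill(tokens_used: int, window_size: int) -> float:
    """Fraction of the context window currently occupied."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    return tokens_used / window_size


def past_cliff(tokens_used: int, window_size: int, threshold: float = 0.40) -> bool:
    """True once the fill ratio reaches the assumed collapse threshold."""
    return context_fill(tokens_used, window_size) >= threshold


if __name__ == "__main__":
    print(past_cliff(50_000, 200_000))   # 25% full
    print(past_cliff(100_000, 200_000))  # 50% full
```

A harness could use a signal like this to start a fresh session or to lean only on structural checks once the threshold is crossed, rather than trusting rules stated near the top of a long conversation.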

Transfer failure

Corrections don't cross agent boundaries

237 rules transferred from one agent to another. The receiving agent made 44 new mistakes, 13 in categories the rules explicitly addressed.

The proof

Three papers. One finding.

Paper I

The Behavioral Plane: Why Learned Corrections Don't Transfer Between Agents

237 rules transferred from one agent to another. The receiving agent made 44 new mistakes, 13 in categories the rules explicitly addressed.

44 novel failures from 237 transferred rules


Paper II

Stickiness Without Resistance: Knowledge Transfer Failure in Human-AI Collaboration Without Human Friction

Without human friction in the correction loop, agents accept instructions but fail to modify behavior. Compliance is performed, not enacted.

0% behavioral change with zero-friction transfer


Paper III

The Compiler Gap: Why Your AI Agent Ignores Everything You Taught It

Names the pattern behind the practitioner test. Text instructions do not compile into agent behavior; structural enforcement does. The variable is the delivery mechanism, not the rule content.

Synthesis of independent evidence and first-party field data


Corroboration

Search "rules ignored" in the Cursor forums. Read the Claude Code GitHub issues about CLAUDE.md drift. The Compiler Gap is already documented by practitioners. We named it.
