When Your Contracts Start Fixing Themselves

Huy Dang

A task specification ran. Its job was straightforward: take a batch of blog posts that had been cleared for publication and sync them to the website. Routine operations work.

It found a problem nobody was looking for.

Two of those posts were marked as cleared — meaning they’d passed every content check — but were still flagged as drafts. A third group of posts had the opposite inconsistency: the source files said “draft” but the published copies said “live.” The metadata contradicted itself.

The specification wasn’t asked to audit the clearance process. Its scope was “move files from A to B.” But the specification format required a findings section — a structured place to capture anything unexpected, regardless of whether it was in scope. So the agent noted what it found: the clearance process had two deterministic gaps. It never explicitly required draft status to change after clearance. And it never marked source files as processed, so they looked abandoned even after their cleared copies were live.

That finding became a new specification the same day. The follow-up didn’t just acknowledge the gaps — it codified the fixes into the clearance process itself. Added an explicit step to update draft status. Added another step to mark sources with processing metadata. Version-bumped the clearance definition. Backfilled every affected file.
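The two fixes are small enough to sketch. This is a hypothetical illustration, not the actual clearance code; the field names (`cleared`, `draft`, `processed`) are invented for the example, and the real process is a specification, not Python.

```python
# Hypothetical sketch of the repaired clearance step. Field names are
# invented for illustration.

def clear_post(meta: dict) -> dict:
    """Clear a post while enforcing both invariants the original process missed."""
    fixed = dict(meta)
    fixed["cleared"] = True
    fixed["draft"] = False      # gap 1: clearance now explicitly flips draft status
    fixed["processed"] = True   # gap 2: the source file is marked as handled
    return fixed

def find_contradictions(posts: list[dict]) -> list[dict]:
    """The inconsistency the sync job stumbled on: cleared, yet still a draft."""
    return [p for p in posts if p.get("cleared") and p.get("draft")]
```

Running `find_contradictions` over the backfilled files is what makes the fix verifiable: after the follow-up specification ran, the list comes back empty.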

The system had healed itself. Not because someone designed a self-healing mechanism, but because the structure of the work made gaps visible and actionable.

The Same Pattern, A Different Domain

This wasn’t the first time it happened. A few weeks earlier, a different specification was setting up a bridge between task tracking and project management. Its job: make work specifications automatically create and close tracking items.

During that integration work, the agent needed to reference metadata for several project boards. It found that most had incomplete metadata. They were effectively invisible — they existed, but no automated process could interact with them because their configuration had never been discovered.

Again: nobody asked for an audit. The specification was about building a bridge, not inventorying what was on both sides. But the findings section captured it, and the gap became a follow-up specification. That follow-up ran discovery across every project, filled in the missing configuration, and created a registry document so the gap couldn’t silently reopen.

Two cases. Same pattern. Same structure. Neither was designed to be self-healing. Both healed something anyway.

And Then It Happened a Third Time

While writing this post, the specification that produced it needed to work in a repository that wasn’t cloned locally. The agent cloned it — but the specification’s path convention was ambiguous. It used relative directory names without defining what they were relative to. The agent resolved the path one way; the filesystem resolved it another. Files ended up in a rogue directory outside the git repository. The build looked clean. The files existed. But nothing was actually committed.

The findings section caught it. The specification format required the agent to report what went wrong, and “ambiguous base path in specification format” became a finding. That finding became an immediate fix to the specification authoring process: a one-line clarification that defines the base path convention, so no future specification can create the same ambiguity.

Three cases now. And the third one happened during the execution of the specification that was documenting the first two. The system healed itself while explaining how it heals itself. Nobody planned that. The structure made it inevitable.

The Erlang Insight

If you’ve worked with Erlang — or any system built on its supervision philosophy — this pattern will feel familiar.

In Erlang, processes crash. That’s not a bug; it’s the design. The interesting part isn’t that processes crash — it’s what happens next. A supervisor process detects the crash, examines the failure, and restarts the process with a clean state. The crashed process doesn’t fix itself. The supervision structure around it does.
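The shape of that loop can be sketched in Python. Erlang’s actual supervisors are concurrent processes with configurable restart strategies; this flattens the idea into a single loop for illustration:

```python
# Erlang-style supervision, flattened into a loop: the worker never handles
# its own failure. The structure around it catches the crash and restarts
# the worker with fresh state.

def supervise(make_worker, max_restarts: int = 3):
    restarts = 0
    while True:
        worker = make_worker()       # clean state on every (re)start
        try:
            return worker()          # normal completion ends supervision
        except Exception as crash:
            restarts += 1
            if restarts > max_restarts:
                # A real supervision tree would escalate to its own supervisor.
                raise RuntimeError("restart limit exceeded") from crash
```

The point the sketch preserves: the repair logic lives entirely outside the thing that fails.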

The critical insight from Erlang isn’t “things fail.” Everyone knows that. It’s that you can design the structure around failure so that failures become inputs to a repair process, rather than problems that accumulate silently.

That’s exactly what’s happening here, translated from software processes to work specifications.

A specification executes. It encounters something unexpected — not a crash, but a gap. The structured findings section captures the gap (this is the equivalent of Erlang’s crash report). The owner reviews the findings (this is the supervisor). If the gap is real, a new specification is authored to fix it (this is the restart with corrected state).

The specification didn’t fix itself. The structure around it — mandatory findings, owner review, rapid follow-up authoring — created a supervision loop that turned execution gaps into repair actions.
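The whole loop fits in a few lines under the analogy. Everything here (`Spec`, `findings`, `accepts`) is an invented name for illustration, not the actual specification format:

```python
# A sketch of the supervision loop: execute captures surprises, the owner
# filters them, and each accepted finding becomes a new scoped specification.

from dataclasses import dataclass, field

@dataclass
class Spec:
    scope: str
    findings: list = field(default_factory=list)  # mandatory, even when empty

def execute(spec: Spec, work) -> Spec:
    work(spec)  # stays in scope, but may append out-of-scope surprises to findings
    return spec

def owner_review(spec: Spec, accepts) -> list:
    # The human supervisor filters findings; each accepted one becomes a new,
    # narrowly scoped specification: the "restart with corrected state".
    return [Spec(scope=f"fix: {f}") for f in spec.findings if accepts(f)]
```

Note that each follow-up `Spec` carries its own empty `findings` list, which is what lets the cycle continue.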

What Makes Self-Healing Possible

All three cases share four structural prerequisites. Remove any one and the pattern breaks.

1. Mandatory findings sections. Every specification includes a structured place for surprises. Not “notes” at the bottom. Not “comments.” A first-class section called “findings” that exists whether or not there’s anything to put in it. The empty findings section is a prompt: “What did you notice that wasn’t in your scope?”

This is the difference between a process that discovers things and a process that reports things. Without the structured place, discoveries stay implicit — an agent notices something, maybe mentions it in passing, and the observation evaporates.

2. Scope discipline. Both agents stayed in their lane. The publishing agent didn’t try to fix the clearance process — it just reported the gaps. The integration agent didn’t try to discover project metadata — it just noted which ones were missing.

This matters because scope creep kills the signal. If an agent tries to fix what it finds, its output becomes a messy combination of its primary deliverable and ad hoc repairs. The findings section stays clean only when the agent does one thing: report what it noticed, then move on.

3. Owner review loop. A human reads every findings section. This is the supervisor in the Erlang analogy. The human determines: is this finding real? Is it worth acting on? What’s the right response?

Not every finding becomes a follow-up. Some are noise. Some are known tradeoffs. Some are observations that matter later but not now. The owner filters, prioritizes, and decides. Without this step, every anomaly would generate a follow-up specification, and the system would drown in self-generated work.

4. Rapid follow-up authoring. When a finding is real, the response is immediate: a new specification, authored the same day, with the gap as its explicit input. The finding doesn’t go into a backlog. It doesn’t become a ticket that ages. It becomes a specification with its own scope, its own execution, its own findings section — which means it can discover further gaps, continuing the cycle.

Speed matters here because gaps are freshest when they’re first discovered. The context is warm. The evidence is specific. Wait a week, and the finding becomes a vague note that says “clearance process has issues.” Act the same day, and the specification says “Step 5 of the clearance process never sets draft status to false after clearance passes, creating a metadata contradiction in every processed file.”

How This Differs From Cross-Checking

I’ve written before about what happens when parallel agents inadvertently cross-reference each other’s work. This pattern looks similar but operates differently.

Parallel cross-checking is spatial — two agents working simultaneously on overlapping domains, where inconsistencies surface through structural overlap. Neither agent intends to check the other. The quality signal emerges from the geometry of their scopes.

Self-healing is temporal — one specification’s output becomes another specification’s input, sequentially. The first specification finishes, reports its findings, and the findings inform what the next specification does. It’s not two perspectives on the same thing at the same time. It’s one perspective refining the system across time.

The analogy to programming holds. Cross-checking is like running two independent test suites that happen to cover overlapping behavior — failures in one illuminate bugs relevant to the other. Self-healing is like Erlang’s supervision tree — a crash in one process triggers a structured response that restores the system to a better state than before the crash.

Both patterns are valuable. But self-healing is the one that compounds. Each repair makes the system slightly more robust for every specification that runs afterward. The clearance process now explicitly handles draft status because a publishing specification noticed it didn’t. Every future clearance will benefit from that fix. The gap is closed permanently.

The Design Principle

Here’s what I’ve taken from watching this pattern emerge three times — the last time while writing this very post:

Don’t design self-healing systems. Design systems where healing can emerge.

The difference is crucial. If you try to design self-healing upfront, you’ll over-engineer it — building complex feedback loops, automated remediation, circuit breakers for problems you haven’t encountered yet. You’ll solve imaginary problems and miss real ones.

Instead, put three things in place:

  1. Give every process a structured way to report surprises. Not bugs. Not errors. Surprises — things that are outside scope but worth noting. Make this a first-class part of the output format, not an afterthought.

  2. Keep a human in the supervision loop. Not to micromanage execution, but to evaluate findings. Is this real? Is it worth acting on? What’s the right scope for a fix? The human is the supervisor that decides which crashes warrant a restart.

  3. Make follow-up fast. When a finding is real, the response should be immediate and specific. A new specification, not a backlog item. A scoped fix, not a vague “we should look into this.” Speed preserves context, and context is what makes fixes precise.

Do those three things, and self-healing will emerge from your own execution trail. Not because you designed it. Because you created the structural conditions where gaps become visible, visibility becomes action, and action becomes permanent improvement.

The system fixes itself — through the structure you gave it for reporting what it found.


This is the fifth post in a series about applying programming language concepts to organizational design. Previously: When Your Diagnostic Tool Finds What You Didn’t Know, The Bootstrap Paradox, The Beautiful Absurdity of Modeling Your Company in Java, Why Every Founder Should Think in Types.

Huy Dang is the founder of AccelMars, building tools for the AI era. Follow the journey on X and LinkedIn.

Part 4 of 4 in the series: AI at Work