Building Docs for the AI Era, Part 1: Self-Healing Docs

📚 Series: Building Docs for the AI Era:

This post is the first of many in this series. You can find the outline for upcoming posts below:

Part 1: Self-Healing Docs (you are here)
Part 2: The Multi-Agent Documentation Pipeline
Part 3: Redesigning the Documentation Experience

1:30 AM

Every weeknight at 1:30 AM UTC, a GitHub Actions workflow wakes up and pulls the pull requests merged into Strapi's main codebase over the past 24 hours. Tonight, there are six.

In under two minutes, the pipeline processes all of them. First, a set of deterministic shell scripts eliminates three: a Dependabot bump, an internal refactor, a typo fix in a test file. Zero AI tokens consumed. The three survivors go to a lightweight AI model that reads only their titles and descriptions. One more is discarded: a CSS tweak in the admin panel, no user-facing impact. Two remain.

The pipeline looks closer. One is a micro-edit: a new filtering option was added to the REST API, and the existing docs page needs a link to the updated reference. The model handles it end-to-end and opens a draft PR. The other is more substantial: a new configuration parameter for the Upload plugin that needs its own section. A more capable model drafts it, following our style guide and templates, and opens a second draft PR.

Cost of tonight's run: $0.36.

At 9:00 AM, I open Slack and see a notification for 2 new PRs in an automated channel. The first PR: structure right, content accurate, but the phrasing needs tightening and a cross-reference is missing. Five minutes, fixed, merged. The second is the micro-edit. Link in the right place, correct anchor. Merged as-is.

This is what self-healing documentation looks like. AI preparing the work so the writer can focus on what actually matters.

But let me back up. I'm Pierre, piwi to most people, Strapi's Documentation Architect, and I'd like to tell you how this system came to be.

The drift

In every software project with separate code and documentation repositories, there is a quiet, persistent failure mode: docs drifting out of sync with the product. A developer merges a PR that renames an API parameter. The docs still say the old name. Nobody notices for weeks, until a user files an issue: "the docs say X but the API returns Y."

At Strapi, we ship weekly releases. Hundreds of community contributors send pull requests. The codebase moves fast. And since January 2026, I've been the sole Documentation Architect maintaining docs.strapi.io. The documentation foundations the self-healing system builds upon were created collectively over years by the documentation team and the broader Strapi community.

Before the self-healing pipeline, my routine was this: once a week, I would manually scan every PR merged into strapi/strapi over the previous seven days. Dozens of them. Reading titles, skimming descriptions, opening diffs when something looked ambiguous. Most needed no documentation change. The ones that did were buried in the noise, and finding them was detective work. The mental tax of holding fifty open PRs in my head, quietly worried one might slip through. None of it was the work I trained for.

This is not just a Strapi problem. Across the industry, documentation teams are shrinking. The Write the Docs community has been sounding the alarm: companies are cutting technical writing teams, betting AI can fill the gap. But as the team at Medusa.js put it when describing their own automation, the guardrails exist explicitly "to avoid AI slop." Without someone who understands the product, the audience, and the context, fully AI-generated documentation is unreliable at best.

The answer is neither "replace writers with AI" nor "ignore AI and keep doing everything by hand." It's designing a system where each does what it does best:

The machine detects, filters, and prepares. The human architects, judges, and prioritizes.

Every merged pull request already contains what you need to decide whether documentation should change. The title tells you the area, the description the intent, the diff what actually changed. Structured, machine-readable data sitting in your version control system.

So I built a pipeline that watches every change, decides what impacts the docs, and prepares the work. Nightly, not weekly: documentation should reflect yesterday's code, not last week's. And handling five to ten PRs a night is cheaper and more reliable than batching thirty at once, where context windows fill up and models lose focus. The final editorial pass stays human.

Progressive cost gating

The pipeline is a funnel. Each stage reduces the volume before the next one runs, so the more capable (and costly) model only activates when the work genuinely requires it. It runs as a GitHub Actions workflow every weeknight, in four stages.

Stage 0: Deterministic pre-filtering

Before any model touches anything, shell scripts with basic pattern matching strip out the obvious noise. A PR title starting with chore:, test:, refactor:, or ci:? Not user-facing. A PR from Dependabot? Dependency maintenance. Already processed by a previous run? Skip it (idempotency for the win). This stage eliminates roughly half of all merged PRs before a single AI token is consumed.

The design principle: Don't use AI for what a regex can do.

Stage 1: Lightweight triage

The survivors go to a small, fast, cheap model. It reads only the title and description of each PR (not the diff, which costs more) and makes one binary call: does this affect user-facing behavior?

The bias is deliberate. When in doubt, the model says yes. A false positive costs a few cents at the next stage. A false negative is a documentation gap nobody catches until a user reports it.

The trade-off: The next stage is always cheaper than missing a doc gap.

Stage 2: Intelligent routing

Now the same lightweight model goes deeper. It reads the actual diffs and cross-references them against the documentation structure (sidebar, page index, existing content) and routes each PR:

Skip: the change doesn't affect documentation after all.
Micro-edit: a small, well-defined change (add a link, update a mention, add a tip). The lightweight model handles these end-to-end, creating the branch and opening a draft PR itself.
Full draft: a structural change (a new section, a new page). Escalated to a more capable model.
Ask the human: ambiguous. Flag it for manual review.

This is where it gets interesting. Many real documentation updates are micro-edits, and the lightweight model handles them without ever involving the expensive one.

The insight: the cheapest model that can read code handles most of the actual work.

Stage 3: Full drafting

Only PRs needing structural changes reach the more capable model. When they do, it loads our full authoring toolkit: style guides, page templates, content standards, and our tailored AI writing harness. It drafts the content, self-reviews against our quality checks, and opens a draft PR. Every generated PR is a draft. The merge button stays with me.

The constraint: the more capable model only runs when the work justifies the cost.

The proof

The pipeline has been running in production for two months. Here are the real numbers after eight weeks (~40 runs).

The funnel:

The outcomes:

Of those 22 PRs, 9 were micro-edits from the lightweight model alone and 13 were full drafts from the more capable one. The architectural bet holds, though not where I first expected it. Over a larger sample, full drafts slightly outnumber micro-edits, so the cheap model doesn't author most of the output. But it does most of the work: of 107 PRs sent to triage, it filtered 77 on its own and escalated only 13. The expensive model runs on roughly one in eight PRs that reach AI. That's the gate that keeps costs flat.

The cost:

Total AI spend over two months: $11.38. About $1.60 per week, or $0.32 per run. Keeping documentation in sync with a fast-moving open-source codebase costs roughly a cup of coffee a week.

The honesty:

14 of the 22 PRs were merged (64% acceptance); the other 8 were closed after a closer look showed the change wasn't actually needed. So roughly one generated PR in three gets thrown away. Of the 14 that merged, half went in as-is and half needed edits first.

This is the point, not a failure. The system produces first drafts, not final copy, and the architect is the filter that catches the third that shouldn't ship. A draft that lands in the right place with the right structure saves more time than a blank page, even when it's wrong often enough that no one would merge it unread. Human review is part of the design, not a workaround for it.

The human at the center

Tom Johnson, a respected voice in technical writing, describes the emerging shape of the profession as "cyborg technical writers: augmented, not replaced, by AI."

The self-healing pipeline is one half of a two-part system.

The first half is human-driven. When a developer or PM knows a PR will have significant documentation impact, they add a flag: documentation label, which fires a Slack notification. I pick up the thread, understand the context, talk to the dev or PM if needed, and write the documentation with AI assistance (Inki, for instance, is a custom-made Claude Code plugin that lives in the docs repo). This is the path for new features, breaking changes, and anything that requires understanding the product intent behind the code. Collaborative, deliberate, AI-augmented.

The second half is automation-driven. The nightly pipeline catches everything that slips through: the small behavior changes, the new parameters nobody thought to flag, the edge cases that are easy to miss when you're shipping fast. It's the safety net, not the main act.

Together they give us something neither could alone: 100% coverage. Every merged PR is either handled through direct collaboration or caught by the pipeline. And it works because I designed it and audit every run. Two months in, zero false negatives: that's not only the AI being good, that's the architect doing the job.

The flip side is what AI cannot do, and what a human must:

Prioritization. Which features deserve a full page and which a single mention? That takes understanding the roadmap, user needs, editorial judgment.
Product memory. Why was this parameter named this way? What decision led to this API shape? Institutional knowledge can't be reconstructed from diffs.
Upstream signaling. Sometimes, reviewing a PR, I spot that a name or a default will confuse users before they ever reach the docs. That feedback loop only works with a human in it.
Pedagogical judgment. In what order should concepts be introduced? When do you show an example before explaining? AI can follow a style guide, but it cannot teach. Information Architecture still lies beyond Artificial Intelligence.

That's also where my role moves: from "detect what changed in the codebase" to "design how the product is understood and taught." The time the pipeline gives back goes to information architecture, learning experience, strategic work. For developers and product teams, the effect is friction down, coverage up: a renamed parameter no longer triggers a Slack thread asking "who handles the docs?" The pipeline catches it, and nobody has to wonder whether someone will.

Documentation doesn't have to be a bottleneck. It can be infrastructure.

Try it yourself

The entire pipeline is open source: the GitHub Actions workflow, the AI prompts, the filtering logic, and the decisions behind them. It's all in the strapi/documentation repository.

The pattern (progressive cost gating with model specialization) is not Strapi-specific. If your project has a code repo and a separate docs repo, you can adapt it. Start with deterministic filters for the obvious noise. Add a lightweight model for triage. Only escalate to an expensive model when the work requires it.

We're publishing this as an invitation. The more open-source projects adopt this pattern, the more the ecosystem benefits.

In Part 2, I'll describe the multi-agent pipeline behind the drafting stage: how specialized AI workflows (Router, Drafter, Outliner, various Checkers) collaborate to produce documentation that follows our standards, and what building that system taught me about documentation architecture.