
Automate Documentation Testing with AI Agents: A Step-by-Step Guide

2026-05-02 00:28:43

Introduction

For early-stage open-source projects, the Getting Started guide is often a developer’s first real interaction. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report—they’ll just move on. Drasi, a CNCF sandbox project for real-time data change detection, faced this challenge: a small team of four engineers shipping code faster than they could manually test tutorials. A 2025 GitHub Dev Container update bumped minimum Docker versions, breaking every tutorial silently. This incident forced a realization: with advanced AI coding assistants, documentation testing can be converted from a manual chore into an automated monitoring problem. In this guide, you’ll learn how to build an AI agent that acts as a “synthetic new user” to test your documentation end-to-end, using GitHub Copilot CLI and Dev Containers.

Source: azure.microsoft.com

What You Need

  • A Getting Started guide with concrete commands and, ideally, documented expected outputs
  • Docker and a Dev Container configuration (.devcontainer/devcontainer.json)
  • GitHub Copilot CLI, or a comparable AI coding agent with terminal access
  • A CI system such as GitHub Actions for automated runs

Step-by-Step Instructions

  1. Step 1: Set Up a Reproducible Dev Container Environment

    Create a .devcontainer/devcontainer.json file that mirrors the exact environment your tutorial assumes. Include all dependencies: Docker, k3d, sample databases, and any CLI tools. Use a Dockerfile to pin specific versions so that upstream changes don’t break your tests unexpectedly. Commit this configuration to your repository so the agent can spin up the container consistently.
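    As a minimal sketch, a devcontainer.json for this setup might look like the following (the feature ID for Docker-in-Docker is a real Dev Container feature; the container name and postCreateCommand are illustrative assumptions about your stack):

```json
{
  "name": "drasi-tutorial-test",
  "build": { "dockerfile": "Dockerfile" },
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2": {}
  },
  "postCreateCommand": "docker version && k3d version"
}
```

    Pinning versions happens in the referenced Dockerfile (base image tags, CLI tool versions), so a moving upstream tag cannot silently change the environment your tutorial assumes.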

  2. Step 2: Install and Configure GitHub Copilot CLI

    Inside the Dev Container, install the GitHub Copilot CLI by following the official documentation: authenticate with gh auth login, then install the extension with gh extension install github/gh-copilot. The agent will use this CLI to interact with the terminal, executing commands exactly as written in the tutorial and verifying their outputs.

  3. Step 3: Build the AI Agent Script

    Write a script (e.g., in Python or Bash) that acts as a synthetic new user. This script should:

    • Read each step of your Getting Started guide sequentially.
    • For each command, invoke GitHub Copilot CLI with a naïve prompt: e.g., gh copilot run "execute: docker run -d --name nginx nginx:latest".
    • Capture the output and compare it against expected results stated in the tutorial.
    • If the output matches, proceed; if not, log a failure and continue (or stop).
    • Report a summary of all passed/failed steps at the end.

    Key: treat the agent as completely literal and unforgiving. It should not infer missing steps or correct typos.
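    The core loop above can be sketched in Python. This is a minimal skeleton, not the full agent: it runs each documented command directly via subprocess (in a real run you would route the command text through your Copilot CLI invocation instead), checks the exit code and any expected output string, and records pass/fail per step. The STEPS data here is illustrative.

```python
import subprocess

# Each tutorial step pairs a command (copied verbatim from the docs)
# with an optional expected substring of its stdout.
STEPS = [
    {"cmd": "echo hello", "expect": "hello"},
    {"cmd": "true", "expect": None},  # no documented output: exit 0 suffices
]

def run_steps(steps):
    """Execute each step literally and return (command, ok) pairs."""
    results = []
    for step in steps:
        proc = subprocess.run(step["cmd"], shell=True,
                              capture_output=True, text=True)
        ok = proc.returncode == 0
        if ok and step["expect"] is not None:
            ok = step["expect"] in proc.stdout
        results.append((step["cmd"], ok))
    return results

if __name__ == "__main__":
    for cmd, ok in run_steps(STEPS):
        print(f"{'PASS' if ok else 'FAIL'}: {cmd}")
```

    Note the script never repairs a failing command; per the "literal and unforgiving" rule above, a failure is recorded exactly as a new user would experience it.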

  4. Step 4: Enable Naïvety with Prompts

    Ensure the agent has no prior knowledge of your project. For each step, feed only the exact text from the tutorial. Do not include context like “wait for bootstrap”—the agent must not bring any implicit understanding. To enforce this, clear the conversation history between steps. Use a fresh Copilot session for each command to avoid the AI learning from previous successes.
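    One way to enforce this statelessness is to spawn a brand-new process per step, passing only the tutorial's verbatim text. A sketch, assuming your agent exposes a CLI entry point (agent_cmd here defaults to echo purely so the sketch is runnable; substitute your real Copilot CLI invocation):

```python
import subprocess

def run_step_fresh(tutorial_text, agent_cmd=("echo",)):
    """Run one tutorial step in a brand-new process so no conversation
    history leaks between steps. The prompt is ONLY the verbatim
    tutorial text -- no project context, no memory of earlier steps.
    `agent_cmd` is a placeholder for your agent's real CLI entry point."""
    return subprocess.run(list(agent_cmd) + [tutorial_text],
                          capture_output=True, text=True)
```

    Because each call starts a fresh process, the agent cannot "learn" workarounds from earlier steps, which is exactly the behavior a first-time reader has.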

  5. Step 5: Add Output Validation

    For each step that specifies an expected output (e.g., “You should see ‘Success’”), parse the terminal output for that exact string. If the output is missing or different, flag it as a bug. For steps without a defined expected output, define a baseline: the command should exit with code 0 and produce no errors. By default the agent should treat any stderr as a failure; if a tool in your stack legitimately writes progress messages to stderr, whitelist those patterns explicitly rather than loosening the rule globally.
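    The validation rules above can be captured in a small pure function (a sketch; the signature and the (ok, reason) return shape are choices made here, not part of any library):

```python
def validate(stdout, returncode, stderr, expected=None):
    """Apply the baseline rules for one tutorial step:
    exit code 0, no stderr output, and the expected string
    (when the tutorial documents one) present in stdout."""
    if returncode != 0:
        return False, f"non-zero exit code {returncode}"
    if stderr.strip():
        return False, "unexpected stderr output"
    if expected is not None and expected not in stdout:
        return False, f"expected output {expected!r} not found"
    return True, "ok"
```

    Returning a reason string alongside the boolean makes the final report actionable: the summary can say not just which step failed, but which rule it violated.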

  6. Step 6: Run the Agent Against the Tutorial

    Execute the script inside the Dev Container. Watch it walk through the tutorial step by step. The agent will produce a report of all commands that succeeded and those that failed. Note that failures can be due to documentation errors, environment mismatches, or actual bugs in the getting-started flow.

  7. Step 7: Analyze Failures and Identify Silent Drift

    Review the report. Common failure types include: outdated commands, missing prerequisite steps, deprecated flags, or changed default behaviors. For example, if a tutorial says “run drasi list query” but the current version requires drasi query list, the agent will catch it. This reveals the silent drift that human reviewers often miss.

  8. Step 8: Integrate into CI/CD Pipeline

    Automate the agent to run on every pull request that touches the documentation directory. Use GitHub Actions or similar to spin up the Dev Container, install the agent, and execute the test. Fail the pipeline if any tutorial step breaks. This turns documentation testing into a continuous monitoring problem, ensuring tutorials remain valid as dependencies evolve.
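    A sketch of such a workflow using the devcontainers/ci action, which builds the repository's Dev Container and runs a command inside it (the paths filter and the script location tools/doc_agent.py are assumptions; adjust to your repository layout):

```yaml
name: docs-tutorial-test
on:
  pull_request:
    paths:
      - "docs/**"
jobs:
  test-tutorial:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: devcontainers/ci@v0.3
        with:
          # Runs the agent script inside the committed Dev Container,
          # so CI tests in the exact environment the tutorial assumes.
          runCmd: python tools/doc_agent.py
```

    Consider also adding a scheduled trigger: pull-request runs catch documentation regressions, but only periodic runs catch drift caused by upstream dependency changes.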

Tips for Success

  • Pin every dependency version in the Dockerfile; unpinned upstream images and tools are the most common source of silent drift.
  • Keep the agent naïve: a fresh session per step, fed only the tutorial’s verbatim text.
  • Fix the documentation, not the agent. If the agent stumbles on a step, a real new user will too.
  • Run the agent on a schedule as well as on pull requests, so breakage caused by upstream changes is caught between releases.
