Open Source

Using GitHub Copilot to Automate Documentation Testing: A Step-by-Step Guide

2026-05-01 15:49:32

Introduction

Documentation is the gateway to your project, especially for open-source tools. When a command fails, an output doesn't match, or a step is unclear, most users won't file a bug report—they'll just leave. This silent drift accumulates as your code evolves, and manual testing can't keep up. The Drasi team, a CNCF sandbox project, faced this exact problem: they shipped code faster than they could manually test tutorials. After a Docker update broke every tutorial, they realized they needed an automated approach. By treating documentation testing as a monitoring problem, they built an AI agent using GitHub Copilot CLI and Dev Containers to act as a synthetic new user. In this guide, you'll learn how to replicate this process for your own project, turning documentation maintenance into an automated, continuous process.

Source: azure.microsoft.com

What You Need

  1. A repository whose tutorials are written in markdown (for example, under docs/).
  2. Docker plus Dev Containers tooling (the VS Code extension or the devcontainer CLI).
  3. GitHub Copilot CLI.
  4. jq, which the test script uses to read JSON.
  5. Optionally, GitHub Actions for scheduled runs.

Step-by-Step Instructions

Step 1: Define the Agent's Behavior

The key is to create an agent that mimics a naïve, literal, and unforgiving new user. Write down three principles for your agent:

  1. Naïve: it assumes no knowledge beyond what the tutorial states; if a prerequisite isn't written down, it isn't installed.
  2. Literal: it runs every command exactly as written, in order, with no improvisation or silent fixes.
  3. Unforgiving: any mismatch between documented and actual output counts as a failure, even if the step "mostly worked".

Document these rules so your script can enforce them.
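One lightweight way to keep the rules enforceable is to store them in a small text file that both the team and the test harness can read. The filename and exact wording below are illustrative, not part of any official workflow:

```shell
# Sketch: persist the agent's ground rules in a file the script can read.
# (agent_rules.txt is an assumed name, not a Drasi or Copilot convention.)
cat > agent_rules.txt <<'EOF'
1. Naive: assume no knowledge beyond what the tutorial states.
2. Literal: run every command exactly as written, in the order given.
3. Unforgiving: any mismatch with the documented output is a failure.
EOF

# Trivial self-check: all three rules are present.
grep -c '^[0-9]\.' agent_rules.txt
```

The grep at the end simply counts the numbered rules, so a truncated file is caught immediately.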

Step 2: Set Up a Dev Container with All Dependencies

Create a .devcontainer/devcontainer.json file for your repository. This ensures the testing environment matches your users' setup exactly. Include:

  1. A base image that matches what your docs assume (for example, Ubuntu).
  2. Every CLI tool the tutorials invoke: Docker, language runtimes, jq, and so on.
  3. GitHub Copilot CLI.
  4. Any project-specific prerequisites your getting-started guide lists.

Test the container manually first to confirm it builds and launches correctly.
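A minimal sketch of such a file is below. The image and feature references are standard Dev Container building blocks, but the postCreateCommand assumes you install Copilot CLI via npm; adjust it to however you actually install the tool:

```json
{
  "name": "docs-testing",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:1": {},
    "ghcr.io/devcontainers/features/node:1": {}
  },
  "postCreateCommand": "npm install -g @github/copilot"
}
```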

Step 3: Extract Expected Commands and Outputs from Documentation

Parse your tutorial markdown files to extract each code block or instruction. For each step, note:

  1. The exact command the reader is told to run.
  2. The expected output, or a distinctive fragment of it.
  3. Any preconditions, such as files that must exist or services that must already be running.

You can do this manually for a few tutorials, or write a parser using regex. Save the mapping in a JSON file like tutorial_steps.json.
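If your tutorials keep commands in fenced shell blocks, a small awk + jq pipeline can generate the skeleton of tutorial_steps.json for you. This is a sketch, not the Drasi team's tooling; the file names are illustrative, and it runs here against an inline sample rather than a real docs tree:

```shell
# Sketch: extract shell code blocks from a tutorial into tutorial_steps.json.
# The awk filter keeps only lines between ```sh/```bash fences; jq wraps each
# line as a step object, leaving expected_output blank for you to fill in.
printf '%s\n' '# Demo tutorial' '```sh' 'echo hello' '```' > sample_tutorial.md

awk '/^```(sh|bash)/{f=1;next} /^```/{f=0} f' sample_tutorial.md \
  | jq -R '{command: ., expected_output: ""}' \
  | jq -s '.' > tutorial_steps.json

cat tutorial_steps.json
```

In practice you would point the awk filter at docs/*.md and then fill in each step's expected_output by hand or from a known-good run.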

Step 4: Build the Agent Using GitHub Copilot CLI

GitHub Copilot CLI can be used to generate scripts that simulate user actions. Create a main script (e.g., agent_test.sh) that:

  1. Loops through each step from your extracted JSON.
  2. Runs the command in the shell exactly as documented. (Copilot CLI is useful for generating and refining this script, but the replay itself should execute the tutorial's commands verbatim.)
  3. Captures the output (stdout and stderr).
  4. Compares the output against the expected result using an assertion function (e.g., assert_output_contains "Success").

Example snippet:

#!/usr/bin/env bash
# agent_test.sh -- replay every documented step and compare outputs
set -u

while read -r step; do
    command=$(echo "$step" | jq -r '.command')
    expected=$(echo "$step" | jq -r '.expected_output')
    # Run the step, capturing stdout and stderr together
    output=$(eval "$command" 2>&1)
    # -F: match the expected text literally, not as a regex
    if echo "$output" | grep -qF -- "$expected"; then
        echo "PASS: $command"
    else
        echo "FAIL: $command"
        echo "  expected: $expected"
        echo "  got:      $output"
        exit 1
    fi
done < <(jq -c '.[]' tutorial_steps.json)

Note: eval executes arbitrary strings from your JSON. Keep this confined to a trusted, containerized test environment; for anything beyond that, validate or allow-list the commands instead.


Step 5: Integrate with a Continuous Testing Pipeline

Place the agent script inside your Dev Container setup and run it automatically on a schedule or after code changes. Use GitHub Actions or a cron job to:

  1. Rebuild the Dev Container from a clean state, so nothing cached can mask breakage.
  2. Run agent_test.sh against each tutorial.
  3. Report failures by failing the build, opening an issue, or notifying the team.

Example GitHub Action workflow:

name: Test Documentation
on:
  schedule:
    - cron: '0 6 * * *'  # daily
  push:
    paths:
      - 'docs/**'
      - '.devcontainer/**'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and run agent
        uses: devcontainers/ci@v0.3
        with:
          runCmd: bash agent_test.sh
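Outside GitHub Actions, the same job can run from plain cron. The sketch below assumes the devcontainer CLI (`devcontainer up` / `devcontainer exec`) is installed on the host; the paths and filenames are illustrative:

```shell
# Sketch: a wrapper script a cron job can call to rebuild the container
# and run the agent. /srv/my-project is an assumed checkout location.
cat > run_agent_cron.sh <<'EOF'
#!/usr/bin/env sh
cd /srv/my-project || exit 1
devcontainer up --workspace-folder . &&
devcontainer exec --workspace-folder . bash agent_test.sh
EOF
chmod +x run_agent_cron.sh

# The matching crontab entry (printed here, not installed):
echo '0 6 * * * /srv/my-project/run_agent_cron.sh >> /var/log/docs-agent.log 2>&1'
```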

Step 6: Handle Silent Failures with Monitoring

Standard CI passes whenever the script exits 0, but silent drift (a command that "succeeds" while leaving the wrong state behind) needs extra care. Implement these strategies:

  1. Assert on state, not just exit codes: after each step, verify the side effect the docs promise, such as a running container, a created file, or a reachable endpoint.
  2. Wrap long-running steps in timeouts so a hang fails fast instead of blocking the pipeline.
  3. Treat scheduled runs as monitoring: alert on failures (an issue, a chat notification) rather than waiting for someone to check the build page.
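Asserting on state can be as simple as pairing each launch command with a liveness check. In this self-contained sketch, a background sleep stands in for a service the tutorial starts; all names are illustrative:

```shell
# Sketch: the launch command exits 0, but we also verify the side effect
# the docs promise -- here, that a process is actually running.
sleep 30 &                      # stand-in for "start the service"
service_pid=$!

if kill -0 "$service_pid" 2>/dev/null; then
    echo "STATE OK: process $service_pid is running"
    state_check=pass
else
    echo "STATE FAIL: launch exited 0 but nothing is running"
    state_check=fail
fi
kill "$service_pid" 2>/dev/null
```

For real services you would substitute a health probe (a port check, an HTTP request, a file existence test) for the kill -0 signal check.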

Step 7: Iterate and Improve the Agent

Run the agent on your existing tutorials. Fix any failures by updating your documentation or the agent's assumptions. Over time, you'll build a robust test suite. Consider adding:

  1. Coverage for every tutorial, not just the happy-path quickstart.
  2. Pattern-based assertions for outputs that legitimately vary (timestamps, generated IDs) instead of exact string matches.
  3. Periodic runs against fresh dependency versions, so upstream changes, like the Docker update that broke Drasi's tutorials, surface early.

Tips for Success

  1. Start with a single tutorial and get it passing end to end before scaling out.
  2. When the agent fails, fix the documentation first; change the agent only when its assumptions are genuinely wrong.
  3. Keep expected outputs short and distinctive; matching a full output verbatim makes tests brittle.

By following these steps, you transform documentation testing from a manual, reactive chore into an automated, proactive process. Your synthetic user will tirelessly verify every step, ensuring that your getting-started experience remains smooth even as your code evolves.
