Open Source

Using GitHub Copilot to Automate Documentation Testing: A Step-by-Step Guide

2026-05-01 15:49:32

Introduction

Documentation is the gateway to your project, especially for open-source tools. When a command fails, an output doesn't match, or a step is unclear, most users won't file a bug report—they'll just leave. This silent drift accumulates as your code evolves, and manual testing can't keep up. The Drasi team, a CNCF sandbox project, faced this exact problem: they shipped code faster than they could manually test tutorials. After a Docker update broke every tutorial, they realized they needed an automated approach. By treating documentation testing as a monitoring problem, they built an AI agent using GitHub Copilot CLI and Dev Containers to act as a synthetic new user. In this guide, you'll learn how to replicate this process for your own project, turning documentation maintenance into an automated, continuous process.

Source: azure.microsoft.com

What You Need

  1. A repository whose tutorials are written in markdown (for example, under docs/).
  2. Docker plus Dev Containers tooling (the VS Code extension or the devcontainer CLI).
  3. GitHub Copilot CLI.
  4. jq, which the test script uses to read JSON.
  5. Optionally, GitHub Actions for scheduled runs.

Step-by-Step Instructions

Step 1: Define the Agent's Behavior

The key is to create an agent that mimics a naïve, literal, and unforgiving new user. Write down three principles for your agent:

  1. Naïve: it assumes no knowledge beyond what the tutorial states; if a prerequisite isn't written down, it isn't installed.
  2. Literal: it runs every command exactly as written, in order, with no improvisation or silent fixes.
  3. Unforgiving: any mismatch between documented and actual output counts as a failure, even if the step "mostly worked".

Document these rules so your script can enforce them.
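One lightweight way to keep the rules enforceable is to store them in a small text file that both the team and the test harness can read. The filename and exact wording below are illustrative, not part of any official workflow:

```shell
# Sketch: persist the agent's ground rules in a file the script can read.
# (agent_rules.txt is an assumed name, not a Drasi or Copilot convention.)
cat > agent_rules.txt <<'EOF'
1. Naive: assume no knowledge beyond what the tutorial states.
2. Literal: run every command exactly as written, in the order given.
3. Unforgiving: any mismatch with the documented output is a failure.
EOF

# Trivial self-check: all three rules are present.
grep -c '^[0-9]\.' agent_rules.txt
```

The grep at the end simply counts the numbered rules, so a truncated file is caught immediately.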

Step 2: Set Up a Dev Container with All Dependencies

Create a .devcontainer/devcontainer.json file for your repository. This ensures the testing environment matches your users' setup exactly. Include:

  1. A base image that matches what your docs assume (for example, Ubuntu).
  2. Every CLI tool the tutorials invoke: Docker, language runtimes, jq, and so on.
  3. GitHub Copilot CLI.
  4. Any project-specific prerequisites your getting-started guide lists.

Test the container manually first to confirm it builds and launches correctly.
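A minimal sketch of such a file is below. The image and feature references are standard Dev Container building blocks, but the postCreateCommand assumes you install Copilot CLI via npm; adjust it to however you actually install the tool:

```json
{
  "name": "docs-testing",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:1": {},
    "ghcr.io/devcontainers/features/node:1": {}
  },
  "postCreateCommand": "npm install -g @github/copilot"
}
```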

Step 3: Extract Expected Commands and Outputs from Documentation

Parse your tutorial markdown files to extract each code block or instruction. For each step, note:

  1. The exact command the reader is told to run.
  2. The expected output, or a distinctive fragment of it.
  3. Any preconditions, such as files that must exist or services that must already be running.

You can do this manually for a few tutorials, or write a parser using regex. Save the mapping in a JSON file like tutorial_steps.json.
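If your tutorials keep commands in fenced shell blocks, a small awk + jq pipeline can generate the skeleton of tutorial_steps.json for you. This is a sketch, not the Drasi team's tooling; the file names are illustrative, and it runs here against an inline sample rather than a real docs tree:

```shell
# Sketch: extract shell code blocks from a tutorial into tutorial_steps.json.
# The awk filter keeps only lines between ```sh/```bash fences; jq wraps each
# line as a step object, leaving expected_output blank for you to fill in.
printf '%s\n' '# Demo tutorial' '```sh' 'echo hello' '```' > sample_tutorial.md

awk '/^```(sh|bash)/{f=1;next} /^```/{f=0} f' sample_tutorial.md \
  | jq -R '{command: ., expected_output: ""}' \
  | jq -s '.' > tutorial_steps.json

cat tutorial_steps.json
```

In practice you would point the awk filter at docs/*.md and then fill in each step's expected_output by hand or from a known-good run.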

Step 4: Build the Agent Using GitHub Copilot CLI

GitHub Copilot CLI can be used to generate scripts that simulate user actions. Create a main script (e.g., agent_test.sh) that:

  1. Loops through each step from your extracted JSON.
  2. Runs the command in the shell exactly as documented. (Copilot CLI is useful for generating and refining this script, but the replay itself should execute the tutorial's commands verbatim.)
  3. Captures the output (stdout and stderr).
  4. Compares the output against the expected result using an assertion function (e.g., assert_output_contains "Success").

Example snippet:

#!/usr/bin/env bash
# agent_test.sh -- replay every documented step and compare outputs
set -u

while read -r step; do
    command=$(echo "$step" | jq -r '.command')
    expected=$(echo "$step" | jq -r '.expected_output')
    # Run the step, capturing stdout and stderr together
    output=$(eval "$command" 2>&1)
    # -F: match the expected text literally, not as a regex
    if echo "$output" | grep -qF -- "$expected"; then
        echo "PASS: $command"
    else
        echo "FAIL: $command"
        echo "  expected: $expected"
        echo "  got:      $output"
        exit 1
    fi
done < <(jq -c '.[]' tutorial_steps.json)

Note: eval executes arbitrary strings from your JSON. Keep this confined to a trusted, containerized test environment; for anything beyond that, validate or allow-list the commands instead.


Step 5: Integrate with a Continuous Testing Pipeline

Place the agent script inside your Dev Container setup and run it automatically on a schedule or after code changes. Use GitHub Actions or a cron job to:

  1. Rebuild the Dev Container from a clean state, so nothing cached can mask breakage.
  2. Run agent_test.sh against each tutorial.
  3. Report failures by failing the build, opening an issue, or notifying the team.

Example GitHub Action workflow:

name: Test Documentation
on:
  schedule:
    - cron: '0 6 * * *'  # daily
  push:
    paths:
      - 'docs/**'
      - '.devcontainer/**'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and run agent
        uses: devcontainers/ci@v0.3
        with:
          runCmd: bash agent_test.sh
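Outside GitHub Actions, the same job can run from plain cron. The sketch below assumes the devcontainer CLI (`devcontainer up` / `devcontainer exec`) is installed on the host; the paths and filenames are illustrative:

```shell
# Sketch: a wrapper script a cron job can call to rebuild the container
# and run the agent. /srv/my-project is an assumed checkout location.
cat > run_agent_cron.sh <<'EOF'
#!/usr/bin/env sh
cd /srv/my-project || exit 1
devcontainer up --workspace-folder . &&
devcontainer exec --workspace-folder . bash agent_test.sh
EOF
chmod +x run_agent_cron.sh

# The matching crontab entry (printed here, not installed):
echo '0 6 * * * /srv/my-project/run_agent_cron.sh >> /var/log/docs-agent.log 2>&1'
```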

Step 6: Handle Silent Failures with Monitoring

Standard CI passes whenever the script exits 0, but silent drift (a command that "succeeds" while leaving the wrong state behind) needs extra care. Implement these strategies:

  1. Assert on state, not just exit codes: after each step, verify the side effect the docs promise, such as a running container, a created file, or a reachable endpoint.
  2. Wrap long-running steps in timeouts so a hang fails fast instead of blocking the pipeline.
  3. Treat scheduled runs as monitoring: alert on failures (an issue, a chat notification) rather than waiting for someone to check the build page.
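Asserting on state can be as simple as pairing each launch command with a liveness check. In this self-contained sketch, a background sleep stands in for a service the tutorial starts; all names are illustrative:

```shell
# Sketch: the launch command exits 0, but we also verify the side effect
# the docs promise -- here, that a process is actually running.
sleep 30 &                      # stand-in for "start the service"
service_pid=$!

if kill -0 "$service_pid" 2>/dev/null; then
    echo "STATE OK: process $service_pid is running"
    state_check=pass
else
    echo "STATE FAIL: launch exited 0 but nothing is running"
    state_check=fail
fi
kill "$service_pid" 2>/dev/null
```

For real services you would substitute a health probe (a port check, an HTTP request, a file existence test) for the kill -0 signal check.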

Step 7: Iterate and Improve the Agent

Run the agent on your existing tutorials. Fix any failures by updating your documentation or the agent's assumptions. Over time, you'll build a robust test suite. Consider adding:

  1. Coverage for every tutorial, not just the happy-path quickstart.
  2. Pattern-based assertions for outputs that legitimately vary (timestamps, generated IDs) instead of exact string matches.
  3. Periodic runs against fresh dependency versions, so upstream changes, like the Docker update that broke Drasi's tutorials, surface early.

Tips for Success

  1. Start with a single tutorial and get it passing end to end before scaling out.
  2. When the agent fails, fix the documentation first; change the agent only when its assumptions are genuinely wrong.
  3. Keep expected outputs short and distinctive; matching a full output verbatim makes tests brittle.

By following these steps, you transform documentation testing from a manual, reactive chore into an automated, proactive process. Your synthetic user will tirelessly verify every step, ensuring that your getting-started experience remains smooth even as your code evolves.
