Skip to main content

CI Pipeline Agent Testing

Use Caged sandboxes in CI to run AI agents that fix failing tests, generate code, or validate changes — all isolated and cost-controlled.

Prerequisites

  • A CAGED_API_KEY stored as a GitHub Actions secret
  • Your agent’s API key (e.g., ANTHROPIC_API_KEY) also in GitHub secrets

GitHub Actions Workflow

# .github/workflows/agent-fix.yml
name: AI Agent Fix

on:
  workflow_dispatch:
    inputs:
      task:
        description: "What should the agent do?"
        required: true
        default: "Fix all failing tests"

jobs:
  agent-sandbox:
    runs-on: ubuntu-latest
    steps:
      - name: Install Caged CLI
        run: curl -fsSL https://get.caged.dev | sh

      - name: Login to Caged
        run: |
          mkdir -p ~/.config/caged
          echo '{"api_url":"https://api.caged.dev","api_key":"${{ secrets.CAGED_API_KEY }}"}' > ~/.config/caged/config.json

      - name: Create sandbox and run agent
        id: sandbox
        run: |
          # Create sandbox with the repo
          SANDBOX_ID=$(caged run \
            --template node-20 \
            --cpus 2 \
            --memory 2048 \
            --repo ${{ github.server_url }}/${{ github.repository }} \
            --budget 5 \
            --env "ANTHROPIC_API_KEY=${{ secrets.ANTHROPIC_API_KEY }}" \
            --json | jq -r '.id')

          echo "sandbox_id=$SANDBOX_ID" >> "$GITHUB_OUTPUT"
          echo "Created sandbox: $SANDBOX_ID"

      - name: Run agent task
        run: |
          caged exec ${{ steps.sandbox.outputs.sandbox_id }} \
            "claude '${{ github.event.inputs.task }}'"

      - name: Extract results
        run: |
          # Get the git diff from the sandbox
          caged exec ${{ steps.sandbox.outputs.sandbox_id }} \
            "git diff" > agent-changes.patch

          # Check if there are changes
          if [ -s agent-changes.patch ]; then
            echo "Agent made changes — patch file saved"
          else
            echo "No changes made"
          fi

      - name: Cleanup sandbox
        if: always()
        run: caged destroy ${{ steps.sandbox.outputs.sandbox_id }} --force

Auto-Fix on Test Failure

Run an agent automatically when tests fail:
# .github/workflows/auto-fix.yml
name: Auto-Fix Tests

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
        id: tests
        continue-on-error: true

      - name: Agent Fix (on failure)
        if: steps.tests.outcome == 'failure'
        run: |
          curl -fsSL https://get.caged.dev | sh
          mkdir -p ~/.config/caged
          echo '{"api_url":"https://api.caged.dev","api_key":"${{ secrets.CAGED_API_KEY }}"}' > ~/.config/caged/config.json

          SANDBOX_ID=$(caged run \
            --template node-20 \
            --cpus 2 \
            --memory 2048 \
            --repo ${{ github.server_url }}/${{ github.repository }} \
            --budget 3 \
            --env "ANTHROPIC_API_KEY=${{ secrets.ANTHROPIC_API_KEY }}" \
            --json | jq -r '.id')

          caged exec $SANDBOX_ID "npm install && claude 'Run the tests, read the failures, and fix them. Commit your fixes.'"

          # Create PR with fixes
          caged exec $SANDBOX_ID "git push origin HEAD:fix/auto-agent-$(date +%s)"
          caged destroy $SANDBOX_ID --force

PR Review Agent

Run an agent to review every pull request:
# .github/workflows/pr-review.yml
name: AI PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Install Caged CLI
        run: curl -fsSL https://get.caged.dev | sh

      - name: Setup credentials
        run: |
          mkdir -p ~/.config/caged
          echo '{"api_url":"https://api.caged.dev","api_key":"${{ secrets.CAGED_API_KEY }}"}' > ~/.config/caged/config.json

      - name: Run review agent
        run: |
          SANDBOX_ID=$(caged run \
            --template node-20 \
            --cpus 2 \
            --memory 1024 \
            --repo ${{ github.server_url }}/${{ github.repository }} \
            --budget 2 \
            --env "ANTHROPIC_API_KEY=${{ secrets.ANTHROPIC_API_KEY }}" \
            --json | jq -r '.id')

          # Checkout the PR branch
          caged exec $SANDBOX_ID \
            "git fetch origin pull/${{ github.event.number }}/head:pr && git checkout pr"

          # Run review
          REVIEW=$(caged exec $SANDBOX_ID \
            "claude 'Review this branch vs main. List issues, suggestions, and a summary. Output as markdown.'")

          # Post as PR comment
          gh pr comment ${{ github.event.number }} --body "$REVIEW"
          caged destroy $SANDBOX_ID --force
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Tips

Set tight budgets in CI: Agent tasks in CI should be predictable. Use --budget 2-5 to catch runaway loops early.
Always cleanup: Use if: always() on the destroy step so sandboxes don’t leak on workflow failure.
Use --json flag: Parse sandbox IDs programmatically with jq for reliable scripting.