
Drive Agents Programmatically

Everything the Caged CLI and dashboard do goes through the public API — so you can script the entire loop: create a sandbox with an agent pre-installed, send it prompts, inspect the output, and tear it down. This is the foundation for building agent pipelines, batch jobs, and custom tooling.

Prerequisites

  • A Caged API key (caged_sk_...) — create one in the dashboard or with caged keys create
  • An ANTHROPIC_API_KEY for Claude Code (the agent talks to Anthropic with your key — Caged never proxies or resells tokens)

1. Create a Sandbox with an Agent

Create a Python sandbox that clones your repo, installs Claude Code, and caps spend at $5:
curl -X POST https://api.caged.dev/v1/sandboxes \
  -H "Authorization: Bearer $CAGED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template": "python-312",
    "memory_mb": 1024,
    "repo": "https://github.com/your-org/your-project",
    "agents": ["claude"],
    "budget": 5.00,
    "env": {"ANTHROPIC_API_KEY": "sk-ant-..."}
  }'
The create call returns once the repo is cloned and the agent is installed (allow up to a few minutes).
Sandboxes that install agents are automatically provisioned with at least 1024 MB of memory — agent installers need the headroom.
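The same request can be assembled from Python without pasting secrets into a shell command. The sketch below uses only the standard library and mirrors the curl body above. The helper name build_create_request is hypothetical, and the request is only constructed here, not sent:

```python
import json
import urllib.request

API_BASE = "https://api.caged.dev/v1"  # base URL taken from the curl example above


def build_create_request(api_key: str, repo: str, anthropic_key: str,
                         budget: float = 5.00) -> urllib.request.Request:
    """Build (but do not send) the sandbox-creation request shown above.

    Hypothetical helper: the field names are copied from the curl body,
    nothing else about the API is assumed.
    """
    body = {
        "template": "python-312",
        "memory_mb": 1024,
        "repo": repo,
        "agents": ["claude"],
        "budget": budget,
        "env": {"ANTHROPIC_API_KEY": anthropic_key},
    }
    return urllib.request.Request(
        f"{API_BASE}/sandboxes",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen with a generous timeout (for example timeout=600), since creation can take a few minutes, and read both keys from os.environ rather than hard-coding them.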

2. Prompt the Agent

Use the exec endpoint to run claude -p (print mode) with any prompt. The command runs inside the VM with the repo at /workspace:
curl -X POST https://api.caged.dev/v1/sandboxes/$SANDBOX_ID/exec \
  -H "Authorization: Bearer $CAGED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"command": "cd /workspace && claude -p \"Summarize what this repo does in 3 bullets\""}'
Response
{
  "output": "- A Python library for parsing blockchain transactions...\n- ...\n- ...\n",
  "exit_code": 0
}
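Hand-escaping quotes inside the command string gets fragile once prompts contain quotes, semicolons, or dollar signs. A minimal sketch of building the command with the standard library's shlex.quote follows. It assumes the exec endpoint passes the command to a POSIX shell (which the cd ... && ... form above suggests), and claude_print_command is a hypothetical helper name:

```python
import json
import shlex


def claude_print_command(prompt: str, workdir: str = "/workspace") -> str:
    """Return a shell command that runs `claude -p` with a safely quoted prompt."""
    return f"cd {shlex.quote(workdir)} && claude -p {shlex.quote(prompt)}"


def exec_body(prompt: str) -> str:
    """Serialize the JSON body for the exec endpoint; json.dumps handles JSON escaping."""
    return json.dumps({"command": claude_print_command(prompt)})
```

The two layers of escaping (shell, then JSON) are handled separately, so the prompt text itself never needs manual backslashes.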

3. Let the Agent Make Changes

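Claude Code can edit files, run tests, and commit. Give it permission with --dangerously-skip-permissions, which is safe here because the blast radius is the disposable VM (exactly what Caged is for). The Python snippets from this step on assume the caged client and sandbox object created in the Full Script below: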
result = caged.sandboxes.exec(
    sandbox.id,
    'cd /workspace && claude -p --dangerously-skip-permissions '
    '"Fix the failing tests and run pytest to verify"',
    timeout=600,  # agent work can take a while
)
print(result.output)
Continue the same conversation across calls with -c:
followup = caged.sandboxes.exec(
    sandbox.id,
    'cd /workspace && claude -c -p "Now add a changelog entry for the fix"',
)
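A multi-turn exchange can be wrapped in a small loop: the first prompt starts the conversation and every later one adds -c. This is a sketch under assumptions: run_conversation is a hypothetical helper, and exec_fn is any callable you supply that runs one command in the sandbox:

```python
import shlex
from typing import Callable, Iterable, List


def run_conversation(exec_fn: Callable[[str], object],
                     prompts: Iterable[str]) -> List[object]:
    """Send prompts in order; every prompt after the first continues with -c."""
    results = []
    for index, prompt in enumerate(prompts):
        flags = "-p" if index == 0 else "-c -p"
        command = f"cd /workspace && claude {flags} {shlex.quote(prompt)}"
        results.append(exec_fn(command))
    return results
```

With the SDK calls shown above, exec_fn could be lambda cmd: caged.sandboxes.exec(sandbox.id, cmd, timeout=600).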

4. Inspect the Results

Read what the agent actually changed — without trusting its summary:
# Diff of all changes in the workspace
diff = caged.files.git_diff(sandbox.id)
print(diff)

# Read a specific file
content = caged.files.read(sandbox.id, "/workspace/CHANGELOG.md")

# Verify tests pass yourself
check = caged.sandboxes.exec(sandbox.id, "cd /workspace && python -m pytest -q")
print("tests pass" if check.ok else f"failed: {check.output}")
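One way to act on the diff rather than just print it is to list the touched files and flag anything outside the directories the agent was meant to change. The sketch assumes git_diff returns standard unified git diff text as a string. Both function names are hypothetical, and paths that git quotes (for example those with unusual characters) are not handled:

```python
import re
from typing import List, Tuple

# Matches header lines such as: diff --git a/src/app.py b/src/app.py
_DIFF_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def changed_files(diff_text: str) -> List[str]:
    """Return the post-change paths named in `diff --git` headers, in order."""
    paths: List[str] = []
    for line in diff_text.splitlines():
        match = _DIFF_HEADER.match(line)
        if match and match.group(2) not in paths:
            paths.append(match.group(2))
    return paths


def unexpected_changes(diff_text: str, allowed_prefixes: Tuple[str, ...]) -> List[str]:
    """List changed paths that fall outside the allowed directory prefixes."""
    return [p for p in changed_files(diff_text) if not p.startswith(allowed_prefixes)]
```

For example, unexpected_changes(diff, ("src/", "tests/")) returns any files the agent modified outside those two directories.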

5. Handle Failures Properly

Exec distinguishes "the command failed" (a non-zero exit_code) from "the platform failed" (error is set):
result = caged.sandboxes.exec(sandbox.id, "cd /workspace && python -m pytest -q")

if result.error:
    # Infrastructure problem: sandbox died, network issue, etc.
    raise RuntimeError(f"sandbox failure: {result.error}")
elif result.exit_code != 0:
    # Command ran and failed — stderr is in output
    print(f"tests failed (exit {result.exit_code}):\n{result.output}")
else:
    print(result.output)
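Because the two failure kinds are separate, a retry policy can target only the platform side. The following is a sketch, not part of the SDK: exec_with_retry is a hypothetical helper that assumes the result object exposes error and exit_code as above. Retrying helps with transient problems such as network issues, will not revive a sandbox that has died, and is only appropriate for commands that are safe to run twice:

```python
import time
from typing import Callable


def exec_with_retry(run: Callable[[], object], attempts: int = 3,
                    delay_s: float = 2.0):
    """Call run() until it returns a result without a platform error.

    Only infrastructure failures (result.error set) are retried. A non-zero
    exit code is returned as-is, since rerunning a failing command is
    unlikely to change its outcome.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run()
        if not result.error:
            return result
        if attempt < attempts:
            time.sleep(delay_s * attempt)  # linear backoff between attempts
    return result  # still a platform error after the final attempt
```

Usage with the snippet above: exec_with_retry(lambda: caged.sandboxes.exec(sandbox.id, "cd /workspace && python -m pytest -q")).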

6. Clean Up

# Snapshot first if you want to keep the agent's work
snapshot = caged.snapshots.create(sandbox.id, name="agent-run-1")

caged.sandboxes.destroy(sandbox.id)
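To make cleanup hard to forget, the create/snapshot/destroy sequence can be wrapped in a context manager. This is a sketch built only on the three SDK calls used in this guide (sandboxes.create, snapshots.create, sandboxes.destroy). The temporary_sandbox name and its snapshot_name parameter are hypothetical:

```python
from contextlib import contextmanager


@contextmanager
def temporary_sandbox(client, snapshot_name=None, **create_kwargs):
    """Create a sandbox, yield it, and always destroy it afterwards.

    If snapshot_name is given, a snapshot is taken before destruction,
    but only when the body finished without raising.
    """
    sandbox = client.sandboxes.create(**create_kwargs)
    try:
        yield sandbox
        if snapshot_name is not None:
            client.snapshots.create(sandbox.id, name=snapshot_name)
    finally:
        client.sandboxes.destroy(sandbox.id)
```

Used as: with temporary_sandbox(caged, snapshot_name="agent-run-1", template="python-312", agents=["claude"]) as sandbox: followed by the exec calls from the earlier steps.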

Full Script

A complete, runnable batch job — point an agent at a repo, collect its answer, destroy the sandbox:
import os
from caged import Caged

caged = Caged(api_key=os.environ["CAGED_API_KEY"])

sandbox = caged.sandboxes.create(
    template="python-312",
    memory_mb=1024,
    repo="https://github.com/your-org/your-project",
    agents=["claude"],
    budget=5.00,
    env={"ANTHROPIC_API_KEY": os.environ["ANTHROPIC_API_KEY"]},
)

try:
    result = caged.sandboxes.exec(
        sandbox.id,
        'cd /workspace && claude -p "Review this codebase for security issues. '
        'List the top 3 with file and line references."',
        timeout=600,
    )
    if result.ok:
        print(result.output)
    else:
        print(f"agent failed (exit {result.exit_code}): {result.output or result.error}")
finally:
    caged.sandboxes.destroy(sandbox.id)

Interactive Sessions

exec is one-shot and non-interactive. For a live TTY (e.g. the full Claude Code TUI), connect to the terminal WebSocket the dashboard IDE uses:
wss://api.caged.dev/v1/sandboxes/{id}/terminal?rows=40&cols=120&token=caged_sk_...
Send raw keystrokes, receive raw terminal output — any WebSocket client works. Sessions stay open while idle and are recorded for replay.
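The URL can be built programmatically so the terminal size and token are encoded correctly. A small sketch using only the standard library follows. terminal_url is a hypothetical helper, and the query parameters are exactly those in the URL above:

```python
from urllib.parse import quote, urlencode


def terminal_url(sandbox_id: str, token: str, rows: int = 40, cols: int = 120) -> str:
    """Build the terminal WebSocket URL in the format shown above."""
    query = urlencode({"rows": rows, "cols": cols, "token": token})
    return (
        "wss://api.caged.dev/v1/sandboxes/"
        f"{quote(sandbox_id, safe='')}/terminal?{query}"
    )
```

Since the API key travels in the query string, it is worth keeping the resulting URL out of logs and shell history.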
