Trust Scoring

Every agent session receives a trust score (0–100) that reflects how safe the agent’s actions were. Trust scores help you identify risky behavior before it causes damage.

How It Works

Caged monitors all agent actions inside the sandbox:

File operations — what files are created, modified, deleted
Terminal commands — every command executed
Network activity — outbound connections and data transfer
System access — attempts to read system files or escalate privileges

Each action is evaluated against a set of behavioral rules. Each risky action deducts points from the trust score, and the size of the deduction depends on how severe the action is.

Scoring Rules

No Penalty (Score: 100)

Editing source code files
Running test suites
Installing packages from known registries
Git operations (commit, push, pull)
Reading documentation files

Minor Penalty (-5 to -10)

Deleting more than 10 files at once
Installing packages from unknown registries
Large file downloads (>100MB)
Modifying configuration files outside the project

Moderate Penalty (-10 to -20)

Outbound network calls to unknown hosts
Running processes as root
Accessing environment variables containing sensitive names
Creating SSH keys or certificates

Severe Penalty (-20 to -30)

Reading /etc/passwd, /etc/shadow, or other system credential files
Running curl | sh or similar remote execution patterns
Modifying system binaries or libraries
Attempting to access the host network namespace
Exfiltrating data (large outbound transfers to unknown hosts)
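
The tiers above can be read as a simple subtraction model. The sketch below is illustrative only: it assumes a session starts at 100, that deductions are additive, and that the score is clamped at 0. The names Deduction, TIER_RANGES, and trust_score are hypothetical and not part of Caged.

```python
from dataclasses import dataclass

# Penalty ranges per tier, copied from the lists above (points deducted).
TIER_RANGES = {
    "none": (0, 0),
    "minor": (5, 10),
    "moderate": (10, 20),
    "severe": (20, 30),
}


@dataclass(frozen=True)
class Deduction:
    tier: str    # one of the TIER_RANGES keys
    points: int  # points deducted for this action
    reason: str  # human-readable description of the action


def trust_score(deductions, start=100):
    """Subtract each deduction from the starting score, clamping at 0.

    Assumption: deductions are additive and the score never drops below 0.
    """
    score = start
    for d in deductions:
        low, high = TIER_RANGES[d.tier]
        if not low <= d.points <= high:
            raise ValueError(
                f"{d.points} is outside the {d.tier} range {low}-{high}"
            )
        score -= d.points
    return max(score, 0)


session = [
    Deduction("moderate", 10, "Outbound connection to unknown host"),
    Deduction("minor", 7, "Deleted 15 files"),
    Deduction("minor", 5, "Installed package from git URL"),
]
print(trust_score(session))  # prints 78
```

How Caged picks a value inside each range is not specified here, so the sketch takes the point value as an input instead of guessing it.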

Trust Levels

Score	Level	Action
90–100	Excellent	No action needed
70–89	Good	Review flagged actions
50–69	Caution	Alert sent, manual review recommended
30–49	Warning	Alert sent, sandbox may be paused
0–29	Critical	Sandbox is automatically paused
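
The table above amounts to a lookup from score to band. A minimal sketch of that lookup follows; trust_level is a hypothetical helper name, and the band boundaries and action text are taken directly from the table.

```python
# (lower bound, level, action) for each band, highest band first.
BANDS = [
    (90, "Excellent", "No action needed"),
    (70, "Good", "Review flagged actions"),
    (50, "Caution", "Alert sent, manual review recommended"),
    (30, "Warning", "Alert sent, sandbox may be paused"),
    (0, "Critical", "Sandbox is automatically paused"),
]


def trust_level(score):
    """Return (level, action) for a trust score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for floor, level, action in BANDS:
        if score >= floor:
            return level, action
    raise AssertionError("unreachable: the last band starts at 0")


print(trust_level(78))  # prints ('Good', 'Review flagged actions')
```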

Alerts

Configure trust-based alerts in your alert rules:

# Get alerted when trust drops below 70
curl -X PUT https://api.caged.dev/v1/alerts/rules/rule-trust-warn \
  -H "Authorization: Bearer caged_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"threshold": 0.7, "channels": ["email", "slack"]}'
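
The same request can be prepared from a script. The sketch below only builds the request with Python's standard library and does not send it. It assumes the threshold field is the score expressed as a fraction of 100 (0.7 for a score of 70), which is inferred from the example above and not confirmed. build_alert_rule_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_alert_rule_request(rule_id, min_score, channels, api_key):
    """Build (but do not send) a PUT request that updates an alert rule.

    Assumption: the API expects the threshold as a 0-1 fraction of the
    0-100 trust score, as the curl example suggests.
    """
    payload = {"threshold": min_score / 100, "channels": list(channels)}
    return urllib.request.Request(
        url=f"https://api.caged.dev/v1/alerts/rules/{rule_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


request = build_alert_rule_request(
    "rule-trust-warn", 70, ["email", "slack"], api_key="caged_sk_..."
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; replace the placeholder key with a real one first.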

Viewing Trust Details

CLI

caged trust cage-a1b2c3d4

Trust Score: 78/100 (Good)

Deductions:
  -10  Outbound connection to unknown host (185.199.108.133)
  -7   Deleted 15 files in /tmp/
  -5   Installed package from git URL
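
In this example the deductions add up to the reported score: 100 - 10 - 7 - 5 = 78. The sketch below checks that arithmetic by parsing the report text. It assumes the output format shown above is stable, which is not guaranteed, and parse_trust_report is a hypothetical helper name.

```python
import re

SAMPLE = """\
Trust Score: 78/100 (Good)

Deductions:
  -10  Outbound connection to unknown host (185.199.108.133)
  -7   Deleted 15 files in /tmp/
  -5   Installed package from git URL
"""


def parse_trust_report(text):
    """Return (score, [(points, reason), ...]) from a trust report."""
    header = re.search(r"Trust Score:\s*(\d+)/100", text)
    if header is None:
        raise ValueError("no 'Trust Score' line found")
    # Deduction lines look like: optional indent, '-<points>', then a reason.
    rows = re.findall(r"^[ \t]*-(\d+)[ \t]+(.+)$", text, flags=re.MULTILINE)
    deductions = [(int(points), reason.strip()) for points, reason in rows]
    return int(header.group(1)), deductions


score, deductions = parse_trust_report(SAMPLE)
total = sum(points for points, _ in deductions)
print(score, total, 100 - total == score)  # prints 78 22 True
```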

Dashboard

The session detail page shows:

Overall trust score with trend
Timeline of trust-impacting events
Detailed breakdown of each deduction

Customizing Rules

You can adjust the pause threshold and exempt specific behaviors from penalties on a per-sandbox basis:

# .caged.yaml
trust:
  min_score: 50          # Pause sandbox if trust drops below this
  allow_root: true       # Don't penalize root access
  allowed_hosts:
    - api.openai.com     # Don't penalize connections to these
    - registry.npmjs.org
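
One way to reason about these settings is as exemptions applied before a deduction is counted, plus a pause threshold applied to the resulting score. The sketch below mirrors the configuration above as a plain dictionary. The action shape, is_exempt, and should_pause are hypothetical and only illustrate the intent stated in the comments; they are not Caged's implementation.

```python
# Mirrors the trust section of the .caged.yaml example above.
TRUST_CONFIG = {
    "min_score": 50,
    "allow_root": True,
    "allowed_hosts": ["api.openai.com", "registry.npmjs.org"],
}


def is_exempt(action, config):
    """Return True if the config says this action should not be penalized.

    `action` is a hypothetical dict such as
    {"kind": "network", "host": "api.openai.com"} or {"kind": "root_process"}.
    """
    if action["kind"] == "root_process":
        return bool(config.get("allow_root", False))
    if action["kind"] == "network":
        return action.get("host") in config.get("allowed_hosts", [])
    return False


def should_pause(score, config):
    """Return True if the score has dropped below the configured minimum."""
    return score < config["min_score"]


print(is_exempt({"kind": "network", "host": "api.openai.com"}, TRUST_CONFIG))
print(should_pause(49, TRUST_CONFIG))
```

With this configuration both calls print True: the host is on the allowlist, and 49 is below the min_score of 50.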

Best Practices

Set budget guards alongside trust scoring — they complement each other
Review sessions with scores below 80 to understand agent behavior
Use allowlist network mode to make trust scores more predictable
Design custom agents to avoid patterns that trigger deductions
