Trust Scoring
Every agent session receives a trust score (0–100) that reflects how safe the agent’s actions were. Trust scores help you identify risky behavior before it causes damage.
How It Works
Caged monitors all agent actions inside the sandbox:
- File operations: which files are created, modified, or deleted
- Terminal commands: every command executed
- Network activity: outbound connections and data transfer
- System access: attempts to read system files or escalate privileges
Scoring Rules
No Penalty (Score: 100)
- Editing source code files
- Running test suites
- Installing packages from known registries
- Git operations (commit, push, pull)
- Reading documentation files
Minor Penalty (-5 to -10)
- Deleting more than 10 files at once
- Installing packages from unknown registries
- Large file downloads (>100MB)
- Modifying configuration files outside the project
Moderate Penalty (-10 to -20)
- Outbound network calls to unknown hosts
- Running processes as root
- Accessing environment variables containing sensitive names
- Creating SSH keys or certificates
Severe Penalty (-20 to -30)
- Reading `/etc/passwd`, `/etc/shadow`, or other system credential files
- Running `curl | sh` or similar remote execution patterns
- Modifying system binaries or libraries
- Attempting to access the host network namespace
- Exfiltrating data (large outbound transfers to unknown hosts)
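The rules above give a range for each penalty tier rather than a fixed value, so a session's score can only be estimated within bounds. The following is a minimal sketch of that arithmetic, assuming each flagged action deducts once from a starting score of 100 and that the result is clamped at 0. The tier labels and the `score_bounds` function are hypothetical names for illustration, not part of Caged.

```python
# Penalty ranges per tier, copied from the rules above as
# (smallest deduction, largest deduction) in points.
# The tier labels are hypothetical names used only in this sketch.
PENALTY_RANGES = {
    "none": (0, 0),
    "minor": (5, 10),
    "moderate": (10, 20),
    "severe": (20, 30),
}


def score_bounds(event_tiers):
    """Return (worst_case, best_case) scores for a list of event tiers.

    Assumption: every event deducts once from a starting score of 100,
    and the final score never drops below 0.
    """
    least = sum(PENALTY_RANGES[tier][0] for tier in event_tiers)
    most = sum(PENALTY_RANGES[tier][1] for tier in event_tiers)
    worst_case = max(0, 100 - most)
    best_case = max(0, 100 - least)
    return worst_case, best_case


# Example session: source edits and a test run (no penalty), one install
# from an unknown registry (minor), and one `curl | sh` call (severe).
events = ["none", "none", "minor", "severe"]
print(score_bounds(events))  # (60, 75)
```

Under these assumptions, a single severe action combined with a minor one is enough to move a session out of the 90–100 band, whichever end of each range applies.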
Trust Levels
| Score | Level | Action |
|---|---|---|
| 90–100 | Excellent | No action needed |
| 70–89 | Good | Review flagged actions |
| 50–69 | Caution | Alert sent, manual review recommended |
| 30–49 | Warning | Alert sent, sandbox may be paused |
| 0–29 | Critical | Sandbox is automatically paused |
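The table can be read as a simple threshold lookup. The sketch below mirrors it row by row. The `trust_level` function and `TRUST_LEVELS` name are hypothetical and only illustrate the mapping, assuming scores are integers from 0 to 100.

```python
# (lower bound, level, action) rows from the Trust Levels table,
# ordered from the highest band to the lowest.
TRUST_LEVELS = [
    (90, "Excellent", "No action needed"),
    (70, "Good", "Review flagged actions"),
    (50, "Caution", "Alert sent, manual review recommended"),
    (30, "Warning", "Alert sent, sandbox may be paused"),
    (0, "Critical", "Sandbox is automatically paused"),
]


def trust_level(score):
    """Map a trust score (0-100) to its (level, action) pair."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # The first band whose lower bound the score reaches is the match.
    for lower_bound, level, action in TRUST_LEVELS:
        if score >= lower_bound:
            return level, action


print(trust_level(72))  # ('Good', 'Review flagged actions')
print(trust_level(29))  # ('Critical', 'Sandbox is automatically paused')
```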
Alerts
Configure trust-based alerts in your alert rules.
Viewing Trust Details
CLI
Dashboard
The session detail page shows:
- Overall trust score with trend
- Timeline of trust-impacting events
- Detailed breakdown of each deduction
Customizing Rules
You can adjust trust scoring thresholds per sandbox.
Best Practices
- Set budget guards alongside trust scoring — they complement each other
- Review sessions with scores below 80 to understand agent behavior
- Use `allowlist` network mode for maximum trust score predictability
- Custom agents should avoid patterns that trigger deductions