# Core Concepts

SWARM is built on several key ideas that distinguish it from traditional AI safety approaches. The theoretical foundations are detailed in *Distributional Safety in Agentic Systems*.
## The Central Thesis
AGI-level risks don't require AGI-level agents. Catastrophic failures can emerge from the interaction of many sub-AGI agents—even when none are individually dangerous.
This happens through four mechanisms (the sketch after this list shows the first two in action):
- Information asymmetry - Some agents know things others don't
- Adverse selection - Bad interactions get accepted more often than good ones
- Variance amplification - Small errors compound across decisions
- Governance lag - Safety mechanisms react too slowly
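To make this concrete, here is a minimal, self-contained simulation in plain NumPy (independent of the SWARM API; the 0.4 exploiter cutoff and 0.8 acceptance threshold are illustrative assumptions). Proposers of low-quality interactions inflate the signal the counterparty sees, and the accepted pool ends up worse than the rejected one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True probability that each proposed interaction is beneficial.
p = rng.uniform(0.0, 1.0, size=n)

# Information asymmetry: proposers of low-quality interactions inflate
# the signal the acceptor sees; honest proposers report truthfully.
exploiter = p < 0.4
signal = np.where(exploiter, 0.95, p)

# The acceptor sees only the signal and applies a quality threshold.
accepted = signal > 0.8

print(f"E[p | accepted] = {p[accepted].mean():.3f}")   # ~0.43
print(f"E[p | rejected] = {p[~accepted].mean():.3f}")  # ~0.60
gap = p[accepted].mean() - p[~accepted].mean()
print(f"Quality gap     = {gap:.3f}  (negative => adverse selection)")
```

No agent in this toy model is individually dangerous; the failure emerges from who gets to misreport and who must decide on noisy signals.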
## Soft Probabilistic Labels
Instead of binary classifications (good/bad, safe/unsafe), SWARM uses soft labels:

\[
p = \sigma(\hat{v}) \in [0, 1]
\]

where \(p\) is the probability that an interaction is beneficial, \(\hat{v}\) is the proxy value estimate, and \(\sigma\) is the sigmoid function (see the architecture overview below).
This captures:
- Uncertainty about outcomes
- Gradations of quality
- Calibration requirements
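A quick illustration of why this matters, in plain NumPy (not the SWARM API): two batches that a binary classifier treats identically carry very different expected harm under soft labels.

```python
import numpy as np

# Two batches with identical hard labels but different risk profiles.
batch_a = np.array([0.55, 0.60, 0.58])  # barely beneficial
batch_b = np.array([0.95, 0.99, 0.97])  # clearly beneficial

def hard(p):
    return (p > 0.5).astype(int)

print(hard(batch_a), hard(batch_b))  # identical: [1 1 1] [1 1 1]

# Soft labels retain the gradation a binary classifier throws away:
print(1 - batch_a.mean())  # expected harm ~0.42
print(1 - batch_b.mean())  # expected harm ~0.03
```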
Learn more about soft labels →
## Four Key Metrics
| Metric | Formula | What It Reveals |
|---|---|---|
| Toxicity | \(E[1-p \mid \text{accepted}]\) | Expected harm among accepted interactions |
| Quality Gap | \(E[p \mid \text{accepted}] - E[p \mid \text{rejected}]\) | Adverse selection (negative values mean worse interactions are preferentially accepted) |
| Conditional Loss | \(E[\pi \mid \text{accepted}] - E[\pi]\) | How acceptance shifts the expected payoff \(\pi\) |
| Incoherence | \(\text{Var}[\text{decision}] / E[\text{error}]\) | Decision variance relative to expected labeling error |
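Given per-interaction soft labels, acceptance decisions, and realized payoffs, all four metrics are a few lines of NumPy. A sketch (the function name is hypothetical, and reading the incoherence ratio as decision variance over expected labeling error is an assumption, not SWARM's exact definition):

```python
import numpy as np

def swarm_metrics(p, accepted, payoff):
    """Compute the four key metrics from raw simulation arrays.

    p        : probability each interaction is beneficial, in [0, 1]
    accepted : boolean mask of accepted interactions
    payoff   : realized payoff pi per interaction
    """
    p, accepted, payoff = map(np.asarray, (p, accepted, payoff))

    toxicity = np.mean(1.0 - p[accepted])                       # E[1-p | accepted]
    quality_gap = p[accepted].mean() - p[~accepted].mean()      # adverse selection if < 0
    conditional_loss = payoff[accepted].mean() - payoff.mean()  # E[pi | accepted] - E[pi]

    # Assumed reading of Var[decision] / E[error]: the error of an accept
    # is 1-p (it may be harmful), of a reject is p (it may be beneficial).
    error = np.mean(np.where(accepted, 1.0 - p, p))
    incoherence = np.var(accepted.astype(float)) / error

    return {"toxicity": toxicity, "quality_gap": quality_gap,
            "conditional_loss": conditional_loss, "incoherence": incoherence}

print(swarm_metrics(p=[0.9, 0.2, 0.7],
                    accepted=[True, True, False],
                    payoff=[1.0, -0.5, 0.8]))
```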
## Governance Mechanisms

SWARM provides configurable safety interventions (a configuration sketch follows the list):
- Transaction Taxes - Add friction to reduce exploitation
- Reputation Decay - Make past bad behavior costly
- Circuit Breakers - Freeze agents exhibiting toxic patterns
- Random Audits - Deter hidden exploitation
- Staking - Require skin in the game
- Collusion Detection - Identify coordinated attacks
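As a rough mental model, each mechanism boils down to a handful of tunable parameters. A hypothetical configuration sketch (field names and defaults here are illustrative, not SWARM's actual config API):

```python
from dataclasses import dataclass

@dataclass
class GovernanceConfig:
    """Illustrative parameters, one per mechanism above; SWARM's real API may differ."""
    transaction_tax: float = 0.02        # fraction of each payoff taken as friction
    reputation_decay: float = 0.95       # multiplicative decay applied per epoch
    breaker_toxicity_limit: float = 0.6  # freeze agents whose toxicity exceeds this
    audit_probability: float = 0.05      # chance any given interaction is audited
    min_stake: float = 10.0              # stake required to participate at all
    collusion_threshold: float = 0.9     # flag agent pairs correlated above this

# Tighten a single lever while leaving the rest at defaults:
strict = GovernanceConfig(audit_probability=0.25)
```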
## The Emergence Problem
Single-agent alignment asks: "How do we align one powerful agent?"
SWARM asks: "What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"
This is the emergence problem: system-level failures that aren't predictable from individual agent properties.
## Architecture Overview

```
┌──────────────────────────────────────────────────────────┐
│                        SWARM CORE                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Observables → ProxyComputer → v_hat → sigmoid → p       │
│                            ↓                             │
│                     SoftInteraction                      │
│                            ↓                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │ PayoffEngine │  │  Governance  │  │   Metrics    │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
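The following runnable stubs trace that data flow end to end. The class names mirror the diagram, but their bodies here are simplified stand-ins, not SWARM's real implementations:

```python
import math
from dataclasses import dataclass

class ProxyComputer:
    """Stand-in: collapse raw observables into a proxy value estimate v_hat."""
    def compute(self, observables: dict) -> float:
        return observables["task_success"] - observables["complaint_rate"]

@dataclass
class SoftInteraction:
    p: float  # soft label: probability the interaction is beneficial

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

observables = {"task_success": 1.4, "complaint_rate": 0.3}  # Observables
v_hat = ProxyComputer().compute(observables)                # ProxyComputer -> v_hat
interaction = SoftInteraction(p=sigmoid(v_hat))             # sigmoid -> p
print(interaction)  # SoftInteraction(p=0.750...)

# Downstream, the PayoffEngine, Governance, and Metrics components
# each consume this SoftInteraction.
```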
## Next Steps

- :material-label-outline:{ .lg .middle } **Soft Labels** - Understand probabilistic quality assessment
- :material-chart-line:{ .lg .middle } **Metrics** - Learn the four key metrics and what they measure
- :material-shield-check:{ .lg .middle } **Governance** - Explore safety mechanisms and interventions
- :material-waves:{ .lg .middle } **Emergence** - Understand system-level failure modes
- :material-sync:{ .lg .middle } **Recursive Research** - When agents study agents studying agents