Core Concepts¶
SWARM is built on several key ideas that distinguish it from traditional AI safety approaches.
The Central Thesis¶
AGI-level risks don't require AGI-level agents. Catastrophic failures can emerge from the interaction of many sub-AGI agents—even when none are individually dangerous.
This happens through:
- Information asymmetry - Some agents know things others don't
- Adverse selection - Bad interactions get accepted more often than good ones
- Variance amplification - Small errors compound across decisions
- Governance lag - Safety mechanisms react too slowly
Soft Probabilistic Labels¶
Instead of binary classifications (good/bad, safe/unsafe), SWARM uses soft labels:
Where \(p \in [0, 1]\) represents the probability that an interaction is beneficial.
This captures:
- Uncertainty about outcomes
- Gradations of quality
- Calibration requirements
Learn more about soft labels →
Four Key Metrics¶
| Metric | Formula | What It Reveals |
|---|---|---|
| Toxicity | \(E[1-p \mid \text{accepted}]\) | Expected harm in accepted interactions |
| Quality Gap | \(E[p \mid \text{accepted}] - E[p \mid \text{rejected}]\) | Adverse selection (negative = bad) |
| Conditional Loss | \(E[\pi \mid \text{accepted}] - E[\pi]\) | Selection effects on payoffs |
| Incoherence | \(\text{Var}[\text{decision}] / E[\text{error}]\) | Variance-to-error ratio |
Governance Mechanisms¶
SWARM provides configurable safety interventions:
- Transaction Taxes - Add friction to reduce exploitation
- Reputation Decay - Make past bad behavior costly
- Circuit Breakers - Freeze agents exhibiting toxic patterns
- Random Audits - Deter hidden exploitation
- Staking - Require skin in the game
- Collusion Detection - Identify coordinated attacks
The Emergence Problem¶
Single-agent alignment asks: "How do we align one powerful agent?"
SWARM asks: "What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"
This is the emergence problem: system-level failures that aren't predictable from individual agent properties.
Architecture Overview¶
┌─────────────────────────────────────────────────────────────┐
│ SWARM CORE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Observables → ProxyComputer → v_hat → sigmoid → p │
│ ↓ │
│ SoftInteraction │
│ ↓ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PayoffEngine │ │ Governance │ │ Metrics │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Next Steps¶
-
:material-label-outline:{ .lg .middle } Soft Labels
Understand probabilistic quality assessment
-
:material-chart-line:{ .lg .middle } Metrics
Learn the four key metrics and what they measure
-
:material-shield-check:{ .lg .middle } Governance
Explore safety mechanisms and interventions
-
:material-waves:{ .lg .middle } Emergence
Understand system-level failure modes