Core Concepts

SWARM is built on several key ideas that distinguish it from traditional AI safety approaches.

The Central Thesis

AGI-level risks don't require AGI-level agents. Catastrophic failures can emerge from the interaction of many sub-AGI agents—even when none are individually dangerous.

This happens through:

  • Information asymmetry - Some agents know things others don't
  • Adverse selection - Bad interactions get accepted more often than good ones
  • Variance amplification - Small errors compound across decisions
  • Governance lag - Safety mechanisms react too slowly
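A toy simulation makes the last two mechanisms concrete. The sketch below is illustrative Python, not SWARM code: it shows how a small per-decision error turns into a large spread of outcomes once decisions chain across many agents.

```python
import random

# Toy sketch (not SWARM code) of variance amplification: each agent in a
# pipeline applies a small multiplicative error to the value it passes on.
# Individually the errors are tiny, but across a long chain of handoffs
# the spread of final outcomes grows dramatically.
random.seed(0)

def chain_outcome(steps=50, error_sd=0.05):
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + random.gauss(0, error_sd)  # ~5% error per decision
    return value

outcomes = [chain_outcome() for _ in range(10_000)]
mean = sum(outcomes) / len(outcomes)
std = (sum((x - mean) ** 2 for x in outcomes) / len(outcomes)) ** 0.5
print(f"mean outcome: {mean:.3f}, std dev: {std:.3f}")
# A 5% per-step error compounds into roughly a 35% spread after 50 handoffs.
```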

Soft Probabilistic Labels

Instead of binary classifications (good/bad, safe/unsafe), SWARM uses soft labels:

\[p = P(v = +1)\]

Where \(v\) is the interaction's underlying outcome (\(v = +1\) if beneficial) and \(p \in [0, 1]\) is the probability that the interaction is beneficial.

This captures:

  • Uncertainty about outcomes
  • Gradations of quality
  • Calibration requirements
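As a minimal sketch of the idea, a soft label can be produced by passing a raw benefit estimate (the \(\hat{v}\) that appears in the architecture diagram below) through a sigmoid instead of a hard threshold. The function name and temperature parameter here are illustrative, not SWARM's API.

```python
import math

def soft_label(v_hat: float, temperature: float = 1.0) -> float:
    """Map a raw benefit estimate v_hat to p = P(v = +1)."""
    # Illustrative only: SWARM's ProxyComputer produces v_hat; the exact
    # squashing function and its parameters may differ.
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

for v_hat in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"v_hat={v_hat:+.1f} -> p={soft_label(v_hat):.3f}")
# p near 0.5 expresses genuine uncertainty; a hard good/bad label would discard it.
```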

Learn more about soft labels →

Four Key Metrics

| Metric | Formula | What It Reveals |
|---|---|---|
| Toxicity | \(E[1-p \mid \text{accepted}]\) | Expected harm in accepted interactions |
| Quality Gap | \(E[p \mid \text{accepted}] - E[p \mid \text{rejected}]\) | Adverse selection (negative = bad) |
| Conditional Loss | \(E[\pi \mid \text{accepted}] - E[\pi]\) | Selection effects on payoffs |
| Incoherence | \(\text{Var}[\text{decision}] / E[\text{error}]\) | Variance-to-error ratio |
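For intuition, the first three metrics can be computed directly from per-interaction records of \(p\), the accept/reject decision, and the payoff \(\pi\). The record fields below are illustrative; see the metrics page for SWARM's exact definitions.

```python
from statistics import mean

# Back-of-the-envelope sketch: compute toxicity, quality gap, and
# conditional loss from a handful of hypothetical interaction records.
records = [
    {"p": 0.9, "accepted": True,  "payoff": 1.0},
    {"p": 0.4, "accepted": True,  "payoff": -0.5},
    {"p": 0.7, "accepted": False, "payoff": 0.2},
    {"p": 0.2, "accepted": False, "payoff": -0.8},
]

acc = [r for r in records if r["accepted"]]
rej = [r for r in records if not r["accepted"]]

toxicity = mean(1 - r["p"] for r in acc)                                  # E[1-p | accepted]
quality_gap = mean(r["p"] for r in acc) - mean(r["p"] for r in rej)       # E[p|acc] - E[p|rej]
conditional_loss = mean(r["payoff"] for r in acc) - mean(r["payoff"] for r in records)

print(f"toxicity         = {toxicity:.3f}")
print(f"quality gap      = {quality_gap:+.3f}   # negative => adverse selection")
print(f"conditional loss = {conditional_loss:+.3f}")
```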

Learn more about metrics →

Governance Mechanisms

SWARM provides configurable safety interventions:

  • Transaction Taxes - Add friction to reduce exploitation
  • Reputation Decay - Make past bad behavior costly
  • Circuit Breakers - Freeze agents exhibiting toxic patterns
  • Random Audits - Deter hidden exploitation
  • Staking - Require skin in the game
  • Collusion Detection - Identify coordinated attacks

Learn more about governance →

The Emergence Problem

Single-agent alignment asks: "How do we align one powerful agent?"

SWARM asks: "What happens when many agents—each potentially aligned—interact in ways that produce misaligned outcomes?"

This is the emergence problem: system-level failures that aren't predictable from individual agent properties.

Learn more about emergence →

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                       SWARM CORE                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Observables → ProxyComputer → v_hat → sigmoid → p          │
│                                                   ↓          │
│                                              SoftInteraction │
│                                                   ↓          │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ PayoffEngine │    │  Governance  │    │   Metrics    │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
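The sketch below traces the same data flow in code. The class names mirror the diagram, but every constructor, method, and feature name is an assumption made for illustration rather than SWARM's actual API.

```python
import math
from dataclasses import dataclass

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

@dataclass
class SoftInteraction:
    p: float               # soft label: P(interaction is beneficial)
    accepted: bool = False
    payoff: float = 0.0

class ProxyComputer:
    def estimate(self, observables: dict) -> float:
        # Toy proxy: weighted sum of observable features -> raw score v_hat.
        return 2.0 * observables["task_progress"] - 1.5 * observables["complaints"]

class PayoffEngine:
    def settle(self, ix: SoftInteraction) -> None:
        ix.accepted = ix.p >= 0.5
        ix.payoff = (2 * ix.p - 1) if ix.accepted else 0.0

class Governance:
    def review(self, ix: SoftInteraction) -> None:
        if ix.accepted and 1 - ix.p > 0.4:      # crude circuit-breaker-style check
            ix.accepted, ix.payoff = False, 0.0

class Metrics:
    def __init__(self):
        self.toxicities = []
    def record(self, ix: SoftInteraction) -> None:
        if ix.accepted:
            self.toxicities.append(1 - ix.p)

# One pass through the pipeline: Observables -> ProxyComputer -> v_hat -> sigmoid -> p
proxy, payoffs, gov, metrics = ProxyComputer(), PayoffEngine(), Governance(), Metrics()
v_hat = proxy.estimate({"task_progress": 0.8, "complaints": 0.1})
ix = SoftInteraction(p=sigmoid(v_hat))
payoffs.settle(ix)
gov.review(ix)
metrics.record(ix)
print(ix, "mean toxicity:", sum(metrics.toxicities) / max(len(metrics.toxicities), 1))
```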

Next Steps