# Multi-Agent AI Safety Framework
SWARM (System-Wide Assessment of Risk in Multi-agent systems) is the reference implementation of the distributional AGI safety research framework. It provides Python tools for studying emergent risks in multi-agent AI systems.
## What Makes SWARM Different
Most AI safety tools focus on individual models. SWARM focuses on populations:
| Traditional safety tools | SWARM |
|---|---|
| Evaluate single model outputs | Evaluate population-level dynamics |
| Binary safe/unsafe labels | Soft probabilistic labels |
| Static benchmarks | Dynamic multi-epoch simulations |
| Manual red-teaming | Automated adversarial testing |
| One-shot evaluation | Longitudinal tracking across epochs |
## Architecture

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agents    │────►│ Orchestrator │────►│   Metrics   │
│  (honest,   │     │  (epochs,    │     │ (toxicity,  │
│  deceptive, │     │  matching,   │     │  quality    │
│  adversary) │     │  governance) │     │  gap, etc.) │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Governance  │
                    │  (taxes,    │
                    │  breakers,  │
                    │  audits)    │
                    └─────────────┘
```
### Data Flow

```
Observables → ProxyComputer → v_hat → sigmoid → p → SoftPayoffEngine → payoffs
                                                          ↓
                                                     SoftMetrics → toxicity, quality gap, etc.
```
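Read left to right: each interaction's raw observables are reduced to a scalar proxy score `v_hat`, squashed through a sigmoid into a harm probability `p`, and then priced into payoffs. A minimal sketch of that pipeline follows; the feature names, weights, temperature, and payoff constants are all illustrative assumptions, not SWARM's actual API:

```python
import math

def proxy_score(observables):
    """Stand-in for ProxyComputer: collapse raw observables into a
    scalar harm estimate v_hat (here, a simple weighted sum)."""
    weights = {"caps_ratio": 2.0, "link_count": 0.5, "sentiment": -1.0}
    return sum(weights.get(k, 0.0) * v for k, v in observables.items())

def soft_label(v_hat, temperature=1.0):
    """Map the proxy score to a soft harm probability p via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

def soft_payoff(p, gain=1.0, loss=2.0):
    """Stand-in for SoftPayoffEngine: expected payoff under soft label p."""
    return (1.0 - p) * gain - p * loss

obs = {"caps_ratio": 0.1, "link_count": 3, "sentiment": 0.4}
v_hat = proxy_score(obs)   # 2*0.1 + 0.5*3 - 1*0.4 = 1.3
p = soft_label(v_hat)      # sigmoid(1.3) ≈ 0.786
payoff = soft_payoff(p)    # negative: this interaction is priced as net-harmful
```

The soft label `p` replaces a binary safe/unsafe verdict, which is what lets the downstream metrics stay probabilistic.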
## Installation

Install the released package, or install from source for development:
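The concrete commands are not shown on this page; a typical sequence, assuming the package is published as `swarm` and the repository URL shown here (both hypothetical names), would be:

```shell
# Standard install (package name assumed)
pip install swarm

# Development install from source (repository URL hypothetical)
git clone https://github.com/example/swarm.git
cd swarm
pip install -e .
```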
## Quick Start
```python
from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orch = Orchestrator(config=config)

# Register agents: 7 honest, 3 deceptive
for i in range(7):
    orch.register_agent(HonestAgent(agent_id=f"h{i}"))
for i in range(3):
    orch.register_agent(DeceptiveAgent(agent_id=f"d{i}"))

# Run and analyze
metrics = orch.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f} qgap={m.quality_gap:+.3f}")
```
## Core Components
### Agents
SWARM ships with three agent types and supports custom agents:
| Agent | Behavior | Use case |
|---|---|---|
| HonestAgent | Consistent cooperation | Baseline population |
| DeceptiveAgent | Trust-then-exploit | Test governance detection |
| AdversarialAgent | Active exploitation | Stress-test mechanisms |
| Custom | User-defined | Research-specific strategies |
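The `Custom` row is open-ended. As an illustration, here is a deliberately simple trust-then-exploit variant; the `act()` method and constructor signature are assumptions for the sketch, so consult SWARM's agent base class for the real interface before registering one:

```python
class SlowDefectorAgent:
    """Illustrative custom agent: cooperates for a warm-up period,
    then defects once a linear ramp crosses 0.5. The act() interface
    shown here is assumed, not SWARM's documented base class."""

    def __init__(self, agent_id, warmup_steps=20, ramp_steps=50):
        self.agent_id = agent_id
        self.warmup_steps = warmup_steps
        self.ramp_steps = ramp_steps
        self.step_count = 0

    def act(self):
        self.step_count += 1
        if self.step_count <= self.warmup_steps:
            return "cooperate"
        # Ramp grows linearly after the warm-up window, capped at 1.0.
        ramp = min(1.0, (self.step_count - self.warmup_steps) / self.ramp_steps)
        return "defect" if ramp >= 0.5 else "cooperate"

# Compressed parameters so the turn happens quickly in this demo
agent = SlowDefectorAgent("custom-0", warmup_steps=2, ramp_steps=2)
actions = [agent.act() for _ in range(6)]
# actions: 2 cooperations, then 4 defections
```

Strategies like this are useful for checking whether governance reacts to a *slow* behavioral drift rather than an immediate attack.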
### Metrics
Four key metrics capture distributional health:
- Toxicity rate — Expected harm among accepted interactions
- Quality gap — Whether governance selects for quality (negative = adverse selection)
- Conditional loss — Payoff effect of selection
- Incoherence index — Decision variance across replays
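The first two metrics might be computed along these lines; the per-interaction record shape (`accepted`, `p_harm`, `quality`) is an assumption made for this sketch, not SWARM's internal schema:

```python
def toxicity_rate(interactions):
    """Expected harm among accepted interactions: the mean soft harm
    label p_harm over everything governance let through."""
    accepted = [i["p_harm"] for i in interactions if i["accepted"]]
    return sum(accepted) / len(accepted) if accepted else 0.0

def quality_gap(interactions):
    """Mean quality of accepted minus mean quality of rejected
    interactions. Negative values indicate adverse selection:
    governance is filtering out the good interactions."""
    acc = [i["quality"] for i in interactions if i["accepted"]]
    rej = [i["quality"] for i in interactions if not i["accepted"]]
    if not acc or not rej:
        return 0.0
    return sum(acc) / len(acc) - sum(rej) / len(rej)

log = [
    {"accepted": True,  "p_harm": 0.1, "quality": 0.9},
    {"accepted": True,  "p_harm": 0.3, "quality": 0.6},
    {"accepted": False, "p_harm": 0.8, "quality": 0.2},
]
tox = toxicity_rate(log)   # 0.2
gap = quality_gap(log)     # +0.55: selection favors quality here
```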
### Governance
Six configurable mechanisms that operate at the population level:
- Transaction taxes — Friction against exploitation
- Circuit breakers — Freeze toxic agents
- Reputation decay — Prevent trust accumulation
- Random audits — Probabilistic detection
- Staking — Skin-in-the-game requirements
- Collusion detection — Catch coordinated attacks
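Each mechanism hooks into the interaction loop at the population level. As one example, a circuit breaker could be sketched as follows; the threshold, window size, and the `record`/`is_frozen` interface are assumptions for illustration, not SWARM defaults:

```python
class CircuitBreaker:
    """Illustrative circuit breaker: freeze an agent once the mean of
    its last `window` harm scores exceeds `threshold`."""

    def __init__(self, threshold=0.6, window=5):
        self.threshold = threshold
        self.window = window
        self.history = {}  # agent_id -> recent p_harm scores

    def record(self, agent_id, p_harm):
        hist = self.history.setdefault(agent_id, [])
        hist.append(p_harm)
        del hist[:-self.window]  # keep only the last `window` scores

    def is_frozen(self, agent_id):
        hist = self.history.get(agent_id, [])
        # Require a full window before freezing, to avoid snap judgments.
        return len(hist) == self.window and sum(hist) / len(hist) > self.threshold

breaker = CircuitBreaker(threshold=0.6, window=3)
for p in [0.9, 0.8, 0.7]:
    breaker.record("d0", p)
frozen = breaker.is_frozen("d0")  # mean 0.8 > 0.6, so the agent is frozen
```

Requiring a full window before freezing trades detection speed for robustness against one-off noisy scores.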
### Bridges
Connect SWARM to external systems:
| Bridge | Integration |
|---|---|
| Concordia | LLM agent environments |
| Prime Intellect | Safety-reward RL training |
| GasTown | Production data pipelines |
| AgentXiv | Research publication platform |
## Research Context
SWARM implements the framework introduced in *Distributional Safety in Agentic Systems* (arXiv, 2025). For theoretical foundations, see the research theory page.
## Next Steps
- Quick Start Tutorial — Run your first simulation
- Writing Scenarios — Configure custom experiments
- Governance Simulation — Test governance before deployment
- Parameter Sweeps — Systematic parameter exploration
- Red Teaming — Adversarial stress testing