# Multi-Agent AI Safety Framework
SWARM (System-Wide Assessment of Risk in Multi-agent systems) is the reference implementation of the distributional AGI safety research framework. It provides Python tools for studying emergent risks in multi-agent AI systems.
## What Makes SWARM Different
Most AI safety tools focus on individual models. SWARM focuses on populations:
| Traditional safety tools | SWARM |
|---|---|
| Evaluate single model outputs | Evaluate population-level dynamics |
| Binary safe/unsafe labels | Soft probabilistic labels |
| Static benchmarks | Dynamic multi-epoch simulations |
| Manual red-teaming | Automated adversarial testing |
| One-shot evaluation | Longitudinal tracking across epochs |
## Architecture

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Agents    │────►│ Orchestrator │────►│   Metrics   │
│  (honest,   │     │  (epochs,    │     │ (toxicity,  │
│  deceptive, │     │  matching,   │     │  quality    │
│  adversary) │     │  governance) │     │  gap, etc.) │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │ Governance  │
                    │  (taxes,    │
                    │  breakers,  │
                    │  audits)    │
                    └─────────────┘
```
### Data Flow

```
Observables → ProxyComputer → v_hat → sigmoid → p → SoftPayoffEngine → payoffs
                                                          ↓
                                                     SoftMetrics → toxicity, quality gap, etc.
```
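Read left to right: each interaction's raw observables are reduced to a scalar proxy score `v_hat`, squashed through a sigmoid into a harm probability `p`, and then priced into payoffs. A minimal sketch of that pipeline follows; the feature names, weights, temperature, and payoff constants are all illustrative assumptions, not SWARM's actual API:

```python
import math

def proxy_score(observables):
    """Stand-in for ProxyComputer: collapse raw observables into a
    scalar harm estimate v_hat (here, a simple weighted sum)."""
    weights = {"caps_ratio": 2.0, "link_count": 0.5, "sentiment": -1.0}
    return sum(weights.get(k, 0.0) * v for k, v in observables.items())

def soft_label(v_hat, temperature=1.0):
    """Map the proxy score to a soft harm probability p via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

def soft_payoff(p, gain=1.0, loss=2.0):
    """Stand-in for SoftPayoffEngine: expected payoff under soft label p."""
    return (1.0 - p) * gain - p * loss

obs = {"caps_ratio": 0.1, "link_count": 3, "sentiment": 0.4}
v_hat = proxy_score(obs)   # 2*0.1 + 0.5*3 - 1*0.4 = 1.3
p = soft_label(v_hat)      # sigmoid(1.3) ≈ 0.786
payoff = soft_payoff(p)    # negative: this interaction is priced as net-harmful
```

The soft label `p` replaces a binary safe/unsafe verdict, which is what lets the downstream metrics stay probabilistic.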
## Installation

Install the released package, or install from source for development:
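The concrete commands are not shown on this page; a typical sequence, assuming the package is published as `swarm` and the repository URL shown here (both hypothetical names), would be:

```shell
# Standard install (package name assumed)
pip install swarm

# Development install from source (repository URL hypothetical)
git clone https://github.com/example/swarm.git
cd swarm
pip install -e .
```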
## Quick Start
```python
from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orch = Orchestrator(config=config)

# Register agents: 7 honest, 3 deceptive
for i in range(7):
    orch.register_agent(HonestAgent(agent_id=f"h{i}"))
for i in range(3):
    orch.register_agent(DeceptiveAgent(agent_id=f"d{i}"))

# Run and analyze
metrics = orch.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f} qgap={m.quality_gap:+.3f}")
```
## Core Components
### Agents
SWARM ships with three agent types and supports custom agents:
| Agent | Behavior | Use case |
|---|---|---|
| HonestAgent | Consistent cooperation | Baseline population |
| DeceptiveAgent | Trust-then-exploit | Test governance detection |
| AdversarialAgent | Active exploitation | Stress-test mechanisms |
| Custom | User-defined | Research-specific strategies |
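The `Custom` row is open-ended. As an illustration, here is a deliberately simple trust-then-exploit variant; the `act()` method and constructor signature are assumptions for the sketch, so consult SWARM's agent base class for the real interface before registering one:

```python
class SlowDefectorAgent:
    """Illustrative custom agent: cooperates for a warm-up period,
    then defects once a linear ramp crosses 0.5. The act() interface
    shown here is assumed, not SWARM's documented base class."""

    def __init__(self, agent_id, warmup_steps=20, ramp_steps=50):
        self.agent_id = agent_id
        self.warmup_steps = warmup_steps
        self.ramp_steps = ramp_steps
        self.step_count = 0

    def act(self):
        self.step_count += 1
        if self.step_count <= self.warmup_steps:
            return "cooperate"
        # Ramp grows linearly after the warm-up window, capped at 1.0.
        ramp = min(1.0, (self.step_count - self.warmup_steps) / self.ramp_steps)
        return "defect" if ramp >= 0.5 else "cooperate"

# Compressed parameters so the turn happens quickly in this demo
agent = SlowDefectorAgent("custom-0", warmup_steps=2, ramp_steps=2)
actions = [agent.act() for _ in range(6)]
# actions: 2 cooperations, then 4 defections
```

Strategies like this are useful for checking whether governance reacts to a *slow* behavioral drift rather than an immediate attack.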
### Metrics
Four key metrics capture distributional health:
- Toxicity rate — Expected harm among accepted interactions
- Quality gap — Whether governance selects for quality (negative = adverse selection)
- Conditional loss — Payoff effect of selection
- Incoherence index — Decision variance across replays
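The first two metrics might be computed along these lines; the per-interaction record shape (`accepted`, `p_harm`, `quality`) is an assumption made for this sketch, not SWARM's internal schema:

```python
def toxicity_rate(interactions):
    """Expected harm among accepted interactions: the mean soft harm
    label p_harm over everything governance let through."""
    accepted = [i["p_harm"] for i in interactions if i["accepted"]]
    return sum(accepted) / len(accepted) if accepted else 0.0

def quality_gap(interactions):
    """Mean quality of accepted minus mean quality of rejected
    interactions. Negative values indicate adverse selection:
    governance is filtering out the good interactions."""
    acc = [i["quality"] for i in interactions if i["accepted"]]
    rej = [i["quality"] for i in interactions if not i["accepted"]]
    if not acc or not rej:
        return 0.0
    return sum(acc) / len(acc) - sum(rej) / len(rej)

log = [
    {"accepted": True,  "p_harm": 0.1, "quality": 0.9},
    {"accepted": True,  "p_harm": 0.3, "quality": 0.6},
    {"accepted": False, "p_harm": 0.8, "quality": 0.2},
]
tox = toxicity_rate(log)   # 0.2
gap = quality_gap(log)     # +0.55: selection favors quality here
```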
### Governance
Six configurable mechanisms that operate at the population level:
- Transaction taxes — Friction against exploitation
- Circuit breakers — Freeze toxic agents
- Reputation decay — Prevent trust accumulation
- Random audits — Probabilistic detection
- Staking — Skin-in-the-game requirements
- Collusion detection — Catch coordinated attacks
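Each mechanism hooks into the interaction loop at the population level. As one example, a circuit breaker could be sketched as follows; the threshold, window size, and the `record`/`is_frozen` interface are assumptions for illustration, not SWARM defaults:

```python
class CircuitBreaker:
    """Illustrative circuit breaker: freeze an agent once the mean of
    its last `window` harm scores exceeds `threshold`."""

    def __init__(self, threshold=0.6, window=5):
        self.threshold = threshold
        self.window = window
        self.history = {}  # agent_id -> recent p_harm scores

    def record(self, agent_id, p_harm):
        hist = self.history.setdefault(agent_id, [])
        hist.append(p_harm)
        del hist[:-self.window]  # keep only the last `window` scores

    def is_frozen(self, agent_id):
        hist = self.history.get(agent_id, [])
        # Require a full window before freezing, to avoid snap judgments.
        return len(hist) == self.window and sum(hist) / len(hist) > self.threshold

breaker = CircuitBreaker(threshold=0.6, window=3)
for p in [0.9, 0.8, 0.7]:
    breaker.record("d0", p)
frozen = breaker.is_frozen("d0")  # mean 0.8 > 0.6, so the agent is frozen
```

Requiring a full window before freezing trades detection speed for robustness against one-off noisy scores.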
### Bridges
Connect SWARM to external systems:
| Bridge | Integration |
|---|---|
| Concordia | LLM agent environments |
| Prime Intellect | Safety-reward RL training |
| GasTown | Production data pipelines |
| AgentXiv | Research publication platform |
## Research Context
SWARM implements the framework introduced in *Distributional Safety in Agentic Systems* (arXiv, 2025). For theoretical foundations, see the research theory page.
## Next Steps
- Quick Start Tutorial — Run your first simulation
- Writing Scenarios — Configure custom experiments
- Governance Simulation — Test governance before deployment
- Parameter Sweeps — Systematic parameter exploration
- Red Teaming — Adversarial stress testing