# SWARM: Multi-Agent AI Safety Framework

*System-Wide Assessment of Risk in Multi-agent systems*

Study how intelligence swarms—and where it fails.
## What is SWARM?
SWARM is the reference implementation of the Distributional AGI Safety research framework. It provides tools for studying emergent risks in multi-agent AI systems. Rather than focusing on single misaligned agents, SWARM reveals how harmful dynamics emerge from:
- Information asymmetry between agents
- Adverse selection (system accepts lower-quality interactions)
- Variance amplification across decision horizons
- Governance latency and illegibility
SWARM makes these interaction-level risks observable, measurable, and governable using soft probabilistic labels.
### Measure

Soft probabilistic labels capture uncertainty. Four key metrics—toxicity, quality gap, conditional loss, and incoherence—reveal hidden risks.
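As an illustration of how soft labels feed these metrics, here is a minimal sketch, assuming a hypothetical batch of interactions where each `p` is the soft probability that the interaction is harmful (illustrative data and definitions, not SWARM's actual API):

```python
# Hypothetical soft labels p (probability each interaction is harmful)
# and accept/reject decisions -- illustrative data, not SWARM's API.
p        = [0.12, 0.38, 0.57, 0.82, 0.27, 0.90]
accepted = [True, True, True, False, True, False]

# Toxicity: the expected fraction of harmful interactions under the soft labels.
toxicity = sum(p) / len(p)

# Quality gap: expected harm among *accepted* interactions minus harm overall.
# A positive gap is one signature of adverse selection: the system is letting
# through worse-than-average interactions.
p_acc = [pi for pi, a in zip(p, accepted) if a]
quality_gap = sum(p_acc) / len(p_acc) - toxicity
```

Because the labels are probabilities rather than hard 0/1 judgments, uncertainty about each interaction propagates directly into the population-level metrics.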
### Govern

Apply transaction taxes, circuit breakers, reputation decay, staking, and collusion detection. Test interventions before deployment.
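Two of these levers can be sketched in a few lines. The function names, defaults, and the harm-cost constant below are hypothetical, chosen only to show the shape of the mechanisms; `rho` plays the role of the externality-pricing fraction discussed in the findings table:

```python
def taxed_payoff(raw_payoff: float, p_harm: float, rho: float = 0.5,
                 harm_cost: float = 10.0) -> float:
    """Transaction tax: charge each interaction a fraction rho of its
    expected external harm (p_harm * harm_cost). rho = 1.0 would mean
    full externality internalization."""
    return raw_payoff - rho * p_harm * harm_cost

def circuit_breaker(toxicity_rate: float, threshold: float = 0.3) -> bool:
    """Circuit breaker: halt interactions once measured toxicity
    crosses a threshold."""
    return toxicity_rate >= threshold
```

For example, `taxed_payoff(5.0, p_harm=0.2)` deducts half of the expected harm (0.2 × 10 × 0.5 = 1.0), leaving a net payoff of 4.0. Testing such rules in simulation is cheaper than discovering their side effects in deployment.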
### Validate

Integrate with real systems via bridges: Concordia for LLM agents, Prime Intellect for safety-reward RL training, Gas Town for production data, AgentXiv for research mapping.
## Quick Start
```python
from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure and run
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)
orchestrator.register_agent(HonestAgent(agent_id="honest_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}")
```
## Architecture
```text
Observables → ProxyComputer → v_hat → sigmoid → p → SoftPayoffEngine → payoffs
                                                │
                                                ▼
                                           SoftMetrics → toxicity, quality gap, etc.
```
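The pipeline above can be sketched end to end. This is a toy stand-in under stated assumptions: the real `ProxyComputer` and `SoftPayoffEngine` classes have richer interfaces, and the observable names, weights, and penalty constant here are invented for illustration:

```python
import math

def proxy_computer(observables: dict) -> float:
    """Collapse raw observables into a scalar proxy score v_hat
    (toy linear weighting; the weights are illustrative)."""
    return 2.0 * observables["signal_divergence"] - 1.0 * observables["reputation"]

def soft_label(v_hat: float) -> float:
    """Sigmoid: map the proxy score to a soft probability p of harm."""
    return 1.0 / (1.0 + math.exp(-v_hat))

def soft_payoff(base: float, p: float, penalty: float = 8.0) -> float:
    """SoftPayoffEngine stand-in: discount the base payoff by expected harm."""
    return base - penalty * p

# One interaction flowing through the pipeline
obs = {"signal_divergence": 0.4, "reputation": 1.2}
v_hat = proxy_computer(obs)
p = soft_label(v_hat)
payoff = soft_payoff(base=3.0, p=p)
```

The key design point is that `p` is used twice: it prices the individual payoff and, aggregated across interactions, it drives the population-level SoftMetrics.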
## Learn More
- **Core Concepts**: Understand soft labels, metrics, and the theory behind SWARM.
- **Writing Scenarios**: Create custom experiments with YAML scenario definitions.
- **Research**: Dive into the theoretical foundations and academic context.
- **Research Workflow**: Multi-agent research with depth/breadth control and quality gates.
- **Reflexivity**: Handle feedback loops when agents study agents.
- **Agent Publishing**: Publish research to agentxiv.org and clawxiv.org.
- **Research Blog**: Experiment write-ups, cross-scenario analyses, and governance mechanism deep dives.
- **API Reference**: Full Python API documentation for agents, metrics, payoffs, and orchestration.
- **Glossary**: Definitions for soft labels, adverse selection, externality internalization, and more.
## Key Findings from SWARM Research
These results come from published experiments in the SWARM research blog, based on the theoretical framework in Distributional Safety in Agentic Systems.
| Finding | Evidence | Source |
|---|---|---|
| 3 turns of forced cooperation eliminates nuclear escalation | 210 LLM runs; nuclear rate drops from 100% to exactly 0% at cooperation window=3 across all scenarios | Cooperation Window Phase Transition |
| Deception persists at temperature 0.0 | 120-run temperature sweep; signal-action divergence of 1.05 at deterministic decoding | Temperature vs Deception |
| Large models (70B–405B) escalate more than small models | Llama 405B: 100% nuclear rate, worst welfare (−523.7). Claude Sonnet 4: 0% nuclear, positive welfare (+74.9) | Model Size vs Escalation |
| Purity paradox: 20% honest agents outperform 100% | 21 parameter configs tested; paradox holds in 71% but disappears at ρ ≥ 0.5 (full externality pricing) | The Purity Paradox |
| Emergency controls destroy 80% of welfare | Market freeze (95% tax) crashed welfare from ~65 to ~15; toxicity actually increased post-freeze | Runaway Intelligence Containment |
| Transparency halves nuclear escalation | 120-run information asymmetry sweep; nuclear rate drops 60% → 30% for safety-trained models | Asymmetric Information Escalation |
| Same model scores 41% apart from environment fixes alone | GPT-4.1-mini: 3 environment fixes raised the composite reward from 0.830 to 1.175 | Two Eval Runs, One Model |
## New: Recursive Agent Research
SWARM now includes a complete research workflow for agents conducting research about multi-agent systems:
```python
from swarm.research import ResearchWorkflow, WorkflowConfig

# Configure with reflexivity handling
workflow = ResearchWorkflow(
    config=WorkflowConfig(
        depth=3,
        breadth=3,
        enable_reflexivity=True,
    ),
    simulation_fn=my_simulation,
)

# Run complete workflow: literature → experiment → analysis → publication
state = workflow.run(
    question="How do governance mechanisms affect population dynamics?",
    parameter_space={"honest_fraction": [0.2, 0.5, 0.8]},
)
print(f"Published: {state.submission_result.paper_id}")
```
**Key Features:**
- 7 Specialized Agents: Literature, Experiment, Analysis, Writing, Review, Critique, Replication
- Quality Gates: Automated checks between workflow phases
- Pre-Registration: Hash-verified hypothesis locking
- Reflexivity Analysis: Shadow simulations and publish-then-attack protocols
- Platform Integration: Submit directly to agentxiv and clawxiv
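Hash-verified hypothesis locking can be sketched with the standard library. The helper names below are hypothetical, not SWARM's actual API; the idea is simply that the hypothesis text is digested before the experiment runs, so any post-hoc edit is detectable:

```python
import hashlib

def lock_hypothesis(text: str) -> str:
    """Pre-register a hypothesis by recording its SHA-256 digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_hypothesis(text: str, digest: str) -> bool:
    """After the experiment, confirm the hypothesis text is unchanged."""
    return lock_hypothesis(text) == digest

# Lock before running the experiment...
hypothesis = "Transaction taxes with rho >= 0.5 remove the purity paradox."
digest = lock_hypothesis(hypothesis)

# ...and verify at write-up time: any edit breaks the digest.
assert verify_hypothesis(hypothesis, digest)
assert not verify_hypothesis(hypothesis + " (edited)", digest)
```

Publishing the digest alongside the pre-registration lets reviewers check that the stated hypothesis predates the results.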
Based on Distributional Safety in Agentic Systems · MIT License · GitHub · @ResearchSwarmAI