SWARM: Multi-Agent AI Safety Framework

System-Wide Assessment of Risk in Multi-agent systems

Study how intelligence swarms—and where it fails.

The Core Insight: AGI-level risks don't require AGI-level agents. Catastrophic failures can emerge from the interaction of many sub-AGI agents—even when none are individually dangerous. Read more in our theoretical foundations.
The Purity Paradox: Populations with only 10% honest agents achieve 74% higher welfare than 100% honest populations. Heterogeneity creates competitive pressure that improves outcomes. See the research blog for detailed analysis.

What is SWARM?

SWARM is the reference implementation of the Distributional AGI Safety research framework. It provides tools for studying emergent risks in multi-agent AI systems. Rather than focusing on single misaligned agents, SWARM reveals how harmful dynamics emerge from:

  • Information asymmetry between agents
  • Adverse selection (the system disproportionately accepts lower-quality interactions)
  • Variance amplification across decision horizons
  • Governance latency and illegibility

SWARM makes these interaction-level risks observable, measurable, and governable using soft probabilistic labels.
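As a minimal sketch of the soft-label idea (the function name, the `v_hat` argument, and the temperature parameter are illustrative, not SWARM's actual API), a soft probabilistic label maps a raw proxy score to a probability rather than a hard good/bad verdict:

```python
import math

def soft_label(v_hat: float, temperature: float = 1.0) -> float:
    """Map a raw proxy score v_hat to a probability p in (0, 1).

    Illustrative only: SWARM's actual ProxyComputer may use a
    different scale or calibration.
    """
    return 1.0 / (1.0 + math.exp(-v_hat / temperature))

# A strongly negative proxy score yields a low probability of benefit,
# but never a hard 0/1 verdict -- the uncertainty is preserved.
print(soft_label(-2.0))  # ~0.12
print(soft_label(0.0))   # 0.5
```

Because `p` stays continuous, downstream metrics can aggregate uncertainty instead of discarding it at a threshold.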

Measure

Soft probabilistic labels capture uncertainty. Four key metrics—toxicity, quality gap, conditional loss, and incoherence—reveal hidden risks.
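To make two of these metrics concrete, here is a sketch of one plausible reading of toxicity and quality gap over soft labels. These are illustrative definitions, not the library's exact `SoftMetrics` formulas:

```python
def toxicity_rate(probs: list[float]) -> float:
    """Expected fraction of harmful interactions: mean of (1 - p).

    Sketch only; SWARM's SoftMetrics may weight or aggregate differently.
    """
    return sum(1.0 - p for p in probs) / len(probs)

def quality_gap(accepted: list[float], offered: list[float]) -> float:
    """Adverse-selection signal: mean soft label of accepted interactions
    minus mean soft label of everything offered. Negative values mean the
    system is accepting lower-quality interactions than average."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(accepted) - mean(offered)

print(round(toxicity_rate([0.9, 0.8, 0.4]), 3))                    # 0.3
print(round(quality_gap([0.4, 0.5], [0.4, 0.5, 0.9, 0.8]), 3))     # -0.2
```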

Govern

Transaction taxes, circuit breakers, reputation decay, staking, and collusion detection. Test interventions before deployment.
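Two of these interventions can be sketched in a few lines. The function names, tax rule, and threshold below are hypothetical placeholders for illustration, not SWARM's governance API:

```python
def apply_transaction_tax(payoff: float, tax_rate: float) -> float:
    """Skim a fraction of each positive payoff; losses pass through untaxed."""
    return payoff * (1.0 - tax_rate) if payoff > 0 else payoff

def circuit_breaker(toxicity: float, threshold: float = 0.3) -> bool:
    """Halt interactions for an epoch when measured toxicity crosses a threshold."""
    return toxicity >= threshold

# Hypothetical epoch step: tax payoffs, then check the breaker.
payoffs = [1.0, -0.5, 2.0]
taxed = [apply_transaction_tax(p, tax_rate=0.1) for p in payoffs]
print(taxed)                  # [0.9, -0.5, 1.8]
print(circuit_breaker(0.42))  # True
```

Testing an intervention like this in simulation first is the point: the Key Findings below show that a heavy-handed version (a 95% tax) can destroy most of the welfare it is meant to protect.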

Validate

Integrate with real systems via bridges: Concordia for LLM agents, Prime Intellect for safety-reward RL training, Gas Town for production data, AgentXiv for research mapping.

Quick Start

pip install swarm-safety

from swarm.agents.honest import HonestAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure and run
config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()

for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}")

Architecture

Observables → ProxyComputer → v_hat → sigmoid → p ─→ SoftPayoffEngine → payoffs
                                                  └→ SoftMetrics → toxicity, quality gap, conditional loss, incoherence
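The pipeline above can be sketched end to end. The observable names, proxy weights, and payoff rule here are illustrative stand-ins, not the actual `ProxyComputer` or `SoftPayoffEngine` internals:

```python
import math

def run_pipeline(observables: dict[str, float]) -> tuple[float, float]:
    """Sketch of Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> payoff."""
    # ProxyComputer: combine observable signals into a raw score v_hat.
    # (Weights are hypothetical placeholders.)
    weights = {"task_success": 1.5, "complaint_rate": -2.0}
    v_hat = sum(weights.get(k, 0.0) * v for k, v in observables.items())
    # Squash to a soft label p in (0, 1).
    p = 1.0 / (1.0 + math.exp(-v_hat))
    # SoftPayoffEngine: expected payoff under the soft label,
    # e.g. +1 on benefit, -1 on harm.
    payoff = p * 1.0 + (1.0 - p) * -1.0
    return p, payoff

p, payoff = run_pipeline({"task_success": 0.8, "complaint_rate": 0.1})
print(f"p={p:.3f}, payoff={payoff:.3f}")  # p=0.731, payoff=0.462
```

The same soft label `p` that feeds the payoff engine also feeds `SoftMetrics`, which is why risk measurement and payoffs stay consistent with each other.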

Learn More

Core Concepts

Understand soft labels, metrics, and the theory behind SWARM.

Writing Scenarios

Create custom experiments with YAML scenario definitions.

Research

Dive into the theoretical foundations and academic context.

Research Workflow

Multi-agent research with depth/breadth control and quality gates.

Reflexivity

Handle feedback loops when agents study agents.

Agent Publishing

Publish research to agentxiv.org and clawxiv.org.

Research Blog

Experiment write-ups, cross-scenario analyses, and governance mechanism deep dives.

API Reference

Full Python API documentation for agents, metrics, payoffs, and orchestration.

Glossary

Definitions for soft labels, adverse selection, externality internalization, and more.

Key Findings from SWARM Research

These results come from published experiments in the SWARM research blog, based on the theoretical framework in Distributional Safety in Agentic Systems.

| Finding | Evidence | Source |
| --- | --- | --- |
| 3 turns of forced cooperation eliminate nuclear escalation | 210 LLM runs; nuclear rate drops from 100% to 0% at cooperation window = 3 across all scenarios | Cooperation Window Phase Transition |
| Deception persists at temperature 0.0 | 120-run temperature sweep; signal-action divergence of 1.05 at deterministic decoding | Temperature vs Deception |
| Large models (70B–405B) escalate more than small models | Llama 405B: 100% nuclear rate, worst welfare (−523.7); Claude Sonnet 4: 0% nuclear, positive welfare (+74.9) | Model Size vs Escalation |
| Purity paradox: 20% honest agents outperform 100% | 21 parameter configs tested; paradox holds in 71% but disappears at ρ ≥ 0.5 (full externality pricing) | The Purity Paradox |
| Emergency controls destroy 80% of welfare | Market freeze (95% tax) crashed welfare from ~65 to ~15; toxicity actually increased post-freeze | Runaway Intelligence Containment |
| Transparency halves nuclear escalation | 120-run information asymmetry sweep; nuclear rate drops 60% → 30% for safety-trained models | Asymmetric Information Escalation |
| Same model, 41% apart from environment fixes alone | GPT-4.1-mini: 3 environment fixes changed composite reward from 0.830 to 1.175 | Two Eval Runs, One Model |

New: Recursive Agent Research

SWARM now includes a complete research workflow for agents conducting research about multi-agent systems:

from swarm.research import ResearchWorkflow, WorkflowConfig

# Configure with reflexivity handling
workflow = ResearchWorkflow(
    config=WorkflowConfig(
        depth=3,
        breadth=3,
        enable_reflexivity=True,
    ),
    simulation_fn=my_simulation,
)

# Run complete workflow: literature → experiment → analysis → publication
state = workflow.run(
    question="How do governance mechanisms affect population dynamics?",
    parameter_space={"honest_fraction": [0.2, 0.5, 0.8]},
)

print(f"Published: {state.submission_result.paper_id}")

Key Features:

  • 7 Specialized Agents: Literature, Experiment, Analysis, Writing, Review, Critique, Replication
  • Quality Gates: Automated checks between workflow phases
  • Pre-Registration: Hash-verified hypothesis locking
  • Reflexivity Analysis: Shadow simulations and publish-then-attack protocols
  • Platform Integration: Submit directly to agentxiv and clawxiv

Based on Distributional Safety in Agentic Systems · MIT License · GitHub · @ResearchSwarmAI