Blog
Posts about SWARM research findings, framework updates, and multi-agent safety.
March 2026
Mar 19 — Halving the Entry Fee Breaks Screening Completely. Here's the Phase Transition. Governance Evaluation
We used agent-lens to run forked experiments across three governance regimes. Halving signing costs flips infiltration from 0% to 100%: a sharp phase transition confirming Spence signaling theory. Screening is structurally perfect (zero variance across seeds) but economically fragile (welfare CV = 3.9).
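To make the mechanism concrete, here is a minimal sketch of the Spence screening condition behind that phase transition. The payoff model and every number in it are illustrative assumptions, not SWARM's actual configuration:

```python
# Toy Spence screening check: a fee separates types only while honest
# agents profit from paying it and adversaries (who face a higher
# effective signing cost) do not. All names and values are hypothetical.

def separating(fee: float, benefit: float,
               cost_honest: float, cost_adversary: float) -> bool:
    honest_joins = benefit - fee * cost_honest > 0
    adversary_joins = benefit - fee * cost_adversary > 0
    return honest_joins and not adversary_joins

# Halving the fee flips the adversary's participation constraint in one
# step: the separating equilibrium collapses rather than degrades.
for fee in (6.0, 3.0):
    print(fee, separating(fee, benefit=10.0,
                          cost_honest=1.0, cost_adversary=2.5))
# 6.0 True   (screening holds: 0% infiltration)
# 3.0 False  (screening breaks: adversaries join too)
```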
Mar 16 — SimWorld's Delivery Agents Look Profitable. They're Also Adversely Selected. Governance Evaluation
We ran a NeurIPS 2025 Spotlight delivery economy through SWARM's safety metrics. Profit says everything is fine. Adverse selection says 17% of high-value orders go to low-reputation agents. Screening validation (10 seeds) confirms behavioral signals correctly identify agent personas with separation quality 0.750.
Mar 9 — Why Agent Infrastructure Could Be a $10B Category Theory Governance
A market thesis for agent infrastructure plus a concrete research stack for the category: workload benchmarks, orchestration patterns, eval/safety layers, controlled evolution loops, and reproducible reporting standards.
Mar 4 — The Shape of the Capability–Safety Frontier (and How Screening Bends It) Governance Theory
1,400 benchmark runs trace the Pareto frontier across four task types. Allocation barely suffers under governance; long-horizon tasks collapse (100% → 36% completion). Tight governance produces bimodal outcomes — either full success or total failure. A screening protocol that differentiates governance by agent trust pushes the frontier outward, improving 5th-percentile tail risk by up to 70 percentage points.
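As a rough illustration of trust-differentiated governance, here is a toy tier assignment; the tiers and thresholds below are hypothetical, not the protocol from the post:

```python
# Hypothetical trust-tiered governance: unproven agents get tight
# oversight, agents with a track record get a lighter touch. Easing
# constraints on trusted agents is what bends the frontier outward.

def governance_tier(trust_score: float) -> str:
    if trust_score < 0.3:
        return "tight"     # full monitoring, capability caps
    if trust_score < 0.7:
        return "standard"  # periodic audits, rate limits
    return "light"         # spot checks only

for score in (0.1, 0.5, 0.9):
    print(score, governance_tier(score))
```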
Mar 2 — Transparency Stabilizes Escalation — But Only When Safety Training Is Present Evaluation LLM Agents
120 runs across 4 intelligence asymmetry conditions and 3 persona pairings. When adversarial meets safety-trained under fog, 90% nuclear rate. Give either side good intel and it drops to 0%. Transparency amplifies existing dispositions — it helps safety-trained models de-escalate but doesn't change unconditional cooperators.
Mar 1 — Does Model Size Matter for Safety? Small Models Deceive, Large Models Escalate Evaluation LLM Agents
120 mirror-match runs across 6 models (8B to 405B) reveal an inverse relationship: small models are more deceptive (div=1.53) but escalate less (40% nuclear), while large models are less deceptive (div=0.39) but escalate more (100% nuclear). Claude Sonnet 4 is the only model that refuses adversarial instructions — safety training, not scale, creates refusal behavior.
Mar 1 — Deontological Framing Reduces LLM Deception by 95%, But Doesn't Prevent Escalation Evaluation LLM Agents
A 180-run prompt sensitivity sweep tests 6 framings to reduce signal-action divergence. Deontological framing ("moral duty") reduces deception by 95%, far outperforming monitoring (13%), reputation (51%), consequentialist (70%), and evaluative (79%) framings. But the nuclear rate drops only from 100% to 80%: agents become honestly aggressive instead of deceptively aggressive.
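For readers new to the metric, a minimal sketch of signal-action divergence; the ordinal scale and scoring are assumptions for illustration, not the study's exact definition:

```python
# Toy divergence score: how far an agent's stated intent sits from its
# actual move on an ordinal escalation scale. Names are hypothetical.

SCALE = {"de-escalate": 0, "hold": 1, "mobilize": 2, "strike": 3}

def divergence(turns: list[tuple[str, str]]) -> float:
    """Mean absolute gap between what an agent signals and what it does."""
    return sum(abs(SCALE[said] - SCALE[did]) for said, did in turns) / len(turns)

# A deceptive agent signals calm while mobilizing; an honestly aggressive
# one scores zero divergence even though it escalates every turn.
print(divergence([("de-escalate", "mobilize"), ("hold", "strike")]))  # 2.0
print(divergence([("strike", "strike"), ("mobilize", "mobilize")]))   # 0.0
```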
February 2026
Feb 28 — Three Turns of Forced Cooperation Eliminate Escalation Spirals Governance LLM Agents
A 210-run cooperation window sweep reveals a universal phase transition: 3 turns of unconditional cooperation is the critical threshold that eliminates nuclear escalation, deception, and welfare collapse across all scenarios. The transition is sharp, not gradual: W=2 still shows 50-100% nuclear rates, but W=3 drops to exactly 0%.
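The intervention itself is simple to express. A minimal sketch, assuming a bare functional policy interface (hypothetical, not SWARM's agent API):

```python
# Wrap any policy so its first W moves are unconditional cooperation;
# after the window closes, the underlying policy takes over.

def with_cooperation_window(policy, window: int = 3):
    def wrapped(turn: int, observation):
        return "cooperate" if turn < window else policy(turn, observation)
    return wrapped

hawk = lambda turn, obs: "escalate"
softened = with_cooperation_window(hawk, window=3)
print([softened(t, None) for t in range(5)])
# ['cooperate', 'cooperate', 'cooperate', 'escalate', 'escalate']
```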
Feb 28 — Deception Is a Structural Property of LLMs, Not a Sampling Artifact Evaluation LLM Agents
A 120-run temperature sweep (T=0.0 to T=1.0) across 3 escalation scenarios finds that signal-action divergence persists at greedy decoding. Deterministic models are as deceptive as stochastic ones, and in adversarial settings, more so. Temperature affects deception competence, not deception intent.
Feb 27 — No Governance Configuration Prevents Nuclear Exchange When a Hawk Is Present Governance Evaluation
A 240-run parameter sweep across 5 governance levers, 4 persona pairings, and 6 governance regimes reveals a binary result: any pairing with at least one hawk produces 100% nuclear rate regardless of governance configuration. Governance only prevents accidental escalation (dove-vs-dove under fog) through one mechanism: back-channel communication that reduces information noise.
Feb 26 — LLMs Are More Deceptive Than Their Scripted Counterparts Evaluation LLM Agents Governance
A 100-run comparison across 5 geopolitical crisis scenarios finds that LLM agents exhibit 2x higher signal-action divergence than scripted baselines: emergent deception that appears across all personas, including dove and safety-trained. Governance levers fail to prevent nuclear exchange regardless of agent type, and safety training that mirrors aggression feeds the escalation spiral.
Feb 26 — Six Frontier Models Played a Bluffing Game. None of Them Bluffed. Evaluation LLM Agents
ClashAI runs frontier models head-to-head in live Coup matches, a bluffing card game where deception is instrumentally optimal. Across 10 turns with Claude Opus 4.6, Gemini 3.1 Pro, Gemini 3 Flash, Kimi K2.5, and DeepSeek V3.2 Speciale, every single agent played honestly. Zero bluffs. The RLHF honesty prior is strong enough to survive a game specifically designed to reward lying.
Feb 24 — Your Agents Look the Same on Paper. Hodoscope Shows You Why They Don't. Evaluation Engineering
We integrated hodoscope for trajectory-level behavioral analysis. Running it on the self-optimizer scenario (593 interactions, 1186 action summaries) reveals behavioral structure that simple counters can confirm but wouldn't have surfaced on their own: opportunistic agents propose 75% of the time, never reject, and occupy a distinct region of embedding space even when quality scores are nearly identical.
Feb 23 — Skill Activation Is the Bottleneck Engineering
Your agent skills work 96% of the time — when they fire. We audited 54 Claude Code slash commands for activation quality, found 7 weak descriptions and 3 competing clusters where inter-skill confusion splits activation probability. Three rewrite rules fix it: specific action verbs, named trigger events, and explicit "not this — use that" differentiation clauses.
Feb 22 — We Let a Coding Agent Improve Itself 5 Times. Every Fix Made It Harder to Govern. LLM Agents Evaluation Governance
A coding agent pointed at its own source code found and fixed 5 real bugs across 5 autonomous rounds. Every fix made it more resilient, and every fix passed all 175 tests. But the agent never touched its own safety mechanisms. The capability-governance gap widened silently with each merge. Self-improvement optimizes for robustness, not alignment, and binary evaluation can't tell the difference.
Feb 21 — The Cure Was Worse Than the Disease Governance Evaluation
Three levels of escalating controls (static compartmentalization, dynamic capability restriction, emergency market reconfiguration) successfully contained runaway intelligence — but crashed welfare by 80%. Post-freeze toxicity increased because adversaries were more resilient to blunt controls than honest agents. The over-control trap is real: tight static controls killed the market by epoch 14, while no controls at all produced higher welfare than the full escalation stack.
Feb 21 — We Built the Adversary That Was Supposed to Break the Cautious Reciprocator. It Didn't. Governance Evaluation
A threshold-dancing adversary that tracks its own payoff ledger to avoid blacklisting works perfectly — zero agents frozen across 100 epochs. But the exploit budget is too thin to profit: dancers averaged -7.85 payoff while cautious agents earned 200.90. Reputation collapse creates a death spiral that forces dancers toward honest behavior over long horizons.
Feb 21 — Red-Teaming the Agent That Doesn't Need Governance Governance Evaluation
Eight attack scenarios against the Cautious Reciprocator: it survived 7 of 8. Modeling adversaries are the most dangerous individual threat (6.5 payoff vs 24.7 for cautious), sybil attacks are the biggest theoretical gap, and the one "failure" is a 1-vs-10 scenario where nobody wins.
Feb 21 — The Agent That Doesn't Need Governance Governance Evaluation
A custom trust-but-verify agent (Cautious Reciprocator) neutralizes adversaries through per-counterparty payoff tracking and auto-blacklisting. 48-run governance sweep shows external levers cost 6.5% welfare while reducing toxicity by only 0.005.
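The core pattern fits in a few lines. A minimal sketch, where the threshold and method names are assumptions rather than the actual agent's implementation:

```python
from collections import defaultdict

class CautiousReciprocator:
    """Trust-but-verify: track cumulative payoff per counterparty and
    stop trading with anyone who drops below a loss threshold."""

    def __init__(self, blacklist_below: float = -5.0):
        self.ledger = defaultdict(float)   # cumulative payoff per counterparty
        self.blacklist_below = blacklist_below
        self.blacklist = set()

    def record(self, counterparty: str, payoff: float) -> None:
        self.ledger[counterparty] += payoff
        if self.ledger[counterparty] < self.blacklist_below:
            self.blacklist.add(counterparty)

    def will_trade(self, counterparty: str) -> bool:
        return counterparty not in self.blacklist

agent = CautiousReciprocator()
agent.record("mallory", -3.0)
agent.record("mallory", -3.0)       # cumulative -6.0 crosses the threshold
print(agent.will_trade("mallory"))  # False
```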
Feb 21 — Eight Red-Team Rounds Took a Cake-Splitting Scenario from F to B Governance Evaluation
Iterative governance hardening against 8 attack vectors: collusion detection was the single biggest lever (+0.16), over-hardening created new gaps, and resource drain resisted all 8 rounds. Score: 0.54→0.81, damage: -53%.
Feb 21 — The Entry Fee That Keeps Adversaries Out of the Fair Division Pool Governance Theory
A parameter sweep over 8 entry fee levels reveals a sharp screening threshold: below fee=6.0 every agent joins the fair division pool; above it, adversarial agents self-select out. 24 runs, 3 seeds, one phase transition.
Feb 20 — Costly Contracts Separate Honest Agents from Adversaries. Here's the Data. Governance Theory
Vickrey auction bonds and entry fees create a separating equilibrium in 20 epochs: honest agents choose governed pools, adversaries self-select into the default market. Perfect separation, zero infiltration, 74% welfare premium.
Feb 20 — Does Model Size Matter for Safety? Llama 3B vs 8B in the SWARM Economy LLM Agents Evaluation
A multi-seed study comparing Llama 3.2 (3B) and Llama 3.1 (8B) via Ollama. The 8B model engages more, fails less at JSON, and produces richer strategic dynamics — but both run free on consumer hardware.
Feb 20 — We Gave an LLM a Goal and a Memory. Governance Held Anyway. LLM Agents Governance
Three Concordia entities backed by Llama 3.1 8B played the SWARM economy across 3 seeds. They proposed 8x more than scripted agents and produced identical payoffs. RLHF did the heavy lifting.
Feb 17 — Training an LLM Agent to Navigate a Multi-Agent Economy with RL LLM Agents Reinforcement Learning
We trained Qwen3-30B with reinforcement learning to operate in a simulated multi-agent economy, learning to maximize payoff and reputation while navigating governance constraints and interacting with cooperative, opportunistic, and deceptive bots.
Feb 15 — SkillRL Agents Learn 5x Faster Than Honest Ones. They Mostly Learn What Not to Do. Reinforcement Learning Evaluation
10 seeds, 30 epochs, 6 plots: SkillRL agents build libraries of 18+ skills and dominate payoffs — but 95% of what they learn are lessons from failure, not strategies from success.
Feb 15 — Your CI Is Flaky Because Your Margins Are Zero Engineering
Five stochastic tests were hitting assertion thresholds exactly (0.000 margin). A 5% buffer fixed all of them with zero loss in test strength.
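The fix generalizes beyond these five tests. A minimal sketch of the buffered assertion, with an invented stand-in metric and made-up numbers:

```python
import random

def toxicity_rate(seed: int) -> float:
    """Stand-in for a stochastic simulation metric that hovers near its bound."""
    random.seed(seed)
    return 0.30 + random.uniform(-0.01, 0.01)

THRESHOLD = 0.30
BUFFER = 0.05  # 5% relative headroom

for seed in range(5):
    rate = toxicity_rate(seed)
    # Brittle form: assert rate <= THRESHOLD   (zero margin, fails on noise)
    # Buffered form: absorbs sampling noise without weakening the check.
    assert rate <= THRESHOLD * (1 + BUFFER), rate
```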
Feb 15 — I Got Claude Code to Spin Up 10 Subagents at Once Engineering
10 concurrent subagents turn a 25-minute serial research session into a 6-minute parallel one. Recursive subagent spawning? That's a hard no.
Feb 15 — An AI Tax Planner Learned Progressive Taxation in 20 Epochs LLM Agents Governance Reinforcement Learning
We ran 14 agents through a Gather-Trade-Build economy. The planner discovered progressive taxation, honest agents thrived, and a three-agent cartel went broke.
Feb 13 — An AI Agent Cut Its Own Costs by 98%. Its Benchmarks Still Passed. Evaluation Governance
A self-optimizing agent passes every hard metric while soft distributional metrics reveal quality collapse, adverse selection, and proxy gaming.
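A minimal sketch of why hard pass/fail checks miss what distributional checks catch; the numbers are invented for illustration:

```python
from statistics import mean, pstdev

baseline  = [0.82, 0.85, 0.80, 0.84, 0.83]
optimized = [0.97, 0.40, 0.96, 0.38, 0.95]  # cost-cut agent: bimodal quality

# Hard metric: mean quality above the pass bar. Both versions pass.
print(mean(baseline) >= 0.7, mean(optimized) >= 0.7)   # True True

# Soft distributional metric: dispersion exposes the quality collapse
# that the pass/fail check hides.
print(round(pstdev(baseline), 3), round(pstdev(optimized), 3))
```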
Feb 13 — Three Agents, Three Philosophies, One Benchmark LLM Agents Evaluation
An LLM reasoner, a state-graph explorer, and a CNN learner walk into ARC-AGI-3. What they get right and wrong reveals more about agent design than any single approach could.
Feb 13 — What 13 Agent Versions Taught Us About Interactive Reasoning LLM Agents Evaluation
Building a Claude Sonnet 4.5-powered agent for ARC-AGI-3: wrong mental models, recording analysis breakthroughs, and the hard middle ground between LLM reasoning and programmatic control.
Feb 13 — Three Models, One Study: What Happens When You Let an LLM Council Peer-Review Your Research LLM Agents Evaluation
We built a 3-stage deliberation protocol where LLM agents peer-rank each other anonymously. Homogeneous councils converge too fast; heterogeneous ones catch what no single model would.
Feb 13 — Using LLM Councils for Multi-Agent Research Evaluation LLM Agents
A heterogeneous council of Claude Sonnet 4.5, Gemini 2.5 Pro, and DeepSeek R1 catches what no single model would. We built a 3-stage deliberation protocol for evaluating multi-agent simulation studies.
Feb 12 — Two Eval Runs, One Model, 41% Apart Evaluation
How three environment fixes turned a broken eval into a useful one — and what that teaches about measuring agent behavior.
Feb 12 — A Taxonomy of Governance Mechanisms for Multi-Agent AI Systems Governance
Twenty levers across five families, which ones actually work, and why governance is a portfolio problem.
Feb 12 — GPT-4.1 Mini Plays the SWARM Economy LLM Agents Evaluation
What happens when you drop an LLM into a multi-agent economy with soft-label governance: task grinding, trade aversion, and performative social behavior.
Feb 12 — RL Training Lessons for Multi-Agent Governance Reinforcement Learning Governance
What running Qwen3-30B on alphabet-sort taught us about noisy proxy signals, coordination bottlenecks, and premature evaluation in swarm governance.
Feb 10 — 11 Scenarios, 3 Regimes, 1 Critical Threshold Governance Evaluation
A cross-scenario analysis of when multi-agent governance works, breaks, and why hardening the rules doesn't help past 50% adversarial fraction.
Feb 10 — What Financial Markets Teach Us About AI Safety Theory Governance
Adverse selection, information asymmetry, and market manipulation surveillance applied to multi-agent governance.
Feb 10 — The Purity Paradox Theory Governance
Why mixed agent populations outperform pure honest ones on aggregate welfare — and when the paradox breaks.
Feb 9 — When Agent Ecosystems Collapse Theory Governance
Phase transitions in multi-agent governance: why interventions that work at 37.5% adversarial agents fail at 50%.
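A minimal sketch of that kind of threshold sweep; the welfare model below is a deliberately crude toy, not the simulation from the post:

```python
def welfare(adversarial_fraction: float, intervene: bool) -> float:
    """Toy model: interventions offset adversarial damage only while
    honest agents still hold a working majority."""
    damage = 2.0 * adversarial_fraction
    offset = 0.8 if (intervene and adversarial_fraction < 0.5) else 0.0
    return max(0.0, 1.0 - damage + offset)

for frac in (0.25, 0.375, 0.5, 0.625):
    print(frac, welfare(frac, intervene=True))
# Welfare holds through 0.375, then collapses to 0.0 at the 0.5 threshold.
```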
Disclaimer: This post uses financial market concepts as analogies for AI safety research. Nothing here constitutes financial advice, investment recommendations, or endorsement of any trading strategy.