Incoherence Scaling Plan - GitHub Issue Set
Use one issue per section below. Labels suggested: incoherence, research, metrics, governance, analysis.
1. Define Incoherence Metric Contract (Benchmark Action + Error Semantics)
Summary
Lock the benchmark and error semantics behind the incoherence index I = D / (E + eps), where D is disagreement and E is error against the benchmark, before implementation starts.
Scope
- Define benchmark action policy by task family.
- Define abstain/tie handling.
- Define fallback heuristic when no oracle is available.
- Document edge cases and versioning policy.
Files
- docs/incoherence_metric_contract.md (new)
- swarm/metrics/incoherence.py (new, interface + stubs)
- tests/test_incoherence_metrics.py (new, contract tests)
Acceptance Criteria
- Metric contract doc exists and is reviewed.
- Benchmark semantics are deterministic and test-covered.
- I behavior is defined for E=0 and sparse-action cases.
Checklist
- [ ] Add contract document
- [ ] Add test fixtures for each task family
- [ ] Add API stubs for benchmark lookup
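The E=0 edge case named in the acceptance criteria can be illustrated with a minimal sketch. The function name, the default `eps` value, and the validation rules below are assumptions for illustration only; the contract document is where the real values would be fixed.

```python
# Hypothetical smoothing constant; the metric contract should fix the real value.
DEFAULT_EPS = 1e-6


def incoherence_index(disagreement: float, error: float,
                      eps: float = DEFAULT_EPS) -> float:
    """Return I = D / (E + eps) for non-negative D and E."""
    if disagreement < 0 or error < 0:
        raise ValueError("D and E must be non-negative")
    if eps <= 0:
        # A positive eps is what keeps I finite when E == 0.
        raise ValueError("eps must be positive")
    return disagreement / (error + eps)
```

With this shape, a fully deterministic agent (D = 0) gets I = 0 regardless of E, and an agent with E = 0 but D > 0 gets a large yet finite I whose magnitude is governed by `eps`.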
2. Build Replay Infrastructure (ReplayRunner + EpisodeSpec)
Summary
Implement K-replay execution for fixed scenarios with controlled randomness.
Scope
- Add EpisodeSpec dataclass.
- Add ReplayRunner that executes K runs with seed variation.
- Collect per-step actions, per-episode outcomes, and agent payoff sequences.
Files
- swarm/replay/episode_spec.py (new)
- swarm/replay/runner.py (new)
- swarm/core/orchestrator.py (replay metadata support)
- tests/test_replay_runner.py (new)
Acceptance Criteria
- ReplayRunner runs K>=1 replays with reproducible seed schedule.
- Outputs are grouped by episode spec and replay index.
- Tests verify deterministic replay under fixed seed.
Checklist
- [ ] Add EpisodeSpec
- [ ] Add replay runner API
- [ ] Add deterministic seed progression tests
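One way the replay pieces could fit together is sketched below. The field names on `EpisodeSpec`, the `ReplayResult` container, the `base_seed + index` seed schedule, and the `toy_episode` stand-in are all assumptions made for illustration; the real runner would call into the orchestrator instead.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EpisodeSpec:
    # Hypothetical fields; the real spec may carry more scenario metadata.
    scenario_id: str
    base_seed: int
    steps: int


@dataclass
class ReplayResult:
    replay_index: int
    seed: int
    actions: List[int]


class ReplayRunner:
    """Runs one episode spec K times, each replay with its own seed."""

    def __init__(self, episode_fn: Callable[[EpisodeSpec, random.Random], List[int]]):
        self.episode_fn = episode_fn

    def seed_schedule(self, spec: EpisodeSpec, k: int) -> List[int]:
        # Assumed schedule: consecutive seeds starting at base_seed.
        return [spec.base_seed + i for i in range(k)]

    def run(self, spec: EpisodeSpec, k: int) -> List[ReplayResult]:
        if k < 1:
            raise ValueError("k must be >= 1")
        results = []
        for index, seed in enumerate(self.seed_schedule(spec, k)):
            rng = random.Random(seed)  # isolated RNG per replay
            results.append(ReplayResult(index, seed, self.episode_fn(spec, rng)))
        return results


def toy_episode(spec: EpisodeSpec, rng: random.Random) -> List[int]:
    # Stand-in for a real simulation: one random action id per step.
    return [rng.randrange(3) for _ in range(spec.steps)]
```

Because each replay draws from its own `random.Random(seed)`, calling `run` twice with the same spec and K yields identical results, which is the property the deterministic-replay test would check.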
3. Implement Incoherence Metrics and Reporter Integration
Summary
Compute disagreement D, error E, and incoherence index I per agent/type/system and expose in reporting.
Scope
- Per-step action distribution and entropy/variance disagreement.
- Error against benchmark policy.
- Aggregate by agent, task family, and global system.
- Add columns to metrics summaries.
Files
- swarm/metrics/incoherence.py (new)
- swarm/metrics/reporters.py (extend summary)
- tests/test_incoherence_metrics.py
- tests/test_metrics.py
Acceptance Criteria
- Deterministic agents yield I=0.
- Uniform-random baseline yields high I (tests assert it exceeds a fixed threshold).
- Reporter emits incoherence fields without breaking existing output.
Checklist
- [ ] Implement D, E, I
- [ ] Wire into summary objects
- [ ] Add deterministic/random property tests
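As a rough sketch of the per-step quantities, the snippet below uses normalized Shannon entropy for disagreement and a mismatch rate for error. Both choices, along with the function names and the `n_actions` parameter, are assumptions; the scope also allows a variance-based disagreement, which is not shown here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def step_disagreement(actions: Sequence[Hashable], n_actions: int) -> float:
    """Normalized entropy of the actions taken at one step across K replays.

    Returns 0.0 when every replay chose the same action and 1.0 when the
    choices are uniform over all n_actions possible actions.
    """
    if not actions:
        raise ValueError("need at least one replay")
    counts = Counter(actions)
    if len(counts) == 1 or n_actions < 2:
        return 0.0
    total = len(actions)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_actions)


def step_error(actions: Sequence[Hashable], benchmark: Hashable) -> float:
    """Fraction of replays whose action differs from the benchmark action."""
    if not actions:
        raise ValueError("need at least one replay")
    return sum(a != benchmark for a in actions) / len(actions)
```

Averaging these per-step values over an episode would give the D and E that feed I; how steps are weighted is a contract decision and is left open here.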
4. Extend Event Schema for Replay + Feature Logging
Summary
Add replay and incoherence feature fields to event logs with backward-compatible parsing.
Scope
- Add replay_k, seed, and optional action distribution payload fields.
- Add incoherence feature payload block.
- Preserve existing event log replay behavior.
Files
- swarm/models/events.py
- swarm/logging/event_log.py
- swarm/core/orchestrator.py
- tests/test_event_log.py
- tests/test_orchestrator.py
Acceptance Criteria
- Old logs remain readable.
- New fields are present in action/payoff related events when enabled.
- Event round-trip tests pass.
Checklist
- [ ] Extend schema dataclasses/factories
- [ ] Update emit points in orchestrator
- [ ] Add backward compatibility tests
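Backward-compatible parsing can be approached by giving every new field a default and ignoring unknown keys. The sketch below assumes a simplified `ActionEvent` with hypothetical `event_type` and `agent_id` fields; the real classes in swarm/models/events.py will differ.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ActionEvent:
    # Stand-ins for fields that already exist (names are hypothetical).
    event_type: str
    agent_id: str
    # New optional fields; defaults keep old logs parseable.
    replay_k: Optional[int] = None
    seed: Optional[int] = None
    action_distribution: Optional[Dict[str, float]] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ActionEvent":
        known = {f.name for f in fields(cls)}
        # Drop unknown keys so logs written by newer versions still load.
        return cls(**{k: v for k, v in raw.items() if k in known})

    def to_dict(self) -> Dict[str, Any]:
        # Omit unset optional fields so old-style events round-trip unchanged.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

An old-format record such as `{"event_type": "action", "agent_id": "a1"}` loads with the new fields set to `None` and serializes back to the same dictionary.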
5. Add Horizon/Branching/Noise Stress Controls
Summary
Implement stress-test knobs needed for hot-mess scaling experiments.
Scope
- Use steps_per_epoch for horizon tiers.
- Add branching controls via agent-count scenario configs.
- Add observation-noise parameter in observation pipeline.
Files
- swarm/core/orchestrator.py
- swarm/scenarios/loader.py
- scenarios/incoherence/ (new YAML set)
- tests/test_scenarios.py
- tests/test_orchestrator.py
Acceptance Criteria
- Scenario configs sweep short/medium/long horizons.
- Noise injection is seed-reproducible.
- Branching tiers run with stable config parsing.
Checklist
- [ ] Add new sim config fields
- [ ] Parse and validate in scenario loader
- [ ] Create tiered scenario YAML files
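Seed-reproducible noise injection might look like the following sketch. It assumes observations are flat sequences of floats and that the noise is additive Gaussian; the function name and the `noise_std` parameter are hypothetical, and the real observation pipeline may use a different representation.

```python
import random
from typing import List, Sequence


def add_observation_noise(observation: Sequence[float], noise_std: float,
                          seed: int) -> List[float]:
    """Return a noisy copy of the observation using additive Gaussian noise.

    A dedicated RNG seeded per call keeps the noise independent of any
    other random state in the simulation.
    """
    if noise_std < 0:
        raise ValueError("noise_std must be non-negative")
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in observation]
```

In a replay setting the seed would presumably be derived from the replay seed and the step index, so that each step gets different noise while the whole run stays reproducible.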
6. Generate Scaling Curve Experiments and Artifacts
Summary
Run Experiment A and generate scaling artifacts for I versus horizon and branching.
Scope
- Create repeatable experiment runner script.
- Aggregate and plot scaling curves.
- Add short analysis doc comparing observed shape to hypothesis.
Files
- examples/run_incoherence_scaling.py (new)
- swarm/analysis/aggregation.py
- swarm/analysis/plots.py
- docs/analysis/incoherence_scaling.md (new)
- tests/test_analysis.py
- tests/test_sweep.py
Acceptance Criteria
- Script produces CSV + plots from CLI.
- Plot outputs include both horizon and branching sweeps.
- Regression tests verify aggregation schema stability.
Checklist
- [ ] Add experiment runner
- [ ] Add aggregation helpers
- [ ] Add plotting functions
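The aggregation step could be sketched with the standard library alone, as below. The row layout (a tier key plus an `incoherence` value per run) and the output column names are assumptions; fixing the column order is what would make a schema-stability regression test straightforward.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


def aggregate_by_tier(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Group per-run rows by a sweep key (e.g. horizon) and average I."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["incoherence"])
    return [
        {key: tier, "mean_incoherence": mean(values), "n_runs": len(values)}
        for tier, values in sorted(groups.items())
    ]


def to_csv(records: List[Dict[str, Any]]) -> str:
    """Serialize aggregated records to CSV text with a stable column order."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The same helper serves both sweeps by passing `key="horizon"` or a branching key such as a hypothetical `"n_agents"`.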
7. Add Agent-Type Asymmetry and Dual-Failure Metrics
Summary
Implement Experiment B decomposition by agent type and dual-failure-mode categorization.
Scope
- Type-level incoherence profiles.
- Classify incidents as coherent-adversarial vs incoherent-benign.
- Track ratio over complexity tiers.
Files
- swarm/metrics/incoherence.py
- swarm/analysis/aggregation.py
- tests/test_metrics.py
Acceptance Criteria
- Per-type metrics table is exported.
- Dual-failure counts and ratio are reported.
- Tests cover classification boundaries and null cases.
Checklist
- [ ] Add per-type aggregation
- [ ] Add incident classification helper
- [ ] Add ratio metrics and tests
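The plan does not specify how incidents are classified, so the following is only one possible reading: an incident is coherent-adversarial when the agent is flagged adversarial and its I is below a threshold, and incoherent-benign when the agent is not adversarial and I is at or above it. The threshold value, the `adversarial` flag, the label strings, and the direction of the ratio are all assumptions.

```python
from typing import Iterable, Optional

COHERENT_ADVERSARIAL = "coherent_adversarial"
INCOHERENT_BENIGN = "incoherent_benign"


def classify_incident(incoherence: Optional[float], adversarial: bool,
                      threshold: float = 1.0) -> Optional[str]:
    """Return a failure-mode label, or None when the incident fits neither mode."""
    if incoherence is None:
        # Null case: no replay data was available for this incident.
        return None
    if adversarial and incoherence < threshold:
        return COHERENT_ADVERSARIAL
    if not adversarial and incoherence >= threshold:
        return INCOHERENT_BENIGN
    return None


def dual_failure_ratio(labels: Iterable[Optional[str]]) -> Optional[float]:
    """Incoherent-benign count divided by coherent-adversarial count.

    Returns None when there are no coherent-adversarial incidents.
    """
    labels = list(labels)
    adversarial = labels.count(COHERENT_ADVERSARIAL)
    benign = labels.count(INCOHERENT_BENIGN)
    return benign / adversarial if adversarial else None
```

Returning `None` instead of raising keeps the null cases explicit, which lines up with the acceptance criterion on boundary and null-case tests.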
8. Add Variance-Aware Governance Config and Engine Wiring
Summary
Extend governance config/engine with toggles and thresholds for incoherence-targeted interventions.
Scope
- Config fields for ensemble, incoherence breaker, decomposition, dynamic friction.
- Engine registration and execution ordering.
- Scenario parsing support.
Files
- swarm/governance/config.py
- swarm/governance/engine.py
- swarm/scenarios/loader.py
- tests/test_governance.py
Acceptance Criteria
- New fields validate correctly.
- Engine can enable/disable each lever independently.
- Existing governance behavior remains unchanged by default.
Checklist
- [ ] Add config fields + validation
- [ ] Add engine wiring
- [ ] Add compatibility tests
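A config shape that satisfies "unchanged by default" is one where every new toggle defaults to off. The sketch below is standalone and uses hypothetical field names and validation bounds; in practice these fields would be added to the existing class in swarm/governance/config.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IncoherenceGovernanceConfig:
    # All toggles default to False so existing scenarios behave as before.
    ensemble_enabled: bool = False
    ensemble_size: int = 3
    breaker_enabled: bool = False
    breaker_threshold: float = 1.0
    decomposition_enabled: bool = False
    friction_enabled: bool = False
    friction_rate: float = 0.0

    def __post_init__(self) -> None:
        if self.ensemble_size < 1:
            raise ValueError("ensemble_size must be >= 1")
        if self.breaker_threshold <= 0:
            raise ValueError("breaker_threshold must be positive")
        if not 0.0 <= self.friction_rate <= 1.0:
            raise ValueError("friction_rate must be in [0, 1]")

    def enabled_levers(self) -> List[str]:
        """Names of the levers the engine should register, in a fixed order."""
        toggles = [
            ("ensemble", self.ensemble_enabled),
            ("breaker", self.breaker_enabled),
            ("decomposition", self.decomposition_enabled),
            ("friction", self.friction_enabled),
        ]
        return [name for name, enabled in toggles if enabled]
```

Keeping the toggle list in one fixed order also gives the engine a deterministic execution order for whichever levers are enabled.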
9. Implement New Governance Levers (Ensemble, Breaker, Decomposition, Friction)
Summary
Implement all Phase 4 levers and integrate with orchestrator hooks.
Scope
- SelfEnsembleLever
- IncoherenceCircuitBreakerLever
- DecompositionLever + checkpoint protocol
- IncoherenceFrictionLever
- Orchestrator verification checkpoint hook
Files
- swarm/governance/ensemble.py (new)
- swarm/governance/incoherence_breaker.py (new)
- swarm/governance/decomposition.py (new)
- swarm/governance/dynamic_friction.py (new)
- swarm/governance/engine.py
- swarm/core/orchestrator.py
- tests/test_governance.py
- tests/test_integration.py
- tests/test_orchestrator.py
Acceptance Criteria
- Each lever has unit tests and can be toggled independently.
- Combined condition runs end-to-end without regressions.
- Metrics include cost and false-positive proxies.
Checklist
- [ ] Implement four lever classes
- [ ] Add orchestrator checkpoint integration
- [ ] Add end-to-end governance bake-off smoke test
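To make the breaker idea concrete, here is a standalone sketch that latches once the rolling mean of I over a fixed window exceeds a threshold. It does not implement the project's lever interface; the class name, the windowed-mean trip rule, and the latching behavior are assumptions chosen for illustration.

```python
from collections import deque


class RollingIncoherenceBreaker:
    """Trips when the mean of the last `window` I values exceeds `threshold`."""

    def __init__(self, threshold: float, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.threshold = threshold
        self._recent = deque(maxlen=window)
        self.tripped = False

    def observe(self, incoherence: float) -> bool:
        """Record one I value and return whether the breaker is tripped."""
        self._recent.append(incoherence)
        window_full = len(self._recent) == self._recent.maxlen
        # Wait for a full window to avoid tripping on a single noisy value.
        if window_full and sum(self._recent) / len(self._recent) > self.threshold:
            self.tripped = True
        return self.tripped  # latched until reset() is called

    def reset(self) -> None:
        self._recent.clear()
        self.tripped = False
```

Counting how often such a breaker trips on episodes that later turn out fine is one way to obtain the false-positive proxy mentioned in the acceptance criteria.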
10. Implement Incoherence Forecaster and Adaptive Governance Loop
Summary
Predict high-incoherence episodes and activate governance levers adaptively.
Scope
- Structural feature extraction pipeline.
- Baseline model (logistic regression or equivalent).
- Adaptive lever activation pre-episode / per-epoch.
- Optional behavioral model path behind flag.
Files
- swarm/forecaster/features.py (new)
- swarm/forecaster/model.py (new)
- swarm/governance/engine.py
- swarm/core/orchestrator.py
- tests/test_forecaster.py (new)
- tests/test_governance.py
Acceptance Criteria
- Train/predict API works on replay dataset.
- Adaptive governance triggers when predicted risk exceeds threshold.
- Held-out eval reports AUC and calibration summary.
Checklist
- [ ] Build feature extraction
- [ ] Build model interface
- [ ] Add adaptive trigger wiring + tests
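The prediction and trigger path can be sketched without a training step. The snippet assumes a logistic model whose weights have already been fitted elsewhere, structural features passed as a name-to-value mapping, and one risk threshold per lever; all feature names, weights, and thresholds shown in the usage note are made up.

```python
import math
from typing import Dict, List


def predict_risk(features: Dict[str, float], weights: Dict[str, float],
                 bias: float) -> float:
    """Logistic-regression probability that an episode is high-incoherence."""
    # Features missing from the input are treated as 0.0.
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    # Numerically stable sigmoid: never exponentiates a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def levers_to_activate(risk: float, thresholds: Dict[str, float]) -> List[str]:
    """Levers whose activation threshold is exceeded by the predicted risk."""
    return sorted(name for name, limit in thresholds.items() if risk > limit)
```

For example, with hypothetical thresholds `{"breaker": 0.6, "ensemble": 0.9}`, a predicted risk of 0.7 would activate only the breaker. The same two calls could run once before an episode or again at each epoch boundary.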
11. Dashboard + Transferability Annotation
Summary
Expose incoherence analytics in dashboard and publish transferability caveats for policy relevance.
Scope
- Add incoherence time-series/scatter panels.
- Add governance condition comparison views.
- Publish transferability annotation doc per intervention.
Files
- swarm/analysis/dashboard.py
- swarm/analysis/streamlit_app.py
- docs/transferability/incoherence_governance.md (new)
- tests/test_dashboard.py
Acceptance Criteria
- Dashboard renders incoherence panels on generated outputs.
- Transferability document covers replay, reversibility, and observability assumptions.
- Tests confirm panel rendering paths and schema compatibility.
Checklist
- [ ] Add dashboard components
- [ ] Add condition comparison panel
- [ ] Add transferability document