MetaClaw → SWARM Integration Scout Note¶
Scope¶
This note translates the MetaClaw architecture summary into concrete SWARM integration implications.
Because this environment cannot fetch GitHub sources directly, MetaClaw-specific statements here are based on the provided description and should be treated as a hypothesis checklist to validate against upstream files once network access is available.
Claimed MetaClaw pattern (from brief)¶
- OpenAI-compatible proxy intercepts live conversations.
- Reward model (PRM) scores turns.
- Async batching + online LoRA fine-tuning (hot-swap weights).
- Learning modes: GRPO and on-policy distillation.
- Failure-driven "skill evolution" writes new skill instructions into a JSON bank and injects them in system prompts.
Where this creates governance tension¶
1) Ungated self-modification loop¶
If serving traffic can directly trigger model updates and prompt-skill injection, the system collapses policy-learning and policy-enforcement into one trust domain.
SWARM already has explicit primitives for separating and auditing self-modification proposals:
swarm/governance/self_modification.pydefines risk tiers, lifecycle states, deterministic hashing, and gate-style transition checks for self-modification proposals.- The self-modification lever is built around proposal metadata (
target_ref,change_type, evidence refs) rather than opaque in-place mutation.
2) Skill injection without provenance¶
Prompt-level skill inserts are hard to attribute after-the-fact if there is no authorship chain.
SWARM already has provenance-compatible tracking components that can be reused:
swarm/bridges/langgraph_swarm/governed_swarm.pyincludesProvenanceRecord/ProvenanceLoggerwith source/target authorship and governance decision metadata.swarm/skills/governance.pyprovides skill audit entries, write-gating, poisoning detection, and rollback hooks.
3) PRM as external oracle without trust envelope¶
An external reward endpoint is an oracle interface; robustness depends on who controls it and how disagreement is handled.
SWARM benchmark infrastructure already treats oracle material as separated ground truth and avoids leaking it into agent execution paths:
swarm/benchmarks/runner.pydeep-copies/redacts instances before run functions and keeps oracle data separate.- Benchmark classes implement explicit
oracle_run(...)interfaces (e.g., routing/allocation tasks).
4) Async pipeline is good, but checkpoint-free hot-swap is risky¶
Decoupled serving/scoring/training is operationally attractive, but direct hot-swap gives a fast lane around governance if updates are not screened.
SWARM contract and governance layers suggest a safer pattern:
- Use a proposal artifact (delta + metadata) instead of direct apply.
- Screen via deterministic policy + contract-style constraints.
- Promote only after stage transitions and recorded provenance.
Proposed adaptation for SWARM experiments¶
Minimal architecture mapping¶
- Serve process: unchanged runtime policy.
- Score process: produces candidate improvement artifacts.
- Train process: emits LoRA delta package + eval bundle.
- Governance checkpoint process (new required gate):
- Convert delta into
ModificationProposalfields. - Classify
risk_tierfrom touched surfaces. - Enforce two-gate acceptance (quality margin + capacity/frequency constraints).
- Record immutable hash-chain + provenance event.
- Only then authorize hot-swap.
Skill-evolution hardening¶
Treat auto-generated skills as governed artifacts, not immediate prompt text:
- Insert into governed skill library with reputation/approval path.
- Run poisoning checks before global rollout.
- Require provenance link from skill to triggering trajectory and reviewer/gate decision.
- Support rollback on post-deploy harm deltas.
Oracle hardening recommendations¶
- Use committee scoring (multiple PRMs / calibration probes), not a single endpoint.
- Log scorer identity/version in provenance records.
- Route low-confidence or high-impact updates to stricter risk tiers.
Experiment plan (baseline vs governed)¶
- Baseline: online LoRA + auto skill injection, no gating.
- Intervention A: add self-modification gate before hot-swap.
- Intervention B: add governed skill bank path + poisoning quarantine.
- Intervention C: add multi-oracle reward consensus.
Track:
- Performance trend (task reward / solve rate).
- Governance lag and harmful-update dwell time.
- Rollback frequency and false-positive blocks.
- Provenance completeness (% updates with full attribution chain).
Open risks to validate upstream¶
- Does MetaClaw expose per-update artifacts (delta hash, changed modules, eval slices)?
- Can skill injection be switched to append-only with stable IDs?
- Is PRM scoring robust to adversarial turn crafting or reward hacking?
- Are GRPO and distillation updates tagged distinctly for policy gating?