// project

active

Nexus

A multi-agent orchestration framework where scoped context is architectural, not prose — collapsed from a monolithic platform into three composable MCP servers after the debate framework was used to argue itself down.

[updated 2026-05-10] [sections 6] [stack 6]

// session :: nexus.overview

Most agent frameworks try to control behavior with prompt prose. Nexus is the bet that the right unit of control is the tool surface itself — what an agent can call, not what it's told to do.

── design note

Pitch

Nexus is the orchestration layer my agents use when one agent isn't enough. It started as a monolithic developer tool — a roster of 12 named agents (The_Architect, The_Engineer, The_Foreman, …) coordinated by a director-led pipeline of file-based handoffs — and got collapsed, over the course of a few weeks, into three composable MCP servers that any caller (Claude Code, mostly) can wire up.

The collapse wasn't aesthetic. It was the conclusion of a debate the framework ran on itself.

The thesis the new shape defends: scoped context is architectural, not prose. A spawned panelist doesn't see tools its phase didn't grant it. An execution-tier worker can't write outside the brief's scope because the path validator is wired into the MCP layer, not the prompt. "Don't touch X" is never written; it's structurally impossible.

3 MCP servers: core · debate · spawn

4 tiers: advisory · operations · execution · field

1 atom: spawn_agent

23m first debate: wall clock · 27 records

What's wired

nexus-core — the shared library. The spawn_agent atom (one-shot Claude CLI invocation, file-based stdout for Windows safety), AgentProfile, Workspace, Session/Participant/ParticipantScope, the AgentRegistry, scope helpers. Library, not an MCP.
nexus-debate-mcp — a structured multi-agent debate protocol. A framer composes a panel, panelists run opening / cross-exam / final phases under per-phase tool scopes, an arbiter handles convergence and synthesis. Steelman enforcement is structural — agents literally do not read their opponent's raw words during cross-exam.
nexus-spawn-mcp — tiered multi-turn task coordination. Operations runs orchestration + review; Execution does the work. Tier transitions and tool allowlists are enforced at the MCP layer. Custom tier registration is supported. The Cartographer keeps a per-project framing artifact up to date as a post-task hook.
Engram — provenance-aware memory engine, in its own repo. Spawned as an MCP subprocess by either coordinator with the tier-keyed AuthorityProfile flowed in via env. Read-only for Execution by default.

Reading order

Start with philosophy for the bet — scoped context, epistemic parallelism, what the framework is trying to prove. Architecture is the current shape. Debate and spawn are the two MCPs in detail; debate is the more interesting protocol but spawn does most of the work. Legacy is what it used to be — Director + roster + Conduit, file-handoff pipeline — and what fell out when the structure changed. Decisions is the log of pivots.

The shorter version: Nexus is the second time I've built this. The first time taught me what to throw away.

// section 01 :: philosophy · 1 / 6

Philosophy

Scoped context as architecture, epistemic parallelism, and what the framework is trying to prove.

// session :: nexus.philosophy

What's broken about the default

The default shape for multi-agent systems is a sequential assembly line: a coordinator agent reads the user's request, hands it to a worker, the worker hands the result to a reviewer. Context accumulates as it flows. Behavior is controlled by prompt prose — "you are the planner, do not write code", "you are the worker, follow the plan", "do not touch the auth module".

Two things go wrong, both load-bearing:

Context inflation. A reviewer with the planner's context, the worker's diff, the original request, and the system prompt that explains all three has more material than is useful. Agents drift away from their stated lane because the lane is enforced by language and language is negotiable. The model's attention budget gets spent on coordination overhead instead of the actual problem.
Drift toward stated priors. A single agent on a long deliberation pulls toward whatever the user said last. RLHF tunes for it. The longer the conversation, the less you can trust the position the agent finally lands on.

Most "fix the orchestration layer" work is some version of better prose, longer system prompts, explicit reflection steps, or a more disciplined reviewer agent. That is work being done at the wrong level.

Prompt prose is the wrong unit of control. The right unit is the tool surface — what the agent can call, not what it's told to do.

Scoped context, structurally

The thesis is one sentence: an agent should see only the tools its current phase needs, and the scope of every action it can take should be enforced by the MCP layer, not by the prompt.

This sounds small. In practice it changes everything about how a multi-agent system fits together.

A debate panelist in nexus-debate-mcp gets submit_opening, read_my_brief, read_my_role, a few siblings, and optionally a read-only Engram. Nothing else from the host filesystem is reachable. The opening prompt does not say "do not touch project files" — it does not need to. The tool to touch project files is not in the panelist's tools/list.

An execution-tier worker in nexus-spawn-mcp gets read_scoped_file, write_scoped_file, and edit_scoped_file tools whose paths are pre-validated against (brief.scope ∪ tier.scope_overlay) − do_not_touch. The brief carries a do_not_touch list, and the path validator enforces it. The agent doesn't get an Edit tool with a "don't touch X" caveat in the prompt; the validator returns {status: "out_of_scope", hint: ...} and the agent iterates.

Two consequences fall out:

Bias mitigation by architecture. The planner agent that proposes a plan cannot then "help" by editing files, because it doesn't have an edit tool. A reviewer cannot quietly fix what it found, because the review phase's allowlist excludes write tools by construction. The discipline that keeps these roles separate is no longer a function of prompt-writing skill.
Audit becomes trivial. Every spawn carries a Participant token. Every call is gated by Session.authorize(token, tool_name, args). The coordinator logs every authorized call. Going from "what did this agent do" to "what was it allowed to do, and what did it actually call" is a single workspace-file read.

Epistemic parallelism

The second idea — surfaced by the debate protocol but applicable beyond it — is epistemic parallelism: N agents form independent priors on a topic from disjoint scoped contexts, then a structured tension-resolution phase surfaces the most defensible synthesis without context contamination between them.

This is closer to scientific peer review or a judicial panel than to current agent orchestration. The panelists never see each other's raw words during cross-exam — they see the arbiter's steelman of the opposing position. This makes strawmanning architecturally impossible: weakening an opponent's argument requires reading it first, and the protocol doesn't let you read it. (A consequence worth keeping: arbiter steelmans are often sharper than the originals because the arbiter writes each side at maximum force without matching the opener's word count.)

A single long-running agent on the same question accumulates confirmation bias. Four parallel agents each contributing their independent prior, then resolving tensions through arbiter-mediated structural steelmans, do not.

The steelman rule is not just a prompt — it is the mechanical property that makes strawmanning architecturally impossible.

── DEBATE.md

What I'm trying to prove

Three claims, in descending order of how confident I am:

Scope-by-tool-allowlist is a more honest control surface than scope-by-prose. This one I am very confident about. The MCP-layer enforcement runs in production, the violation messages are concrete, and nothing about it requires the model to behave well — it requires the runtime to behave well, which is a much smaller ask.
Epistemic parallelism produces better answers than long-form solo deliberation, for architectural questions where multiple specialist lenses would plausibly disagree. This is provisionally true: the first real debate produced a synthesis that an independent benchmark loop also reached from the opposite direction (the panel reasoned about Engram's scoring architecture; the benchmark loop measured it). Two-track convergence is the strongest evidence I have. The wider claim still depends on more debates.
The right number of MCPs for an orchestration framework is small and composable, not one platform. The legacy monolith was a single FastAPI service with 50+ Python modules, an Electron frontend, a SQLite cache, an ~/.rev-nexus/ directory, and an autonomous executor with crash recovery. The new shape is three Python packages that share a workspace contract and don't talk to each other except through caller-orchestrated cross-MCP spawns. The composition surface is genuinely smaller.

The bet behind the collapse

The debate framework was built inside the legacy monolith. Its first real run — debate_1776363989806_f041, an Engram self-critique triggered by an external reviewer's pushback — produced a synthesis the framework respected enough to ship the debate's recommendation as code. After that ran, the same framework was used to argue about its own future shape: keep growing the platform, or extract the orchestration primitives and discard the rest. The platform-side arguments were strong in isolation. The MCP-side arguments held up better under steelman cross-exam.

The current shape is what survived. The collapse isn't a clean refactor — it dropped the Director-approval pipeline, the autonomous-executor loop, the Foreman safety gate, the in-app Conduit fixers, the cluster-pool deterministic naming, the per-project SQLite cache, all of it. What it kept is the spawn primitive, the tier model, the scope discipline, and the debate protocol. Those four are the things I think were actually load-bearing.

Whether that turns out to be right is what the next year of using it will tell me.

// section 02 :: architecture · 2 / 6

Architecture

Three MCPs over a shared core, the coordinator/relay process model, and how a spawn actually flows.

// session :: nexus.architecture

The pieces

Nexus is now three Python packages over a shared core, plus an external dependency on Engram. There is no daemon, no DB, no central registry — workspaces on disk are the source of truth, and McpServerSpecs carry their own command, args, and env so any caller can compose a topology.

nexus-core: The shared primitives. spawn_agent (one-shot Claude CLI invocation, file-based stdout for Windows safety), AgentProfile, Workspace, Session/Participant/ParticipantScope, the AgentRegistry, scope helpers (derive_scope_for_participant, make_readonly_engram, write_participant_mcp_config). Library, not an MCP.
nexus-debate-mcp: Structured multi-agent debate. Framer composes a panel from registry + topic-tailored ephemerals; panelists run opening / cross-exam / final phases with per-phase tool scopes; arbiter handles convergence and synthesis. Steelman enforcement is structural.
nexus-spawn-mcp: Tiered multi-turn task coordination. Operations runs orchestration + review (phase-aware tier); Execution does the work; Field is reserved. Tier transitions, tool allowlists, and scope are enforced at the MCP layer. Custom tier registration via register_tier.
Engram (external): Provenance-aware memory engine in its own repo. Native dependency of all three nexus packages, installed editable from the sibling checkout. Runtime integration is MCP — each spawned engram runs as a subprocess over stdio, scoped to a tier-keyed AuthorityProfile.

The coordinator/relay process model

A caller (Claude Code, or any MCP client) talks to one of the nexus MCPs over stdio. That MCP runs as a long-lived coordinator. When the coordinator needs to spawn an agent — a debate panelist, an execution-tier worker, an arbiter — it doesn't call the model directly. It launches the claude CLI as a subprocess with a generated MCP config, and the spawned agent reaches back through a relay process that forwards JSON-RPC over a TCP loopback connection.

Every spawn gets a per-participant session token. Every call from the spawn is gated by Session.authorize(token, tool_name, args). tools/list is filtered per connection — the spawn literally cannot see tools its phase didn't grant it.
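The gate described above can be sketched in a few lines. This is a minimal illustration, not the nexus-core implementation: the class layout, the audit_log field, and the token strings are assumptions; only the names Session, Participant, authorize, and allowed_mcp_tools come from the text. The sketch also records denied calls, which is a choice made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """One spawned agent; the allowlist is fixed when the token is minted."""
    token: str
    allowed_mcp_tools: frozenset


@dataclass
class Session:
    """Shared registry of live participants, keyed by session token."""
    participants: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)  # hypothetical field name

    def register(self, participant: Participant) -> None:
        self.participants[participant.token] = participant

    def list_tools(self, token: str, all_tools: list) -> list:
        # tools/list is filtered per connection; unknown tokens see nothing.
        participant = self.participants.get(token)
        if participant is None:
            return []
        return [t for t in all_tools if t in participant.allowed_mcp_tools]

    def authorize(self, token: str, tool_name: str, args: dict) -> bool:
        # Every call is checked again, even for tools that were never listed.
        participant = self.participants.get(token)
        allowed = participant is not None and tool_name in participant.allowed_mcp_tools
        self.audit_log.append(
            {"token": token, "tool": tool_name, "args": args, "allowed": allowed}
        )
        return allowed


ALL_TOOLS = ["submit_opening", "read_my_brief", "read_my_role", "write_scoped_file"]

session = Session()
session.register(
    Participant("tok-panelist-1", frozenset({"submit_opening", "read_my_brief", "read_my_role"}))
)
visible = session.list_tools("tok-panelist-1", ALL_TOOLS)
```

Filtering the listing and re-checking every call are two separate layers: a spawn that guesses a tool name it was never shown is still refused at call time.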

caller (Claude Code)
   │  stdio · MCP
   ▼
coordinator (long-lived)        ← FastMCP server
   │  TCP loopback · IPC
   ▼
relay (per-spawn)               ← NEXUS_*_ROLE=relay
   │  stdio · MCP
   ▲
spawn (claude -p subprocess)    ← per-participant token
                                ← phase-scoped allowlist
                                ← path-validated scope

The relay's stdin reader is a threaded pump on Windows because attaching sys.stdin with loop.connect_read_pipe() is unreliable on ProactorEventLoop. Without the pump, the claude CLI's initialize request hangs and the MCP server stays in pending forever. spawn_agent itself uses a comparable Windows-safety pattern: stdout and stderr are captured to files rather than pipes, because pipes can deadlock under asyncio on Windows.
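A threaded stdin pump of the kind described here might look like the following. It is a sketch under assumptions: the function names are hypothetical, and a BytesIO stands in for sys.stdin.buffer so the example runs anywhere. A daemon thread does the blocking reads and hands each line to the event loop with call_soon_threadsafe.

```python
import asyncio
import io
import threading
from typing import BinaryIO


def start_stdin_pump(stream: BinaryIO, loop: asyncio.AbstractEventLoop) -> asyncio.Queue:
    """Read lines on a daemon thread and forward them to the event loop.

    None is enqueued once the stream reaches EOF.
    """
    queue: asyncio.Queue = asyncio.Queue()

    def pump() -> None:
        # Blocking readline happens off the event loop thread.
        for line in iter(stream.readline, b""):
            loop.call_soon_threadsafe(queue.put_nowait, line)
        loop.call_soon_threadsafe(queue.put_nowait, None)

    threading.Thread(target=pump, daemon=True).start()
    return queue


async def read_all(stream: BinaryIO) -> list:
    queue = start_stdin_pump(stream, asyncio.get_running_loop())
    lines = []
    while (line := await queue.get()) is not None:
        lines.append(line)
    return lines


# In a relay the stream would be sys.stdin.buffer; BytesIO stands in here.
demo_stream = io.BytesIO(
    b'{"jsonrpc":"2.0","id":1,"method":"initialize"}\n'
    b'{"jsonrpc":"2.0","id":2,"method":"tools/list"}\n'
)
messages = asyncio.run(read_all(demo_stream))
```

The event loop never touches the pipe handle directly, which is the property that sidesteps the Windows pipe limitations.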

How a spawn flows

For a single Operations-tier orchestrate spawn, the path looks like this:

Caller invokes task_create(title, prompt) over stdio.
Coordinator mints task_id, writes Workspace(type='task') skeleton (status.json, task.log), starts the IPC server if not yet started.
Caller invokes task_orchestrate(task_id). Coordinator looks up the Operations tier's orchestrate phase, builds the prompt, mints a Participant with phase-scoped allowed_mcp_tools, registers it in the shared Session.
TaskSpawnConfigFactory assembles a per-spawn MCP config file: relay binary as one server, optional Engram as another (read-only for Execution tiers via make_readonly_engram). Token + coordinator host/port go into the relay server's env.
Coordinator calls spawn_agent — claude -p --json-schema=ProblemBrief --mcp-config=<file> --allowedTools=<phase allowlist> --strict-mcp-config, with disable_builtin_tools=True so Read/Edit/Bash can't leak.
The claude subprocess starts the relay, which connects back to the coordinator over TCP. The relay's tools/list calls the coordinator, which filters by the participant's allowlist. The agent's tools/list therefore shows exactly the phase's surface, nothing more.
Agent does its work. Each tool call goes through the relay → coordinator → handler → workspace. Scope-violating writes return {status: "out_of_scope", hint: ...} rather than raising; the agent iterates.
Agent terminates with submit_brief (or submit_clarification, or submit_handoff, depending on phase). Coordinator parses the structured output, runs invariant checks, advances the state machine. Factory's cleanup_all() runs in finally — the participant deregisters; the shared session doesn't accumulate stale entries.
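The config-assembly step in the flow above can be illustrated with a small sketch. It assumes the per-spawn file uses the mcpServers layout that Claude Code MCP config files use; the server name, the relay module path, and the coordinator host and port variable names are hypothetical. NEXUS_SESSION_TOKEN is the one variable name taken from this document.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class McpServerSpec:
    """Self-describing server entry: carries its own command, args, and env."""
    name: str
    command: str
    args: list = field(default_factory=list)
    env: dict = field(default_factory=dict)


def write_spawn_config(workspace: Path, participant_id: str, specs: list) -> Path:
    """Write _mcp_<participant_id>.json into the workspace and return its path."""
    config = {
        "mcpServers": {
            s.name: {"command": s.command, "args": s.args, "env": s.env} for s in specs
        }
    }
    path = workspace / f"_mcp_{participant_id}.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


relay = McpServerSpec(
    name="nexus-relay",                        # hypothetical server name
    command="python",
    args=["-m", "nexus_spawn_mcp.relay"],      # hypothetical module path
    env={
        "NEXUS_SESSION_TOKEN": "tok-123",
        "NEXUS_COORDINATOR_HOST": "127.0.0.1",  # hypothetical variable name
        "NEXUS_COORDINATOR_PORT": "48211",      # hypothetical variable name
    },
)

workspace = Path(tempfile.mkdtemp())
config_path = write_spawn_config(workspace, "p1", [relay])
loaded = json.loads(config_path.read_text(encoding="utf-8"))
```

Because each spec carries its own command, args, and env, adding an optional Engram server is just one more item in the list passed to write_spawn_config.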

The shape is the same in both nexus MCPs: the coordinator/relay split started in nexus-debate-mcp and was reused for nexus-spawn-mcp. It was marked for promotion into nexus-core once the second consumer settled.

Workspaces are the truth

Every task and every debate carries a Workspace. Workspaces persist via atomic tmp-rename writes. There is no DB. The state machine reads status.json, mutates a copy, writes it to status.tmp, and renames that over status.json. Concurrent writers inside the coordinator are serialized by a per-workspace asyncio.Lock; across processes, OS-level rename atomicity means a reader never sees a half-written status.json.
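The read, mutate, write, rename cycle can be sketched as follows. The StatusStore class and its method names are hypothetical; the file names status.json and status.tmp and the per-workspace asyncio.Lock come from the text.

```python
import asyncio
import json
import os
import tempfile
from pathlib import Path


class StatusStore:
    """Read-mutate-write for status.json using a tmp file and a rename."""

    def __init__(self, workspace: Path) -> None:
        self.path = workspace / "status.json"
        self.tmp_path = workspace / "status.tmp"
        self._lock = asyncio.Lock()  # serializes writers inside this process

    def read(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    async def update(self, **changes) -> dict:
        async with self._lock:
            status = {**self.read(), **changes}  # mutate a copy, not the file
            with open(self.tmp_path, "w", encoding="utf-8") as fh:
                json.dump(status, fh)
                fh.flush()
                os.fsync(fh.fileno())  # push the bytes to disk before the swap
            os.replace(self.tmp_path, self.path)  # overwrite in one rename
            return status


async def demo() -> dict:
    store = StatusStore(Path(tempfile.mkdtemp()))
    await asyncio.gather(
        store.update(state="orchestrating"),
        store.update(turn=1),
    )
    return store.read()


final_status = asyncio.run(demo())
```

A crash before os.replace leaves the old status.json untouched, and a crash after it leaves the new one complete, which is why recovery reduces to reading the file.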

~/.rev-nexus/workspaces/{task_id}/
  status.json                       ← source of truth
  problem_brief.json                ← orchestrator output (immutable)
  <tier>_t<N>_handoff.json          ← per-turn handoff artifacts
  <tier>_t<N>_review.json           ← per-turn review reports
  task.log                          ← NDJSON event stream
  scratch/<tier>/                   ← per-tier durable scratch
  _mcp_<participant_id>.json        ← ephemeral per-turn MCP config

Two fields make cross-MCP synergy work even though the MCPs don't talk to each other directly: every workspace carries a parent_workspace_id and a linked_workspace_ids list. A task that spawned a debate, or a debate that spun out follow-up tasks, is traceable across MCPs without a registry.

Engram, scoped

Engram is wired in two ways:

Code dependency. nexus-core and the two MCPs declare engram >= 0.6.0. The workspace pyproject.toml pins it to the sibling checkout via [tool.uv.sources] with editable = true. The import surface nexus actually uses is narrow: mostly engram.profiles.AuthorityProfile.
Runtime integration. Engram itself runs as a per-spawn MCP subprocess. When a coordinator spawns an agent that should have memory access, it appends engram-mcp to the spawn's mcp_config. For Execution-tier spawns, make_readonly_engram is applied — two-layer defense: ENGRAM_READ_ONLY=1 in the env, plus the participant's allowed_mcp_tools enumerates only the read-only tool names. Write tools stay unreachable even if the server happens to expose them.
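The two-layer defense can be illustrated like this. It is a sketch, not the real make_readonly_engram: the Engram tool names are invented for the example, the spec class is a stand-in, and the mcp__<server>__<tool> naming is assumed to match how the claude CLI addresses MCP tools in an allowlist. ENGRAM_READ_ONLY=1 is the only detail taken directly from the text.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class McpServerSpec:
    name: str
    command: str
    args: tuple = ()
    env: dict = field(default_factory=dict)


# Hypothetical tool names; the real Engram tool surface may differ.
ENGRAM_READ_TOOLS = ("engram_search", "engram_get")
ENGRAM_WRITE_TOOLS = ("engram_store", "engram_delete")


def make_readonly_engram_sketch(spec: McpServerSpec):
    """Return a read-only copy of the spec plus the tool names to allowlist."""
    # Layer 1: the server process is told to refuse writes.
    readonly_spec = replace(spec, env={**spec.env, "ENGRAM_READ_ONLY": "1"})
    # Layer 2: only read tools are ever added to the participant allowlist.
    allowed = [f"mcp__{spec.name}__{tool}" for tool in ENGRAM_READ_TOOLS]
    return readonly_spec, allowed


engram = McpServerSpec(name="engram", command="engram-mcp")
readonly_engram, engram_allowlist = make_readonly_engram_sketch(engram)
```

Either layer alone would block writes in the normal case; the allowlist layer is what keeps write tools unreachable if the server exposes them anyway.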

The integration is one-way — nexus calls engram, never the other direction — and the bridge is a graceful no-op when env vars aren't set. That makes it safe to run nexus without engram for development or smoke tests.

The tier-keyed AuthorityProfile (user > advisory > operations > execution > auto) is built in nexus_spawn_mcp.authority and serialized via AuthorityProfile.to_dict() to a temp JSON file; ENGRAM_PROFILE_PATH points the spawned engram MCP at it. Engram's own constructor enforces invariants (tier references, positive boosts, default_tier presence) — a malformed profile fails at construction, not at subprocess startup.

Hard constraints

A few things that look like surface preferences but are load-bearing:

spawn_agent is the atom. Pure, async at the public boundary, no retry, no multi-turn coordination, no NEXUS-specific assumptions. It invokes claude -p once and returns the result. Higher-layer logic (debate phases, task turns, retry budgets) is implemented in the MCPs that use it, not in the atom.
Workspace files are sole truth. No DB. Every state machine persists via atomic tmp-rename writes. Crash recovery is "open the workspace and read".
McpServerSpec is self-describing. It carries its own command, args, env. There is no central server registry that hardcodes server names by convention. The legacy mcp_config.py did exactly that and it was a load-bearing source of friction.
Scoped context via MCP, not prose. Agents pull what they need from MCP tools. Prompts never say "don't touch X." Scope is enforced at the tool-allowlist plus path-validation layer.
Framer picks the panel; caller never does. Keeps caller bias out of debates. Two mix invariants enforced post-framing: at least one registered profile when the registry has topic-relevant entries, and at least one ephemeral outsider when any registered profile is on the panel.
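The two mix invariants in the last constraint are easy to state as a check. The following is a hypothetical sketch: PanelMember, the registered flag, and the function name are illustrative, and the real post-framing check may carry more context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelMember:
    name: str
    registered: bool  # True: from the AgentRegistry; False: ephemeral archetype


def check_mix_invariants(panel: list, registry_has_relevant: bool) -> list:
    """Return the violated invariants; an empty list means the panel passes."""
    violations = []
    has_registered = any(m.registered for m in panel)
    has_ephemeral = any(not m.registered for m in panel)
    if registry_has_relevant and not has_registered:
        violations.append(
            "registry has topic-relevant entries but no registered profile is on the panel"
        )
    if has_registered and not has_ephemeral:
        violations.append(
            "a registered profile is on the panel but no ephemeral outsider is"
        )
    return violations


mixed_panel = [PanelMember("architect", True), PanelMember("skeptical-outsider", False)]
ephemeral_only = [PanelMember("skeptical-outsider", False), PanelMember("economist", False)]
registered_only = [PanelMember("architect", True), PanelMember("engineer", True)]
```

Running the check after framing, rather than constraining the framer's prompt, keeps the invariant in code where a violation can trigger a re-frame.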

// section 03 :: debate · 3 / 6

Debate

A structured panel protocol with framer-composed panels, arbiter-mediated steelmans, and verbatim Python-rendered synthesis.

// session :: nexus.debate

What it produces

A research document. Convergence where the panel aligned, contested points where it didn't, a weighted recommendation or an honest abstention. The document is synthesis.md (and a typographically polished PDF), assembled by Python from verbatim workspace artifacts plus an arbiter-emitted JSON envelope. Nothing is lost to arbiter summarization — verbatim round records come straight from disk.

A debate is not a way to get a faster answer. It is slower and more expensive than a solo prompt. A typical well-framed debate runs 15–30 minutes wall-clock, uses opus/max for the arbiter and final round, and produces 20–40 pages of typeset output. Use it when the question is architectural, will shape the system for months or years, and being wrong in one direction is expensive.

The protocol

Phase 0 · framing: A framer agent reads the topic, attached materials, and the caller's AgentRegistry. It emits a DebateFrameInit: topic re-frame, 2–4 testable claims, panel of 3–5 members with rationale + per-round tier (model + effort) per member, optional engram snippets. Two-pass clarification supported — if the topic is genuinely ambiguous, the framer returns needs_clarification instead of a frame.
Phase 1 · opening: All panel members write opening positions in parallel, brief-only. No cross-talk, no engram retrieval, no other panelist's text — just the panel member's role description and the topic. Schema-enforced: {response, claims_addressed, confidence_per_claim}.
Phase 2 · cross-examination: Arbiter writes a steelman of each side's position for every genuine tension. Cross-exam prompts give each panelist only the arbiter's steelman — never the raw opening of their opponent. Up to 2 subpasses, dynamic cap policy. After each subpass, arbiter re-analyzes; if tensions remain, another round fires.
Phase 3 · final statements: Only panelists still contested on a specific claim receive a dispatch. Converged panelists skip Phase 3 entirely. Final-statement prompt includes the arbiter's steelman of the opposing position. Tier defaults to opus/max — strongest model at strongest effort, the closing argument.
Phase 4 · synthesis: Arbiter emits a DebateSynthesis JSON: convergence points (per claim, consensus narrative + authority agent), contested points (per claim, positions array + arbiter note), recommendation (weighted call with reasoning OR honest abstention), and a 4–8 sentence provenance summary. Python renderer assembles synthesis.md from brief + all_records + analyses + synthesis JSON.
Phase 5 · engram mining: Outcomes written to engram with caused_by_task=debate_id. Each opening becomes an opinion drawer; each converged claim becomes a decision; contested pairs are linked via causal_type=contradiction; the synthesis recommendation lands as a decision authored by the arbiter. Wing scoping is per-caller — convention for framework-level debates is wing_nexus_debates.

Steelman enforcement is structural

The steelman rule is not just a prompt — it is the mechanical property that makes strawmanning architecturally impossible. After the opening round, the arbiter writes a steelman of each side's position for every genuine tension: the strongest, most charitable reconstruction. The cross-exam prompts then give the opposing agent only the arbiter's steelman — never the raw opening of their opponent.

Since agents literally do not see the raw opposing position during cross-exam, weakening it is not possible. The protocol enforces good-faith engagement by controlling what each agent reads, not by asking it to behave well.
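One way to picture "controlling what each agent reads" is a prompt builder whose signature has no parameter for raw opposing text. The sketch below is an assumption about shape, not the actual debate driver: the dataclasses, the function name, the prompt wording, and the choice to include the recipient's own opening are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opening:
    author: str
    text: str


@dataclass(frozen=True)
class Steelman:
    """Arbiter-written reconstruction of one panelist's position."""
    position_holder: str
    text: str


def build_cross_exam_prompt(recipient: str, own_opening: Opening, steelmans: list) -> str:
    """Assemble a cross-exam prompt from the recipient's own opening and the
    arbiter's steelmans of other panelists. Raw opposing openings are not
    parameters of this function, so they cannot end up in the prompt."""
    opposing = [s for s in steelmans if s.position_holder != recipient]
    sections = [f"Your opening position:\n{own_opening.text}"]
    sections += [f"Opposing position (arbiter steelman):\n{s.text}" for s in opposing]
    sections.append("Respond to the strongest form of each opposing position.")
    return "\n\n".join(sections)


engineer_opening = Opening("engineer", "RAW-ENGINEER: six signals are enough.")
architect_opening = Opening("architect", "RAW-ARCHITECT: graph ranking is cleaner.")
steelmans = [
    Steelman("engineer", "STEELMAN-ENGINEER: signal scoring is cheap and measurable."),
    Steelman("architect", "STEELMAN-ARCHITECT: graph structure captures relations."),
]
prompt = build_cross_exam_prompt("engineer", engineer_opening, steelmans)
```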

A consequence worth preserving: arbiter steelmans are often sharper than the originals because the arbiter states each position at maximum force rather than matching the original author's word count.

Most multi-agent systems are sequential assembly lines: coordinator → worker → reviewer, with context accumulating across agents. That is task parallelism. This protocol is epistemic parallelism: N agents with disjoint scoped contexts form independent priors on a topic, then structured tension-resolution surfaces the most defensible synthesis.

── DEBATE.md

Architectural decisions worth naming

2026-04-16
Steelman enforcement is structural, not prompted
◇ shipped
Cross-exam prompts give the opposing agent only the arbiter's steelman of the opposing position — never the raw text. Strawmanning becomes architecturally impossible: weakening a position requires reading it, and the protocol does not let you read it. The discipline is enforced by what each agent gets to see, not by asking it to play fair.
2026-04-16
Verbatim synthesis assembly in Python, not arbiter
◇ shipped
Arbiter emits a DebateSynthesis JSON with the interpretive layer only. The Python renderer (debate_rendering.render_synthesis_markdown) assembles synthesis.md from the brief + all round records + analyses + synthesis JSON. Verbatim round records come from workspace artifacts, not from arbiter output. Nothing is lost to arbiter summarization — the arbiter's expensive opus/max calls are spent only on the interpretive layer.
2026-04-16
Per-agent per-round tiers
◇ shipped
The framer emits tiers keyed by round_type per panel member. Opening can use sonnet/high for cheaper coverage; cross-exam and final benefit from opus/high. The framer can override per agent — for example, an architect-perspective agent on a spec-level claim opens with opus/high because their lens is load-bearing on the opening framing. Compute budget is shaped by the question, not by a global setting.
2026-04-16
Framer picks the panel; caller never does
◇ shipped
Keeps caller bias out of debates. The framer composes from (a) ephemeral archetypes tailored to the topic and (b) topic-relevant entries from the caller's AgentRegistry. Two mix invariants are enforced post-framing: ≥1 registered profile when the registry has topic-relevant entries; ≥1 ephemeral outsider when any registered profile is on the panel. Pure-registered or pure-ephemeral panels are both possible; mixed panels are the default.
2026-05-08
Per-spawn participant scope via PanelSpawnConfigFactory
◇ shipped
Each panel spawn gets its own MCP config file pointing at the debate relay binary, with NEXUS_SESSION_TOKEN injected and a per-spawn Participant registered with the shared session carrying phase-scoped allowed_mcp_tools. Driver methods call factory.cleanup_all() in a finally block so the session doesn't accumulate stale participants across rounds. When the arbiter sets grant_engram=true for a subpass, the factory appends a read-only engram McpServerSpec to the participant's scope and the claude-CLI allowlist.
2026-05-09
Schema enforcement via inline --json-schema
◇ shipped
--json-schema expects the schema as a JSON string inline, not a path. build_claude_command reads json_schema_path and passes the content. Structured output lands in the final result event's structured_output field, not in the result string — parse_result_from_ndjson_file prefers structured_output when present and JSON-serializes it for the downstream parse layer. Discovered during real-CLI smoke validation; both fixes shipped.
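The "prefer structured_output" rule from the last entry can be sketched as below. This is not parse_result_from_ndjson_file itself: the function name is hypothetical, and the event shape (a type of "result" with result and structured_output fields) is an assumption based on the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def parse_result_sketch(ndjson_path: Path) -> Optional[str]:
    """Return the final result payload as a string, or None if there is none."""
    final_event = None
    with open(ndjson_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate partial or non-JSON lines
            if isinstance(event, dict) and event.get("type") == "result":
                final_event = event  # keep the last result event seen
    if final_event is None:
        return None
    structured = final_event.get("structured_output")
    if structured is not None:
        return json.dumps(structured)  # re-serialize for the downstream parser
    return final_event.get("result")


log_path = Path(tempfile.mkdtemp()) / "stdout.ndjson"
log_path.write_text(
    '{"type": "assistant", "message": "working"}\n'
    '{"type": "result", "result": "done", "structured_output": {"title": "brief"}}\n',
    encoding="utf-8",
)
parsed = parse_result_sketch(log_path)
```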

The first real debate

debate_1776363989806_f041 — "Is rev-nexus's ENGRAM 6-signal scoring the right architecture for multi-agent memory, or should we move toward weighted-graph ranking?"

Triggered by an external reviewer's pushback that the framework's author routed back to the framework rather than answering directly. Panel: an architect-lens agent and a cartographer-lens agent on opus/high opening, plus an engineer and a welder on sonnet/high; all opus/max for final. Attached materials: the critic's message, the Engram CLAUDE.md scoring section, engram/scoring.py, engram/config.py, the v2 design doc.

4 panelists

2 cross-exam subpasses

27 round records

75% revised from opening (12 of 16)

23m 58s wall clock

991 lines of synthesis

Outcome: 2 converged claims (refined to "architecture vs weights separability"; adaptive dampening unanimous), 2 contested (3-way and 4-way splits on augmentation form and the role of representation in discipline). Arbiter recommended a sequenced empirical path: ship the engineer's 10-line tier-1 validation experiment, ship the post-retrieval expansion with the cartographer's structural-centrality instrumentation, adopt a substrate-agnostic parameter-budget regime for the contested representation question.

A parallel empirical track on Engram — running Opus 4.7 in max mode under a looping benchmark harness — reached the same architectural conclusion the panel did. The panel got there by reasoning; the benchmark loop got there by measuring. Convergence from two independent tracks was the first external validation of the protocol's output.

That synthesis is what informed the decision to extract the orchestration primitives out of the legacy platform and rebuild as a small composable set of MCPs. The framework, in a real sense, argued itself into its current shape.

What's still missing

A short list of v1 limitations the design knows about:

No frontend UI. Real-time observation is tail -f debate.log (NDJSON event stream) or polling GET /debates/{id}/log. The MCP variant ships in coordinator-mode only; richer UI is a v2 problem.
No role-swap cross-exam. The arbiter in the first debate explicitly suggested routing an agent to defend (not just respond to) the opposing steelman as a falsification test for role bias. Worth adding.
Fixed 2-subpass cap on cross-exam. Genuinely deep tensions might benefit from more, or from a fully dynamic cap informed by revision rate. The dynamic-subpass policy is partly wired (subpass_policy.compute_revision_rate + compute_tension_density feed the arbiter's check_close decision); the hard cap remains.
revised_from_opening is agent self-report. Some "held" responses contain substantial revision in their text. Useful signal, not authoritative.
Sonnet under --json-schema is currently required for framing. Haiku runs 30+ internal turns under schema enforcement before emitting valid structured output — cost-prohibitive in production. Schema-hint-in-prompt + lenient parse via parse_agent_response is the planned escape hatch.

// section 04 :: spawn · 4 / 6

Spawn

Tiered multi-turn task coordination with MCP-layer scope enforcement, custom tier registration, and a Cartographer that keeps the project framing artifact warm.

// session :: nexus.spawn

What it does

Run a multi-turn task on a real codebase, with a structured handoff between phases, and structural enforcement of who can touch what. The caller (Claude Code, typically) holds the conversation; nexus-spawn-mcp runs the orchestration, dispatches Execution-tier turns, runs a review pass, and reports back. The caller never delegates discovery or conversation to the orchestrator — Advisory is the gatekeeper and pre-digester.

A typical task lifecycle:

Caller invokes task_create(title, prompt) and gets back a task_id.
Caller invokes task_orchestrate(task_id). Coordinator spawns an Operations-orchestrate agent that produces a ProblemBrief — problem statement, scope (read paths, write paths, do-not-touch), tier plan, success criteria, max review retries.
Caller invokes task_run_turn(task_id) repeatedly. Each call dispatches the current-tier-phase agent, runs handoff parsing, calls the broker to evaluate the transition, optionally runs a review, optionally retries the execution turn within the cap, and advances the tier plan pointer.
The state machine reaches a terminal state (complete, blocked, escalated, cancelled). Caller reads task_result(task_id) and surfaces it to the user.
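From the caller's side, the lifecycle above is a short loop. The sketch below uses an in-memory stand-in instead of a real MCP connection; the four tool names and the terminal states come from the text, while the return shapes (a dict with a "state" key) and the max_turns guard are assumptions.

```python
from typing import Protocol

TERMINAL_STATES = {"complete", "blocked", "escalated", "cancelled"}


class TaskClient(Protocol):
    def task_create(self, title: str, prompt: str) -> str: ...
    def task_orchestrate(self, task_id: str) -> dict: ...
    def task_run_turn(self, task_id: str) -> dict: ...
    def task_result(self, task_id: str) -> dict: ...


def run_task(client: TaskClient, title: str, prompt: str, max_turns: int = 20) -> dict:
    """Drive one task to a terminal state, then fetch its result."""
    task_id = client.task_create(title, prompt)
    client.task_orchestrate(task_id)
    for _ in range(max_turns):
        status = client.task_run_turn(task_id)
        if status.get("state") in TERMINAL_STATES:
            break
    return client.task_result(task_id)


class FakeClient:
    """In-memory stand-in that completes after three turns."""

    def __init__(self) -> None:
        self.turns = 0

    def task_create(self, title: str, prompt: str) -> str:
        return "task_demo"

    def task_orchestrate(self, task_id: str) -> dict:
        return {"state": "briefed"}

    def task_run_turn(self, task_id: str) -> dict:
        self.turns += 1
        return {"state": "complete" if self.turns >= 3 else "running"}

    def task_result(self, task_id: str) -> dict:
        return {"task_id": task_id, "turns": self.turns}


result = run_task(FakeClient(), "demo", "rename a helper")
```

The caller keeps control between turns, which is where an Advisory-tier decision to steer or cancel would sit.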

Tiers

Four tiers, three roles:

Advisory: The caller itself. Holds the conversation with the user, decides which tasks to spawn, reads results back, and steers between turns. Nothing in nexus-spawn-mcp spawns Advisory — it's whoever holds the MCP connection.
Operations: Phase-aware tier with two built-in phases. orchestrate produces the problem brief (sonnet/high, --json-schema=ProblemBrief, no project filesystem access). review validates execution output (sonnet/medium, validator role, no write tools). Brief is the input contract for everything downstream; review is the quality gate.
Execution: Does the work. Single phase: implement. Scoped read/write/edit against brief.scope. Default Engram access is read-only via make_readonly_engram. The path validator returns {status: "out_of_scope", hint: ...} for violations rather than raising, so agents iterate on a structured hint instead of falling into parse-failure ping-pong.
Field: Reserved for the right-click-fixer pattern from the legacy platform (see legacy). Currently deferred — needs a CONDUIT-shaped runtime hook that the MCP layer alone can't supply.

Tiers are first-class. Custom tiers register via the register_tier(tier_spec) MCP tool — the spec carries phases, model defaults, allowlists, scope overlays, and turn budget. Built-in names (operations, execution) and reserved names (advisory, field) are rejected.
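The name checks on registration can be sketched as follows. This is a guess at the shape, not the real tool: the `TierRegistry` class, the `name` key on the spec, the duplicate check, and the response dictionaries are hypothetical. Only the built-in and reserved names come from the text.

```python
BUILT_IN_TIERS = frozenset({"operations", "execution"})
RESERVED_TIERS = frozenset({"advisory", "field"})


class TierRegistry:
    """Hypothetical in-memory registry; only the name rules mirror the text."""

    def __init__(self) -> None:
        self._custom = {}

    def register_tier(self, tier_spec: dict) -> dict:
        name = str(tier_spec.get("name", "")).strip().lower()
        if not name:
            return {"status": "rejected", "reason": "missing tier name"}
        if name in BUILT_IN_TIERS:
            return {"status": "rejected", "reason": f"'{name}' is a built-in tier"}
        if name in RESERVED_TIERS:
            return {"status": "rejected", "reason": f"'{name}' is a reserved tier"}
        if name in self._custom:
            return {"status": "rejected", "reason": f"'{name}' is already registered"}
        self._custom[name] = tier_spec
        return {"status": "registered", "name": name}

    def list_tiers(self) -> list:
        """Built-in tiers plus whatever the caller registered."""
        return sorted(BUILT_IN_TIERS | set(self._custom))
```

Returning a structured rejection instead of raising keeps the tool's failure mode consistent with the path validator's.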

Scope is enforced, not requested

The brief carries a scope object: read_paths, write_paths, do_not_touch. After post-brief invariant checks (starting_tier ∈ tier_plan, write_paths ⊆ read_paths, do_not_touch ∩ write_paths = ∅, ≥1 non-operations entry), the brief is locked.
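The four invariants translate almost directly into set operations. The sketch below assumes paths are compared as exact strings; the real check may well use prefix or glob containment. The `Scope` dataclass and `check_brief_invariants` are illustrative names, not the project's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scope:
    read_paths: frozenset
    write_paths: frozenset
    do_not_touch: frozenset = field(default_factory=frozenset)


def check_brief_invariants(scope: Scope, tier_plan: list, starting_tier: str) -> list:
    """Return a list of violations; an empty list means the brief can be locked."""
    problems = []
    if starting_tier not in tier_plan:
        problems.append("starting_tier is not in tier_plan")
    if not scope.write_paths <= scope.read_paths:  # subset test
        problems.append("write_paths is not a subset of read_paths")
    if scope.do_not_touch & scope.write_paths:  # non-empty intersection
        problems.append("do_not_touch overlaps write_paths")
    if not any(tier != "operations" for tier in tier_plan):
        problems.append("tier_plan has no non-operations entry")
    return problems
```

Collecting every violation, rather than stopping at the first, gives the orchestrate phase one complete correction list per attempt.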

Per-turn, the spawn config factory resolves the participant's effective scope: (brief.scope ∪ tier.scope_overlay ∪ turn_override) − do_not_touch. The subtraction applies last, so no overlay or override can reintroduce a do-not-touch path. The execution agent's read_scoped_file, write_scoped_file, and edit_scoped_file tools wrap the path validator. A 256 KB content cap is applied per read and per write.

Two consequences:

Plain prose like "do not touch the auth module" is never written into prompts. The auth module is not in the participant's scope. The write tool refuses the path. The agent gets a structured hint and changes course.
Crash recovery is trivial. The full per-turn config and the participant's scope live on disk in _mcp_<participant_id>.json. Re-running the same task_run_turn against a half-finished workspace picks up exactly where the previous attempt left off.
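A rough sketch of how a write path might be validated against the effective scope, returning a structured hint rather than raising. It assumes scope entries are directory or file roots and works purely on path strings; symlink resolution, the 256 KB cap, and the on-disk config are left out. The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def _covers(roots: set, path: str) -> bool:
    """True if path equals, or sits under, any root (string-level check only)."""
    target = PurePosixPath(path)
    return any(
        target == PurePosixPath(root) or PurePosixPath(root) in target.parents
        for root in roots
    )


def effective_write_roots(brief: set, overlay: set, override: set) -> set:
    """Union of the three sources; do_not_touch is subtracted at validation time."""
    return brief | overlay | override


def validate_write(path: str, write_roots: set, do_not_touch: set) -> dict:
    """Return a structured result so the agent can change course."""
    if ".." in PurePosixPath(path).parts:
        return {"status": "out_of_scope", "hint": "path traversal is not allowed"}
    if _covers(do_not_touch, path):
        return {"status": "out_of_scope", "hint": f"{path} is marked do-not-touch"}
    if not _covers(write_roots, path):
        return {"status": "out_of_scope", "hint": f"writable roots: {sorted(write_roots)}"}
    return {"status": "ok"}
```

Checking do_not_touch before the writable roots is what makes the subtraction win, even when a broader root such as `src/api` would otherwise cover the path.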

The implicit driver walk

task_run_turn doesn't run one model call. It walks:

Spawn current-tier-phase agent. Wait for terminate.
Parse the handoff: authoritative submit_handoff first → JSON fallback → haiku follow-up → synthesize_blocked_handoff if all three fail.
Call broker.evaluate_handoff to decide the transition: advance the tier plan, retry execution within the cap, escalate, complete.
If review is configured for the current tier-phase, dispatch a review spawn (Operations-review, sonnet/medium, validator role). Review allowlist excludes write tools by construction. Review writes a ReviewReport with recommendation: advance | retry | escalate, an alignment score, and an optional retry hint.
If the review recommends retry, re-spawn the same execution phase with the hint baked into the turn override. brief.max_review_retries defaults to 2; status.review_retries_used counts retries within a cycle and resets when the plan pointer advances. If the retry cap is exceeded, the driver synthesizes an escalate handoff.
Return when the tier-plan pointer advances, a terminal state is reached, or the retry cap is hit.

One MCP call from the caller, several sub-spawns inside. The caller never has to know about review-retry mechanics — they happen below the API.
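The review-retry part of the walk can be sketched as a bounded loop. This is a simplification under stated assumptions: `execute` and `review` are hypothetical callables standing in for the two spawns, handoff parsing and the broker are folded into `execute`, and the return shape is invented for illustration.

```python
from typing import Callable, Optional


def run_turn(
    execute: Callable[[Optional[str]], dict],
    review: Callable[[dict], dict],
    max_review_retries: int = 2,
) -> dict:
    """One caller-visible turn: execute, review, and retry within the cap."""
    retries_used = 0
    hint = None
    while True:
        handoff = execute(hint)  # execution spawn plus handoff parsing
        if handoff["status"] in {"blocked", "escalate"}:
            return {"outcome": handoff["status"], "review_retries_used": retries_used}
        report = review(handoff)  # read-only review spawn
        if report["recommendation"] == "advance":
            return {"outcome": "advance", "review_retries_used": retries_used}
        if report["recommendation"] == "escalate" or retries_used >= max_review_retries:
            # Explicit escalation, or the retry cap is spent.
            return {"outcome": "escalate", "review_retries_used": retries_used}
        retries_used += 1
        hint = report.get("retry_hint")  # fed into the next execution spawn
```

With the default cap of 2, the execution phase runs at most three times inside a single call.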

Engram authority, in flight

Engram's scoring is provenance-aware: drawers are ranked by a signal mix that includes an authority multiplier from a tier-keyed AuthorityProfile. nexus-spawn-mcp builds one for the standard tier ordering — user > advisory > operations > execution > auto — and hands it to spawned engram MCPs by writing AuthorityProfile.to_dict() to a temp JSON file and setting ENGRAM_PROFILE_PATH in the spawn's env.

Engram's own constructor enforces the invariants — tier references, positive boosts, default_tier presence — so a malformed profile fails at construction, not at subprocess startup. Custom tier registrations don't get a free pass: register_tier injects the new tier into the authority profile too, so the engram-side scoring stays aligned with whatever tiers the caller registered.
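The handoff mechanism (validate, serialize to a temp file, point an env var at it) might look roughly like this. A plain dict stands in for `AuthorityProfile.to_dict()`; its keys and the boost values are invented for illustration, and `validate_profile` covers only two of the invariants named above. Only `ENGRAM_PROFILE_PATH` and the tier ordering come from the text.

```python
import json
import os
import tempfile
from typing import Optional

# Illustrative values only; the real AuthorityProfile fields may differ.
PROFILE = {
    "default_tier": "auto",
    "tiers": {"user": 2.0, "advisory": 1.6, "operations": 1.3, "execution": 1.1, "auto": 1.0},
}


def validate_profile(profile: dict) -> None:
    """Fail here, before any subprocess is started."""
    tiers = profile["tiers"]
    if profile["default_tier"] not in tiers:
        raise ValueError("default_tier must name a known tier")
    if any(boost <= 0 for boost in tiers.values()):
        raise ValueError("boosts must be positive")


def env_with_profile(profile: dict, base_env: Optional[dict] = None) -> dict:
    """Write the profile to a temp JSON file and point ENGRAM_PROFILE_PATH at it."""
    validate_profile(profile)
    fd, path = tempfile.mkstemp(prefix="authority_", suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(profile, handle)
    env = dict(os.environ if base_env is None else base_env)
    env["ENGRAM_PROFILE_PATH"] = path
    return env
```

The returned mapping would be passed as the spawned process's environment; cleaning up the temp file is left to the caller in this sketch.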

For Execution-tier spawns, make_readonly_engram applies two-layer defense: ENGRAM_READ_ONLY=1 in the env and the participant's allowed_mcp_tools enumerates only read-only tool names. Even if a future engram release exposes additional write tools, they remain unreachable.
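The two layers are independent, which a few lines can show. In this sketch the choice of which tool names count as read-only is my assumption, and `readonly_engram_config` is a hypothetical stand-in for make_readonly_engram rather than its actual signature.

```python
# Assumed read-only subset; the real list is whatever the project enumerates.
READ_ONLY_ENGRAM_TOOLS = frozenset(
    {"query_engram", "search_engram", "list_wings", "describe_authority_profile"}
)


def readonly_engram_config(server_env: dict, requested_tools: list) -> tuple:
    """Layer 1: set the env flag. Layer 2: keep only known read-only tool names."""
    env = {**server_env, "ENGRAM_READ_ONLY": "1"}
    allowed = [name for name in requested_tools if name in READ_ONLY_ENGRAM_TOOLS]
    return env, allowed
```

Because the allowlist names tools positively, a write tool added upstream later is excluded by default without any change here.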

The Cartographer

Not a tier — infrastructure. A post-task hook that maintains a ~2KB framing artifact at <project_root>/.rev-nexus/context-map.md. The map is what the Orchestrator reads at task start to understand what the project is. Not an Engram substitute — Engram is for evidence (decisions, risks, patterns, scored by provenance); the context map is for framing (project shape, always-applicable constraints, architectural patterns).

2026-05-08
Cartographer is infrastructure, not a tier
Not in tier_plan. Not callable by Orchestrator or Execution. Not registerable as an agent profile. One job: keep the ~2KB context map at <project_root>/.rev-nexus/context-map.md warm. No source edits, no engram writes, no cross-MCP spawns. This is a deliberate role discipline — the temptation to let Cartographer do drift detection, propose tasks, or write to engram was rejected because it would violate the framing-vs-evidence separation.
2026-05-08
Three triggers, two roles
Incremental (haiku/medium, fire-and-forget post-completion) fires from SpawnApp.run_turn when a task reaches terminal complete and patches the map if anything structurally noteworthy changed. Seed (sonnet/low, async from task_create) fires when the project has no map yet — non-blocking, so task 1 runs without the map and task 2+ sees it. Explicit rescan deferred to v2.
2026-05-08
Drift detection is incidental, not enumerative
The Orchestrator reads the map as framing, not ground truth. If it happens to notice the map contradicts what it sees while building the brief, it emits an entry in ProblemBrief.framing_flags. The task_orchestrate response passes these through as flags: list[str] so the caller can relay them to the user. The Orchestrator does NOT enumerate the codebase looking for drift — that defeats the purpose of lightweight framing. Only incidental observations surface.

Tool surface (excerpt)

The orchestrator-facing surface (visible to Advisory, i.e. the caller):

task_create · task_orchestrate · task_run_turn
task_status · task_log · task_result · task_cancel · task_reopen
register_agent · list_agents · unregister_agent
register_tier · unregister_tier · list_tiers
list_tasks_mcp

The agent-facing surface (visible to Operations + Execution spawns, scoped per phase):

read_scoped_file · write_scoped_file · edit_scoped_file
list_scope · append_scratch · read_scratch
read_my_brief · read_my_tier · read_my_scope · read_my_prompt
read_turns_remaining · read_task_status · read_prior_handoffs · read_scratch_path
query_engram · search_engram · record_decision · check_duplicate_decision
record_retrieval · list_wings · describe_authority_profile
read_diff (review-phase only) · read_context_map (orchestrate-only)
submit_brief · submit_clarification · submit_handoff · submit_review

Each phase's allowlist is a subset. Operations-orchestrate doesn't get file I/O — the orchestrator reasons from the prompt + the framing map + Engram-surfaced decisions, never the raw filesystem. Operations-review gets read_diff and read-only file access, no writes by construction. Execution gets the full scoped I/O surface plus the read-only engram subset.

The discipline is enforced by the allowlist, not by the prompts.
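A minimal sketch of allowlist enforcement: one filter for what a participant sees, one gate for what it may call. The allowlists here are small illustrative subsets of the surface listed above, and both function names are hypothetical.

```python
# Illustrative subsets only; the real per-phase allowlists are longer.
PHASE_ALLOWLISTS = {
    ("operations", "orchestrate"): frozenset(
        {"read_my_prompt", "read_context_map", "query_engram", "submit_brief"}
    ),
    ("operations", "review"): frozenset(
        {"read_scoped_file", "read_diff", "query_engram", "submit_review"}
    ),
    ("execution", "implement"): frozenset(
        {"read_scoped_file", "write_scoped_file", "edit_scoped_file", "query_engram", "submit_handoff"}
    ),
}


def visible_tools(all_tools: list, tier: str, phase: str) -> list:
    """What a tools/list response would contain for this participant."""
    allowed = PHASE_ALLOWLISTS.get((tier, phase), frozenset())
    return [name for name in all_tools if name in allowed]


def authorize_call(tier: str, phase: str, tool: str) -> dict:
    """Gate every call, even if the agent guesses a name it was never shown."""
    if tool in PHASE_ALLOWLISTS.get((tier, phase), frozenset()):
        return {"status": "ok"}
    return {"status": "denied", "hint": f"{tool} is not available in {tier}/{phase}"}
```

Filtering the listing keeps tools out of the agent's view; gating the call covers the case where it tries one anyway.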

// section 05 :: legacy

Legacy

The original monolithic platform — Director-led pipeline, named agent crew, file-handoff workspaces, the in-app Conduit fixers — and what got carried forward.

// session :: nexus.legacy

What the original was

The first Nexus was a developer tool: a FastAPI backend, an Electron frontend, a SQLite cache, and a per-project .rev-nexus/ directory full of plans, specs, and workspace artifacts. It managed a roster of named agents: The_Director (the human), The_Architect, The_Orchestrator, The_Foreman, The_Cartographer, The_Engineer, The_Fabricator, The_Oracle, and Clawd, plus a 19-name pool of cluster workers, the right-click conduit agents (The_Mechanic, The_Designer), and, later, The_Arbiter for debates.

A task flowed through a structured pipeline:

Director creates task
   ↓
Orchestrator kickoff (problem brief, scope, mode, tier)
   ↓
Execute turn (Claude Code CLI subprocess, tool profile per agent)
   ↓
Director approves / revises / reopens
   ↓
... repeat until complete / blocked / escalated
   ↓
Foreman review (autonomous-mode safety gate)
   ↓
Engram mining + context-map update

The platform earned its complexity. It ran solo / dual / swarm execution modes, autonomous mode with a 3-cycle dispatch budget, per-task execution tiers (light → max), nine task types each with their own turn budgets and execution guidance, deterministic SHA-256-seeded cluster naming for crash-safe parallel work, and progressive context summarization for long tasks. The Architect was a two-phase advisory loop (spec then design) that produced a project's .rev-nexus/spec.md, phases.md, claude.md, and design.md for execution agents to read. The Cartographer ran a dependency-graph analysis pass and a gap-analysis conversation. Engram was the memory layer through every step.

It worked. It also got too big to safely change — every new feature touched four parts of the platform at once.

Conduit — the in-app right-click

Conduit is the piece I'm proudest of from the legacy platform. It's a small JavaScript bundle (@rev-nexus/conduit) that ships into a project's running web UI as a script tag. Once initialized, every right-click on an element opens a context menu that calls one of two field agents:

The_Mechanic — functional issue fixer. Captures DOM context (tag, classes, ID, text, route, component path), generates a context-aware opening line ("That Save button on your project form — is it not firing, or is it hitting the API and dying there?"), runs a one-or-two-exchange conversation, and dispatches a tightly-scoped bugfix task into the orchestration pipeline. Personality: direct, slightly cocky, casually confident. Not a consultant.
The_Designer — visual issue fixer. Same flow, focused on spacing, typography, color, alignment, design-system consistency. Gets design.md so it can reference specific tokens. Personality: observant, thoughtful, detail-oriented.

The flow that mattered:

developer right-clicks an element
   ↓
CONDUIT captures DOM + route + diagnostics (console errors, failed fetches)
   ↓
The_Mechanic / The_Designer renders an opening line
   ↓
1–2 exchange conversation (intake-only — no tools, no MCP)
   ↓
agent emits {"dispatch": true, "prompt": "..."}
   ↓
task auto-creates with autonomous=true, tier=light, type=bugfix or design
   ↓
fire-and-forget orchestrate-then-execute

The intake agents themselves didn't write code — they were translators. They took "this button looks weird" and produced a structured task description that the orchestration pipeline could act on. Director's verbatim words got concatenated onto the dispatched prompt so the execution agent saw the original report, not just the intake agent's interpretation.

The thing that worked about Conduit was the activation energy. Without it, reporting "this button's margin is wrong" required: open task dashboard, write description, wait for orchestration, wait for execution. With Conduit, you right-click the button, say what's wrong in plain language, and a scoped fix task auto-dispatches through the full pipeline. The intake agent runs on sonnet/low. The execution turn runs on sonnet/medium. The whole cycle from right-click to completed PR was usually under five minutes.

Conduit doesn't exist in the new MCP world yet. It needs a runtime hook (a way for the agent to be reachable from inside a running web app) that the MCP layer alone can't supply. The Field tier is reserved for it, and Conduit ships again once that hook exists.

The agent crew

The original roster was deliberately named. Each agent had a documented role, a tool profile, and a system prompt that established its lens. The naming wasn't whimsy — when the Orchestrator routed a task, the routing decision was "which lens does this need", and the names made the lens choice legible.

The split that mattered:

Advisory layer. The_Architect (spec + design + project conventions, two-phase greenfield/bootstrap), The_Cartographer (structural analyst with read-only access), The_Oracle (cross-project memory, second-in-command). Read-write to Engram, write to project documents, read-only to the codebase. They wrote planning artifacts; they never touched implementation.
Operations layer. The_Orchestrator (task planner, problem-brief author, scope-setter — never wrote code, read-only access to keep the lane clean) and The_Foreman (integration reviewer + autonomous-mode safety gate — read-only during review, full toolset during dispatch).
Execution layer. The_Engineer (backend specialist), The_Fabricator (frontend specialist following The_Architect's design system), Clawd (full-stack generalist, default agent in solo mode), and the cluster pool (parallel workers in swarm mode, no Agent tool because cluster workers are leaf nodes).
Field layer. The_Mechanic, The_Designer, and the right-click pipeline.
Special. The_Arbiter (debate moderator, opus/max, read-only to engram, written into the roster for the debate framework).

The roles were enforced by tool profiles in agent_tooling.py. Each profile listed its allowed builtin tools (Read/Edit/Write/Bash/Glob/Grep/Agent) and MCP servers (Engram with rw / ro, plus declared-but-not-yet-wired Sentry / GitHub / Playwright / Sequential-thinking servers). The roster was the lens; the profile was the boundary.

File-based handoffs

Workspaces lived at ~/.rev-nexus/workspaces/{task_id}/. Every artifact was a JSON file. The DB was a cache; the workspace was the truth. Crash recovery was "if the PID is alive, monitor; if it's dead, recover from the stdout file".

~/.rev-nexus/workspaces/{task_id}/
   broker_t0_init.json            # Orchestrator kickoff (immutable)
   broker_status.json             # Source of truth — status, turn, owner, history
   {agent}_t{N}_handoff.json      # Per-turn handoff
   pending_handoff.json           # Awaiting Director approval
   cluster_plan_t{N}.json         # Swarm plan
   cluster_slot_{N}_result.json   # Per-slot result
   orchestration_log.json         # Debug trace

Every turn ended with a structured handoff: complete / handoff / checkpoint / blocked / escalate. Checkpoints carried a checkpoint_context field — the only continuity mechanism, because each Claude Code CLI invocation was a fresh stateless process. The Director feedback loop injected an ## Director Feedback block into the next prompt; revise mode reran the same agent without advancing the turn counter.

Self-verification was wired into the prompt: agents were required to supply a verification field with per-criterion pass/fail evidence before handing off as complete. Agents that couldn't fully verify were told to use checkpoint instead, which prevented premature completion at the source.

That whole pipeline carried forward in spirit. The new spawn-mcp's submit_handoff plays the same role; the broker still runs the transition logic; workspaces are still atomic-tmp-rename-on-disk truth. What got dropped: the Director-approval gate (replaced by the caller — Claude Code is the new approval surface), the autonomous-executor loop (replaced by the implicit driver walk inside task_run_turn), the Foreman dispatch cycles (replaced by Operations-review + retry-budget).

Why it collapsed

Three forces pulled at once:

The platform shape was load-bearing on Director-as-human. Half the system existed to manage the approval loop, feedback injection, revise/reopen flows, and the autonomous-mode safety gate around it. When Claude Code matured into a credible Advisory agent in its own right, the human-as-Director assumption stopped being a forcing constraint and started being a structural cost.
The roster was great as an idea, expensive as a runtime. Twelve named agents with twelve system prompts, twelve tool profiles, twelve sets of routing logic, four tiers of Engram access, a name pool, deterministic seeding, and a team_roster.py that had to stay synchronized across agent_names.py, agent_tooling.py, and the React frontend. Adding a new agent touched five files. Removing one was worse. The cost-of-change curve was bending the wrong way.
The debate framework argued the platform down. The first real debate about Engram's scoring architecture worked. The framework respected its own output enough to ship the recommendation. When the same framework was used to argue about its own future shape — keep the platform, or extract the orchestration primitives — the platform-side arguments were strong in isolation but did not survive steelman cross-exam. The MCP-side won on portability, audit, and the smaller blast radius of changes.

The collapse was deliberate. What it kept: spawn_agent as a Windows-safe atom; the tier model with scope discipline; the framer-picks-the-panel debate protocol with structural steelmans; the Cartographer-as-framing-keeper post-task hook. What it dropped: Director-approval pipeline, autonomous-executor loop, Foreman safety gate, in-app Conduit (Field tier, deferred), cluster-pool deterministic naming, per-project SQLite cache, the entire FastAPI backend and Electron frontend.

What I'd do differently

If I were starting from scratch today, knowing what the new shape looks like:

Build the atom first. spawn_agent was the last thing extracted; it should have been the first thing committed. Everything else is layered logic on top of it.
Workspaces before APIs. The file-on-disk truth pattern was right from day one in the legacy platform. The DB-as-cache pattern was right too. Both should have been written down as constraints before any HTTP endpoints.
MCP as the agent boundary, not the platform's external API. The legacy platform had a 50-route FastAPI surface on top of a 50-module service layer. The new shape has stdio MCP and that's the whole boundary. The MCP layer is where scope is enforced; everything else is implementation detail.
Write the debate protocol earlier. It paid for itself the first time it ran. Every architectural decision that followed went through it.

// section 06 :: decisions

Decisions

Architectural pivots from the original platform to the current MCP shape, newest first.

// session :: nexus.decisions

2026-05-09
Cartographer is infrastructure, not a tier
◇ shipped
The post-task framing-map maintainer is intentionally outside the tier model. Not in tier_plan, not callable by Orchestrator or Execution, not registerable as an agent profile. One job: keep <project_root>/.rev-nexus/context-map.md warm. No source edits, no engram writes, no cross-MCP spawns. Drift detection happens incidentally via ProblemBrief.framing_flags when the Orchestrator notices contradictions while building a brief — it does not enumerate the codebase, which would defeat the purpose of lightweight framing.
2026-05-08
Phase 3b — Operations becomes phase-aware
◇ shipped
Tier count stays at 4 (Advisory, Operations, Execution, Field). Review is not a separate tier — it's a phase of Operations. TierSpec.phases holds a PhaseSpec dict, and OPERATIONS ships with orchestrate (the brief-build phase) and review (the validator phase). Engram authority falls out naturally as "operations" for both. The implicit driver walk inside task_run_turn walks execution → optional review → bounded retries; one MCP call from the caller hides the retry mechanics.
2026-05-07
Custom tier registration as a first-class tool
◇ shipped
register_tier(tier_spec) on the orchestrator-facing surface, with built-in-name and reserved-name rejection. Custom tiers carry their own phases, model defaults, allowlists, scope overlays, and turn budgets — and they get folded into the AuthorityProfile handed to engram so retrieval scoring stays aligned. The framework doesn't pretend to know what tiers a downstream platform will need.
2026-05-05
Phase 3a — tier-keyed authority profile flowing into engram
◇ shipped
The standard tier ordering (user > advisory > operations > execution > auto) is codified in nexus_spawn_mcp.authority as a real engram.profiles.AuthorityProfile. The profile is handed to spawned engram MCPs via ENGRAM_PROFILE_PATH — serialized through to_dict() to a temp JSON file, env var pointed at it. Engram's own constructor enforces invariants (tier references, positive boosts, default_tier presence), so a malformed profile fails at construction, not at subprocess startup.
2026-05-03
Read-only engram via two-layer defense
◇ shipped
Execution-tier spawns get make_readonly_engram: ENGRAM_READ_ONLY=1 in the env and the participant's allowed_mcp_tools enumerates only read-only tool names. Even if a future engram release exposes additional write tools, they remain unreachable. Belt-and-suspenders, deliberately — env-only would trust the upstream MCP to honor the flag; allowlist-only would trust the upstream to not add write tools later. Both is cheap.
2026-05-01
Phase 2.1 — per-spawn participant scope via factory pattern
◇ shipped
Each panel spawn (and now each task turn) gets its own MCP config file pointing at the relay binary, with NEXUS_SESSION_TOKEN injected, a per-spawn Participant registered with the shared session, and phase-scoped allowed_mcp_tools. Driver methods call factory.cleanup_all() in a finally block so the session doesn't accumulate stale participants across rounds. The pattern started in nexus-debate-mcp's PanelSpawnConfigFactory and got reused as TaskSpawnConfigFactory for spawn-mcp; both do the same six-step assembly.
2026-04-30
Real-CLI smoke validation finds two schema fixes
◇ shipped
Phase 2.1's assumptions verified against claude CLI v2.1.x. Two non-obvious fixes shipped: --json-schema expects the schema inline as a JSON string, not a file path — build_claude_command now reads the path and passes the content. And structured output lands in the final result event's structured_output field, not in the result string — parse_result_from_ndjson_file prefers structured_output when present and JSON-serializes it for the downstream parse layer. Also: --max-turns is not a real CLI flag; the parameter is silently dropped, kept on the API for compatibility.
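The "prefer structured_output" rule can be sketched over a list of NDJSON lines. This version assumes each line is one JSON event and that the final event carries "type": "result". That event shape and the function name (which takes lines, not a file) are assumptions for illustration, not the project's parse_result_from_ndjson_file.

```python
import json
from typing import Iterable, Optional


def parse_result_from_ndjson_lines(lines: Iterable[str]) -> Optional[str]:
    """Return the final result as a string, preferring structured_output."""
    final = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate partial or non-JSON lines
        if isinstance(event, dict) and event.get("type") == "result":
            final = event  # keep the last result event seen
    if final is None:
        return None
    if final.get("structured_output") is not None:
        # Re-serialize so the downstream parse layer always receives a JSON string.
        return json.dumps(final["structured_output"])
    return final.get("result")
```

Falling back to the plain result string keeps non-schema spawns working through the same code path.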
2026-04-28
Phase 2 — coordinator/relay process model
◇ shipped
One long-lived coordinator serves the caller via stdio (FastMCP); per-spawn relays connect over TCP loopback and forward MCP JSON-RPC. Each spawn carries NEXUS_SESSION_TOKEN. Session.authorize gates every call. tools/list is filtered per-connection. The discipline that lets a panelist see only its phase's tools is enforced by the relay-coordinator boundary, not by the spawn's prompt. Marked for promotion to nexus-core when the second consumer (spawn-mcp) settled — which it has, in Phase 3a.
2026-04-25
Windows-safe spawn primitives — files over pipes, threaded stdin pump
◇ shipped
Subprocesses must use file-based stdout/stderr capture, not pipes: pipes deadlock under asyncio and many other event loops on Windows. The pattern is in nexus_core.spawn._run_blocking. The relay's stdin reader uses a threaded pump on Windows because the event loop's connect_read_pipe on sys.stdin is unreliable under ProactorEventLoop; without the pump, the claude CLI's initialize request hangs and the MCP server stays in pending forever. These two patterns were learned the hard way in the legacy platform and got carried forward as hard constraints.
2026-04-22
`spawn_agent` is the atom — keep it pure
◇ shipped
Pure, async at the public boundary, no retry, no multi-turn coordination, no NEXUS-specific assumptions. It invokes claude -p once and returns the result. Higher-layer logic (debate phases, task turns, retry budgets, review walks) is implemented in the MCPs that use it. The blocking core runs inside asyncio.to_thread(), so the public API stays uniformly async. Not exposing a synchronous helper is a deliberate API ergonomics call.
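The shape described here, a blocking core that captures output to files and is wrapped by an async boundary, can be sketched with the standard library alone. The function names and the return tuple are hypothetical, and the real atom builds a claude command rather than taking an arbitrary one.

```python
import asyncio
import subprocess
import sys
import tempfile
from pathlib import Path


def run_blocking(command: list, timeout: float = 60.0) -> tuple:
    """Run the command once, capturing stdout/stderr to files rather than pipes."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp, "stdout.txt")
        err_path = Path(tmp, "stderr.txt")
        with open(out_path, "wb") as out, open(err_path, "wb") as err:
            proc = subprocess.run(
                command, stdin=subprocess.DEVNULL, stdout=out, stderr=err, timeout=timeout
            )
        # Both handles are closed before reading, which matters on Windows.
        stdout = out_path.read_text(encoding="utf-8", errors="replace")
        stderr = err_path.read_text(encoding="utf-8", errors="replace")
    return proc.returncode, stdout, stderr


async def spawn_once(command: list, timeout: float = 60.0) -> tuple:
    """Async public boundary; the blocking core runs in a worker thread."""
    return await asyncio.to_thread(run_blocking, command, timeout)
```

A quick check is `asyncio.run(spawn_once([sys.executable, "-c", "print('hello')"]))`, which runs a child Python interpreter. There is no retry and no multi-turn logic here; that stays in the callers.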
2026-04-20
The collapse: monolith → three composable MCPs
◇ shipped
The legacy platform — FastAPI backend, Electron frontend, SQLite cache, ~/.rev-nexus/ directory, autonomous executor, Foreman safety gate, Director-approval pipeline, in-app Conduit fixers, twelve-agent roster — got extracted into three Python packages: nexus-core (library), nexus-debate-mcp (debate protocol), nexus-spawn-mcp (tiered tasks). What carried forward: spawn_agent, the tier model, scope discipline, the debate protocol, the Cartographer pattern. What got dropped: Director-approval pipeline (replaced by the caller), autonomous-executor loop (replaced by the implicit driver walk), Foreman safety gate (replaced by Operations-review), in-app Conduit (Field tier, deferred), cluster-pool deterministic naming, per-project SQLite cache.
2026-04-16
The first real debate validates the protocol
◇ shipped
debate_1776363989806_f041 — "Is the 6-signal scoring the right architecture for multi-agent memory, or should we move toward weighted-graph ranking?" Triggered by an external reviewer's pushback. 4 panelists, 2 cross-exam subpasses, 27 round records, 12 of 16 cross-exam responses revised from opening (75%), 23m 58s wall clock, 991 lines of synthesis. A parallel benchmark loop on Engram converged with the panel's recommendation from the opposite direction (panel reasoned; benchmark measured). The two-track convergence was the first external validation. That synthesis is what informed the decision to extract the orchestration primitives out of the legacy platform.
2026-04-16
Steelman enforcement is structural, not prompted
◇ shipped
Cross-exam prompts give the opposing agent only the arbiter's steelman of the opposing position — never the raw text. Strawmanning becomes architecturally impossible: weakening a position requires reading it, and the protocol does not let you read it. A consequence worth keeping: arbiter steelmans are often sharper than the originals because the arbiter writes each side at maximum force without matching the original author's word count. The discipline is enforced by what each agent gets to see, not by asking it to play fair.
2026-04-16
Verbatim synthesis assembly in Python, not arbiter
◇ shipped
Arbiter emits a DebateSynthesis JSON with the interpretive layer only. The Python renderer assembles synthesis.md from the brief + all round records + analyses + synthesis JSON. Verbatim round records come from workspace artifacts, not from arbiter output — nothing is lost to summarization. The arbiter's expensive opus/max calls are spent only on the interpretive layer.
2026-04-14
The framer picks the panel; the caller never does
◇ shipped
Keeps caller bias out of debates. The framer composes from (a) ephemeral archetypes tailored to the topic and (b) topic-relevant entries from the caller's AgentRegistry. Two mix invariants enforced post-framing: ≥1 registered profile when the registry has topic-relevant entries; ≥1 ephemeral outsider when any registered profile is on the panel. The earliest debate sketch let the caller specify panelists directly; the result was a panel that confirmed the caller's framing. Removing that lever was load-bearing.
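The two mix invariants are simple enough to check in a few lines. In this sketch the `Panelist` type and the `registry_has_relevant` flag are hypothetical; how topic relevance is actually decided is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panelist:
    name: str
    registered: bool  # True: from the caller's AgentRegistry; False: ephemeral archetype


def check_panel_mix(panel: list, registry_has_relevant: bool) -> list:
    """Return violations of the two post-framing mix invariants."""
    problems = []
    has_registered = any(p.registered for p in panel)
    has_ephemeral = any(not p.registered for p in panel)
    if registry_has_relevant and not has_registered:
        problems.append("registry has topic-relevant profiles but none are on the panel")
    if has_registered and not has_ephemeral:
        problems.append("registered profiles are on the panel without an ephemeral outsider")
    return problems
```

An all-ephemeral panel passes only when the registry has nothing relevant to offer.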
2026-04-10
`McpServerSpec` is self-describing — no central registry
◇ shipped
Specs carry their own command, args, env. The legacy mcp_config.py kept a hardcoded _ACTIVE_SERVERS set that named engram, playwright, context7, and sequential-thinking by convention — adding a server to a profile without adding it to that set was inert, and the inertness was itself a bug. The new shape has no such registry. Each spawn's caller specifies everything in the spec.
2026-04-08
Workspaces are sole truth — no DB
◇ shipped
Every state machine persists to Workspace(type='task') or Workspace(type='debate') directories with atomic tmp-rename writes. parent_workspace_id and linked_workspace_ids make cross-MCP synergy traceable without a central registry. The legacy platform had a SQLite cache that was technically derived state but ended up being treated as authoritative in places — the new shape doesn't have one to be tempted by.
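An atomic tmp-rename write is a short standard-library pattern. The sketch below is generic rather than the Workspace implementation; `write_json_atomic` is an illustrative name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a stray temp file behind
        raise
```

A reader sees either the old file or the new one, never a half-written state, which is what lets a crashed turn resume from disk.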
2026-04-05
Engram as native dependency, MCP as the runtime contract
◇ shipped
Both nexus-core and the two MCPs declare engram >= 0.6.0. The workspace pyproject.toml pins it to the sibling ../rev-engram/ checkout via [tool.uv.sources] with editable = true, so local edits to engram propagate into every nexus package without a reinstall. The runtime integration remains MCP — each spawned engram runs as a subprocess over stdio. Code-level dependency for shared types (engram.profiles.AuthorityProfile); MCP at runtime for actual queries.
2026-03-28
Scoped context as architecture, not prose
◇ shipped
The thesis the new shape defends. A spawned panelist sees only the tools its phase grants. An execution-tier worker can't write outside the brief's scope because the path validator runs in the MCP layer, not the prompt. "Don't touch X" is never written; it's structurally impossible. This claim was implicit in the legacy agent_tooling.py profiles — the new MCP shape makes it the central architectural commitment.
2026-03-12
The_Arbiter joins the legacy roster, debate-only
◇ shipped
Read, Glob, Grep + read-only Engram. No write access — must stay neutral, must not leave opinions in memory during a debate. Activation only during debates; not visible in task-mode workflows. Default tier: claude-opus-4-7 / max. The Arbiter exists because moderation is the synthesis-heavy workload where the strongest model earns its cost — and because giving any panelist the moderator role would corrupt the steelman discipline by giving someone with a stake the keys to the framing.
2026-03-10
Director profile calibrates advisory communication
◇ shipped
Settings table keys: director_technical_level (beginner/intermediate/technical/engineer), director_communication (terse/balanced/detailed), director_background (free text). Read by director_profile.py, injected into advisory agent prompts. Not cosmetic — when the Director sets technical level to "engineer", advisory agents skip introductory explanations and focus on edge cases and trade-offs. Carried forward in spirit: the new caller (Claude Code) has its own profile and its own user-shaped communication; the platform's specific calibration knobs got dropped along with the Director-approval pipeline.
2026-02-20
The Conduit pattern — right-click → field agent → autonomous task
◇ shipped
@rev-nexus/conduit ships into a project's web UI as a script tag. Right-click any element opens a context menu calling The_Mechanic (functional) or The_Designer (visual). Context-aware opening line, 1–2 exchange conversation, dispatch a tightly-scoped autonomous task into the orchestration pipeline. The intake agents are pure conversation — no tools, no MCP, no project filesystem access. They translate "developer pointing at a problem" into "structured task the execution layer can act on". The pattern doesn't exist in the new MCP world yet — it needs a runtime hook the MCP layer alone can't supply, and the Field tier is reserved for it.
2026-02-12
Tiered context with deliberately scoped windows
◇ shipped
The original orchestration framework's central design call: agents are organized into tiers by function, and each tier receives a deliberately scoped window of context. Advisory gets full project awareness for strategic decisions. Operations gets task-level context for planning and routing. Execution gets a focused problem brief and the files they touch. Field gets barely more than the DOM element the developer pointed at. The trade-off between context depth and action speed is the design axis, and it survived the collapse — the new shape just enforces it via MCP-layer scope instead of prompt prose.
