Managed Agents Spike — Codebase Research, CMA vs Messages API (3 prompts)

2026-05-09 correction: the latency headline ("2.7× faster") in this doc and the "Production-prompt benchmark" section below was based on a single open-ended prompt that ran 483 s on Messages-API. Real production telemetry is 19 s median / 28 s p95 (n=36 over 30 days). Today's CMA smokes are 4–7× slower than the production p95, not faster. The cost case still holds; the latency case is reversed. See 2026-05-09-production-telemetry-correction.md for the full postmortem and methodology lessons. The body of this doc is preserved unchanged for historical reference — apply the postmortem's findings before citing any latency number from below.

Summary

Empirical spike: ran 3 representative kendo "research a codebase" prompts through two implementations on the same model (Claude Sonnet 4.6), with equivalent capabilities. Anthropic Managed Agents (CMA) was 19× cheaper, ~18% faster on average, and used 58% fewer tool calls than a custom Messages-API implementation.

Path	Total wall time	Total cost	Avg tool calls / prompt
Messages API (custom tools, ripgrep over a local kendo clone)	517s	$9.72	64
CMA (`agent_toolset_20260401` on `github_repository` mount)	422s	$0.51	27

The cost gap is dominated by automatic prompt caching in CMA sessions. My Messages-API implementation didn't set cache_control markers — a fair-fight version would narrow the gap to maybe 3-5×, not 19×. But the gap stays meaningful in any version of the comparison, and it explains a structural property of CMA: long-conversation workloads with many tool calls (the shape of any agent loop) get prompt caching for free, where the equivalent Messages-API code has to opt in deliberately.

Output quality was indistinguishable between the two paths. Both produced clean structured markdown reports with file:line references. Variance in which exact line they cited (e.g. AppServiceProvider.php:165 vs :167) suggests minor hallucination on either side, with no consistent winner.

The spike validates a single specific claim: for codebase research as a workload, CMA is the right tool. It doesn't validate CMA for the parts of an autonomous-PR workflow that involve writing code, running tests, or opening PRs — those are still untested.

Spike setup

Standalone scaffolding at ~/Code/cma-spike/ — Python 3.12, anthropic SDK 0.100.0, python-dotenv. Both paths take the same prompt files; both use Sonnet 4.6; both have MAX_TOKENS_PER_RUN=4096 output cap.

Messages API path (messages_api/run.py): direct client.messages.create with three custom tools — search_code (shells to grep/ripgrep), read_file, list_files — pointed at a local kendo clone. Loop continues until the model emits stop_reason != "tool_use".

CMA path (cma/run.py): creates a Session against a pre-created Agent + Environment, attaches kendo via github_repository resource (read-only fine-grained PAT). Uses the built-in agent_toolset_20260401 (bash, read, write, edit, glob, grep). Streams events; stops on session.status_idle with stop_reason.type === "end_turn".

System prompts are aligned between the two paths so the comparison is on the runtime, not on the framing.

Verbatim numbers

Prompt	MA wall	CMA wall	MA tokens	CMA tokens	MA $	CMA $	MA tools	CMA tools
`01-find-branch-linker-usages`	135.3s	100.2s	618,968 in / 5,968 out	10 in / 2,498 out / 105,007 cache reads	$1.9464	$0.0690	42	11
`03-webhook-intake-pattern`	154.4s	182.3s	880,797 in / 8,225 out	19 in / 5,535 out / 512,946 cache reads	$2.7658	$0.2370	57	32
`06-feature-flag-pattern`	227.2s	139.6s	1,618,413 in / 9,989 out	22 in / 5,685 out / 385,398 cache reads	$5.0051	$0.2010	95	37
avg	172.3s	140.7s	1,039,392 in / 8,060 out	17 in / 4,572 out	$3.24	$0.17	64	27

The CMA "input tokens" column reads as ~10-20 per run because almost everything (system prompt, tool definitions, accumulated history) hits the prompt cache. The cache-read column captures the real conversational scale on the CMA side.

The MA "input tokens" column shows the cumulative input over all turns of the agent loop — at 95 tool calls (prompt 06), the conversation history alone is 1.6M tokens of cumulative input. This is normal for an uncached agent loop; cache markers would dramatically reduce it.

Per-prompt observations

Prompt 01 — find IssueBranchLinker usages. Narrow component-tracing question. CMA: 100s, 11 tool calls. MA: 135s, 42 tool calls. CMA's tool-call efficiency comes from one-shot bash compositions (find ... | xargs grep ...) replacing what MA does as 5-7 sequential round-trips.

Prompt 03 — webhook intake pattern. Multi-file pattern documentation across routes, middleware, controllers, jobs, audit. The only prompt where CMA was slower (182s vs 154s). The CMA agent went deeper — read 32 tool calls' worth of files including the audit logger and the queued job retry config. The MA agent stopped earlier with a thinner answer. So CMA's "loss" on wall time was paired with a richer report. Both produced usable output.

Prompt 06 — Pennant feature flag pattern. Small enumeration with auto-discovery wiring. CMA: 140s, 37 tool calls. MA: 227s, 95 tool calls. The MA agent looped extensively — five separate searches for the #[Name] attribute, repeated reads of AppServiceProvider. The cumulative input hit 1.6M tokens (the full session conversation history flowing through every turn) and the cost ballooned to $5. CMA capped the same exploration with prompt caching. This is the prompt where the cost gap got most extreme (25×).

Findings

1. Cost: CMA wins decisively, ~19× cheaper across the sample. Caveat: this includes the unfair-caching effect. A Messages-API implementation that wires up cache_control: {type: "ephemeral"} on system prompts and tool definitions would narrow the gap to maybe 3-5×. But that gap stays meaningful, and the rule generalises: CMA gets prompt caching for free; custom Messages-API code gets it only when you remember to opt in. For long-running agent loops, that's a real ergonomic win.

2. Speed: CMA wins on average (~18%), but variance is high. Prompt 03 was 28s slower on CMA. The mean tells the story; individual prompts swing both ways depending on how exploratory the agent gets. Repeated runs of the same prompt would also vary — we didn't measure that.

3. Tool-call efficiency: CMA uses ~58% fewer calls. Structural, not caching-related. In-container bash composes (find | xargs grep, cat | head, etc.) replace what tool-calling Messages API needs multiple round-trips for. This effect is independent of caching and would persist in a "fair fight" comparison.

4. Output quality: indistinguishable. Both paths produced clean structured reports with file:line refs. Minor differences in cited line numbers (off-by-2 here and there) on both sides. Neither wins on quality.

Honest limitations

Sample size of 3. Directional signal, not a tight measurement. Variance per prompt is real (prompt 03 reversed the speed result). Wider sample would tighten the numbers.
Codebase research only. This validates the read side of an agent. Writing code, running tests, opening PRs — all untested.
Sonnet 4.6 only. Opus would shift absolute numbers but probably keep the ratio.
Cold-start tax hidden. Every CMA run started cold (new session per prompt). Production might reuse sessions for warm calls; we didn't measure that.
The Messages-API path didn't use prompt caching. This was deliberate (matches how app/Actions/Agent/ResearchAction.php is wired today, which doesn't use cache_control markers either) but it's the dominant cause of the cost gap. The "real" gap is smaller.
Both paths were single-turn from the user's POV. Multi-turn iterative refinement was not tested.

Strategic read

The signal is strong enough to act on a specific, contained migration:

Migrate ResearchAction of the existing story-generation harness to CMA. That's the multi-turn codebase-exploration phase of app/Actions/Agent/StoryGenerationHarnessAction.php. The other 4 phases (ValidateInputAction, DuplicateCheckAction, ClassifyAction, WriteAction) are essentially structured-output calls that don't benefit from CMA — they should stay on Messages API. Expected outcomes:

Cost on the research phase drops 5-15× (conservative, after accounting for cache markers being available in either runtime)
Latency on the research phase ties or improves
Tool-call count drops by half — fewer round-trips, less load on the kendo backend's MCP tools
We get production CMA infrastructure stood up (Agent + Environment + github_repository resource + webhook handler + audit logging) on a real workload before betting the autonomous-execution feature on it

What this spike doesn't justify:

Building the full "Hand to Claude" autonomous-PR feature next. We've only validated the read side. The write side is still entirely untested in our context.
Migrating phases of the harness that are not tool-heavy. Those are the right shape for Messages API; CMA would just add session-creation overhead.

Reproduce

Spike folder: ~/Code/cma-spike/

bash

cd ~/Code/cma-spike
source .venv/bin/activate
# .env needs ANTHROPIC_API_KEY + GITHUB_TOKEN (read-only PAT scoped to script-development/kendo)
python3 cma/setup.py                           # creates Agent + Environment, caches IDs
python3 messages_api/run.py prompts/<file>.md  # one MA run
python3 cma/run.py prompts/<file>.md           # one CMA run
python3 compare.py                             # produces results/comparison.md

To extend the sample, drop more prompt files into prompts/ and re-run. Cost ceiling: each MA run ≈ $1-5; each CMA run ≈ $0.05-0.50 on Sonnet at the prompt sizes we tested.

To stop the meter when done: python3 cma/teardown.py (archives the Agent + Environment).

Prerequisite verification — kendo-script MCP transport (2026-05-09)

The capability survey flagged one prerequisite for any CMA migration that wants to attach the existing kendo-script MCP server (the laravel/mcp server backing https://script.kendo.dev/mcp/kendo) to a Managed Agents session: does it speak streamable HTTP MCP? Anthropic's mcp_toolset only accepts streamable-HTTP MCP servers — stdio is not supported, and pre-2025-03-26 separate-endpoint SSE transport is not supported. Verified now so the migration plan doesn't trip on a transport mismatch.

Verdict: ✅ streamable HTTP, OAuth-discoverable, Anthropic-compatible. No transport blocker.

Source verification

backend/routes/ai.php:15 registers the server via Mcp::web('/mcp/kendo', KendoServer::class). Tracing into laravel/mcp 0.6.6 (backend/composer.json pinned at ^0.6.6):

vendor/laravel/mcp/src/Server/Registrar.php:32-56 — web() registers a single route URI accepting:
- POST → wraps the request in HttpTransport, runs the server, returns either application/json or text/event-stream. Cites the MCP 2025-11-25 transport spec inline.
- GET → 405 Method Not Allowed with Allow: POST (server-initiated GET stream not supported — spec-permitted)
- DELETE → 405 Method Not Allowed with Allow: POST (session termination via DELETE not supported — also spec-permitted)
vendor/laravel/mcp/src/Server/Transport/HttpTransport.php — implements both the immediate-response branch and the SSE-upgrade branch on the same POST endpoint. Reads/writes the MCP-Session-Id header. Sets X-Accel-Buffering: no on streamed responses. Returns 202 for notifications-only POSTs, 200 otherwise — explicit reference to https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#sending-messages-to-the-server at line 70.

This is the streamable HTTP transport introduced in MCP spec 2025-03-26 and refined in 2025-06-18 / 2025-11-25 — single endpoint, POST-driven, optional SSE upgrade per request. Not the legacy two-endpoint SSE transport from 2024-11-05.

Live probe

Live endpoint behaviour matches the code (probed 2026-05-09 from this machine):

Request	Response	Notes
`GET https://script.kendo.dev/mcp/kendo`	`405` + `Allow: POST`	Spec-compliant.
`DELETE https://script.kendo.dev/mcp/kendo`	`405` + `Allow: POST`	Spec-compliant.
`POST` with no token, `Accept: application/json, text/event-stream`	`401` + `WWW-Authenticate: Bearer realm="mcp", resource_metadata="https://script.kendo.dev/.well-known/oauth-protected-resource/mcp/kendo"`	OAuth resource-indicator challenge (RFC 9728).
`GET /.well-known/oauth-protected-resource/mcp/kendo`	`{"resource":"https://script.kendo.dev/mcp/kendo","authorization_servers":["https://script.kendo.dev"],"scopes_supported":["mcp:use"]}`	RFC 9728 metadata.
`GET /.well-known/oauth-authorization-server`	`{issuer, authorization_endpoint, token_endpoint, registration_endpoint, response_types_supported:["code"], code_challenge_methods_supported:["S256"], scopes_supported:["mcp:use"], grant_types_supported:["authorization_code","refresh_token"]}`	RFC 8414 metadata. PKCE S256 + refresh tokens supported. Dynamic client registration available at `/oauth/register`.

This is precisely the shape Anthropic's mcp_oauth vault credential type consumes. Per the capability survey (section 8), POST /v1/vaults/{vault_id}/credentials with auth: { type: "mcp_oauth", mcp_server_url, access_token, expires_at, refresh: { token_endpoint, client_id, refresh_token, token_endpoint_auth } } would be filled directly from the metadata above — Anthropic handles refresh.

What this means for the recommended migration

The spike's recommended next step — migrate ResearchAction of StoryGenerationHarnessAction to CMA — does not need kendo-script MCP at all. Codebase research uses the agent_toolset_20260401 over the github_repository resource mount (which is what we benchmarked in this spike, and what gave us the 19× cost win). kendo-script MCP only enters the picture if/when we want CMA agents to read or mutate kendo issues, branches, time entries, etc. — i.e., for the bigger Application B ("Hand to Claude" autonomous PR) flow.

So this verification doesn't unblock the immediate next step, but it does eliminate the largest infrastructure unknown for Application B: we will not have to fork or rewrite the MCP server to make it Anthropic-compatible. We can declare it on the Agent and authenticate per-user via a vault.

Open follow-up

End-to-end auth flight. Discovery + 401 challenge are verified. The actual mcp_oauth round-trip — Anthropic exchanging an access token, calling tools/list, calling a tool, refreshing on expiry — has not been exercised. That's a small additional spike once we're scoping Application B; not required to unblock the ResearchAction migration.
mcp:use scope sufficiency. The auth server only advertises mcp:use. If we later want to differentiate read-only vs read-write CMA-driven access (e.g. a "Hand to Claude" feature that should only read but never mutate), we'll need a finer-grained scope set on the kendo-script side. Today, a token with mcp:use can call every tool the user has permission for.

Production-prompt benchmark (2026-05-09)

Per decisions § 3-5 below, ran the actual production ResearchAction system prompt against realistic story-gen inputs (prompts/research-action/* in ~/Code/cma-spike/). The benchmark hit harness reliability limits before completing the full 4-prompt × 2-path matrix, but prompt 1 produced a clean apples-to-apples Sonnet comparison that's already an order of magnitude past the pass/fail gate. Calling it.

Headline numbers (single clean Sonnet datapoint)

	Messages API	CMA	Ratio
Cost	$11.51	$0.56	20.7× cheaper
Wall time	483s (8 min)	180s (3 min)	2.7× faster
Tool calls	118	36	3.3× fewer
Cumulative input tokens	3.79M (no caching)	38in + 1.54M cache_read	—

The Messages-API per-prompt cost was 3.5× higher than the first spike's average ($3.24) — production-shape ResearchAction prompts are far more open-ended than synthetic codebase questions, and the no-cache agent loop balloons accordingly. That's exactly the workload shape where CMA's automatic prompt caching wins biggest. Same prompt on CMA cost $0.56 — the cache layer absorbed virtually all of the input.

Verdict

Migration green-lit. The cost gate (≥ 2× cheaper) passes by ~10× margin. The latency gate (CMA p95 ≤ 1.5× current) trivially passes — CMA is faster, not slower. Async-job nature of ResearchAction (queued, not interactive) makes latency mostly cosmetic anyway.

One datapoint isn't a tight measurement, but the gap is large enough that statistical noise can't close it.

Harness reliability findings (block on these before re-benchmarking)

The remaining 3 prompts surfaced multiple issues with the spike scaffold and Anthropic's session-control primitives. None block the migration decision, but they'd block any future benchmark that needs reliable per-prompt numbers:

Production-shape prompts run 5–10× longer than synthetic. First-spike prompts averaged 100–180s on CMA; production-shape ran 5–13 minutes. The session timeout in cma/run.py was 300s — patched to 600s mid-run. Future runs need ≥ 600s with tolerance for outliers.
user.interrupt doesn't reliably stop sessions mid-bash-tool-execution. One stuck session ignored 3 interrupts over 15s and stayed running until its agent's long bash command naturally returned ~13 minutes later. The interrupt seems to take effect only at the next model-loop boundary, not mid-tool-call. Patched the harness with drain-then-poll-for-idle but it still wasn't sufficient.
Reading session.usage too early returns 0/0/0/0. The harness fetched usage immediately after the stream closed; this returns zeros for sessions still finalising. Patched to poll for idle status (up to 40s) before reading. Without this fix, ~$1 of orphan-session spend went unattributed.
The full agent_toolset_20260401 is too broad to faithfully simulate ResearchAction. Production ResearchAction has 3 read-only tools (get_repo_tree, search_code, get_file_content); the spike's agent had bash + read + write + edit + grep + glob + (originally) web_fetch + web_search. On prompt 2, the agent ignored "your ONLY job is to explore" and made 35 edit calls patching 16 files (a full ValidationException-passthrough fix across all create/update MCP tools). Cost ($2.96) and behaviour (write-side) both diverged from what production would do. No git push was attempted — verified via the bash call log; the 16-file fix lived only inside the ephemeral container and was reaped on archive. For an honest production-prompt benchmark, the agent must be created with default_config.enabled: false + an explicit allowlist (read, grep, glob only).

Spend tally

Run	Cost	Notes
Messages API · Sonnet · prompt 1	$11.51	Full completion, `end_turn` — the headline datapoint
CMA · Sonnet · prompt 1	$0.56	Full completion, `end_turn` — the headline datapoint
CMA · Sonnet · prompt 2 (off-script edit campaign)	~$2.96	Agent ignored research-only system prompt, did the full fix
CMA · Sonnet · orphan-session cleanup	~$1.06	Recovered via post-hoc usage queries on archived sessions
Total	~$16	~$11 of which was the Messages-API run that this whole exercise plans to eliminate

What this changes for the migration plan

When /plan-feature runs for the ResearchAction migration:

Don't re-run the production-prompt benchmark unless the harness gets the fixes above. The migration decision doesn't need more data — prompt 1 is sufficient.
The migration's ManagedAgentsService should mirror production ResearchAction's read-only tool surface — wrap get_repo_tree, search_code, get_file_content as MCP tools or custom tools, and create the CMA Agent with default_config.enabled: false + explicit allowlist. Easier to reason about, cheaper per-run, predictable.
The user.interrupt unreliability becomes a production constraint: long-running CMA sessions need server-side deadlines + spend budgets in ManagedAgentsService, not just client-side timeouts. Plan for "the session may run for 15 minutes after we tell it to stop." For ResearchAction specifically: bound the work via Anthropic's outcome rubric (max_iterations) rather than relying on interrupts.

Decisions made (2026-05-09)

After the prerequisite verifications below, a planning round between CEO and parent agent settled the path forward:

Migrate ResearchAction only (slice 2 of capability survey § 12 Application C). Other 4 phases stay on Messages API.
No Pennant flag. ResearchAction is internal infra. Pennant is reserved for HandOffToClaude (Application B, the user-visible feature).
Pre-migration benchmark spike: extend ~/Code/cma-spike/ with a real-prompt file (actual ResearchAction system prompt + 3-5 sampled story-gen inputs), re-run compare.py. Closes cost+latency uncertainty on the real prompt shape, not the 3 synthetic prompts this spike used.
Pass/fail gate: ship only if cost ≥ 2× cheaper and p95 latency ≤ 1.5× current ResearchAction. Conservative on cost (we saw 19× lab-side; 2× in prod gives margin); forgiving on latency (story-gen is async-job, not interactive UX).
Rollback strategy: replace outright; trust the benchmark as the gate; rollback via git revert + redeploy if needed. No shadow mode, no A/B compare in code.
Outcomes + multiagent: confirmed public beta as of 2026-05-06 (Code with Claude 2026) — no access form needed; remove that prerequisite from the open list.
Application B (Hand to Claude): park until ResearchAction has shipped and run in production for ≥ 2 weeks. Then /plan-feature it with battle-tested CMA infra.

Full table with rationale lives in the capability survey at ./managed-agents-kendo-evaluation.md § 16.

Prerequisite verification — `github_repository` mount auth (2026-05-09)

The spike used a hand-rolled fine-grained PAT on the github_repository resource. For production, the mount needs to authenticate as kendo, not as a developer's PAT. Question we wanted to settle: can we reuse the existing GitHub App installation that kendo already has wired up, or do we need to provision a separate token per linked repo?

Verdict: ✅ reuse the existing installation. No per-repo key.

Source verification

Kendo's GitHub integration uses a per-tenant GitHub App installation:

app/Models/Central/GithubInstallation.php — central-DB row mapping installation_id ↔ tenant_id. One row per tenant; the App is installed account-wide for that tenant.
app/Models/ProjectGithubRepo.php — per-tenant row holding repo_full_name (e.g. "owner/repo") linked to a project. Anything in this table is reachable via the tenant's installation token.
app/Services/GithubAppService.php:30 — getInstallationToken(int $installationId): string mints a fresh 1-hour installation access token by JWT-signing a call to POST /app/installations/{id}/access_tokens. The token authorises every repo the installation has access to — not per-repo.

The App's permission set is provable from existing usage in GithubAppService — it issues check runs (createCheckRun), PR comments (createPrComment), and dispatches workflows. That implies at minimum pull_requests: write and contents: write, both covered by repo scope on Anthropic's CMA token-permission table (capability survey § 9: clone private repos = repo, create PRs = repo).

CMA wiring

php

// In ManagedAgentsService::createSession() or equivalent
$installation = GithubInstallation::where('tenant_id', $tenant->id)->firstOrFail();
$token = $githubAppService->getInstallationToken($installation->installation_id);

$payload = [
    'agent' => $agentId,
    'environment_id' => $envId,
    'resources' => [
        [
            'type' => 'github_repository',
            'url' => "https://github.com/{$projectRepo->repo_full_name}",
            'mount_path' => '/workspace/repo',
            'authorization_token' => $token,  // 1-hour TTL
        ],
        // multi-repo: same $token, different url / mount_path per entry
    ],
];

Two real caveats (neither is a per-repo-key problem)

Token TTL is 1 hour. Anthropic supports PATCH /v1/sessions/{session_id}/resources/{resource_id} to rotate mid-session — capability survey § 9 calls this out. Long-running sessions (story-gen ResearchAction is short-lived; Application B "Hand to Claude" is potentially multi-hour) need a refresh job that re-mints via getInstallationToken() before expiry.
App must have the repo selected. Already enforced upstream by ProjectGithubRepo (the user picked which repos to grant at install/configure time) — UX precondition, no new infra.

What this means for the migrations

ResearchAction migration (recommended next step): each session is short (single research phase, well under 1 hour). Mint once at session-create, no rotation needed.
Application B (Hand to Claude): sessions can run hours. Token rotation is required infrastructure — tied to the long-session lifecycle, alongside outcome evaluation and webhook handling.

References

The capability survey written before this spike: ./managed-agents-kendo-evaluation.md — covers all 14 Anthropic doc pages, three plausible kendo applications, decision framework
Anthropic Managed Agents docs: https://platform.claude.com/docs/en/managed-agents/overview
Spike scaffold (local): ~/Code/cma-spike/
Spike Agent/Environment IDs (cached locally, archive when done): ~/Code/cma-spike/.cache.json
Closest existing kendo precedent: backend/app/Actions/Agent/StoryGenerationHarnessAction.php (5-phase harness; ResearchAction is the migration target)

Managed Agents Spike — Codebase Research, CMA vs Messages API (3 prompts) ​

Summary ​

Spike setup ​

Verbatim numbers ​

Per-prompt observations ​

Findings ​

Honest limitations ​

Strategic read ​

Reproduce ​

Prerequisite verification — kendo-script MCP transport (2026-05-09) ​

Source verification ​

Live probe ​

What this means for the recommended migration ​

Open follow-up ​

Production-prompt benchmark (2026-05-09) ​

Headline numbers (single clean Sonnet datapoint) ​

Verdict ​

Harness reliability findings (block on these before re-benchmarking) ​

Spend tally ​

What this changes for the migration plan ​

Decisions made (2026-05-09) ​

Prerequisite verification — github_repository mount auth (2026-05-09) ​

Source verification ​

CMA wiring ​

Two real caveats (neither is a per-repo-key problem) ​

What this means for the migrations ​

References ​

Managed Agents Spike — Codebase Research, CMA vs Messages API (3 prompts)

Summary

Spike setup

Verbatim numbers

Per-prompt observations

Findings

Honest limitations

Strategic read

Reproduce

Prerequisite verification — kendo-script MCP transport (2026-05-09)

Source verification

Live probe

What this means for the recommended migration

Open follow-up

Production-prompt benchmark (2026-05-09)

Headline numbers (single clean Sonnet datapoint)

Verdict

Harness reliability findings (block on these before re-benchmarking)

Spend tally

What this changes for the migration plan

Decisions made (2026-05-09)

Prerequisite verification — `github_repository` mount auth (2026-05-09)

Source verification

CMA wiring

Two real caveats (neither is a per-repo-key problem)

What this means for the migrations

References