AI Features Decisions

Distilled from 14 DECISIONS.md files. Each entry is a real fork in the road. Implementation details that aren't a tradeoff are NOT here.


GitHub auth for agent tools — always use App installation token, never user OAuth

  • Chose: Always resolve a GitHub App installation token in AgentToolFactory::forProject().
  • Rejected: Two-tier fallback that prefers the user's OAuth token and falls back to the App token.
  • Why: Agent repo access is a project-level operation, not a user-level one, so coupling it to a single user's GitHub connection is fragile and unnecessary. The App token has equal or better access for read operations.
  • Source: KD-0277

Exclude GitHub tools when no installation exists

  • Chose: Drop GitHub tools entirely from the agent toolset when no installation token is resolvable.
  • Rejected: Include the tools and let them fail at call time.
  • Why: Agents waste tokens probing tools that are guaranteed to 401; surfacing zero tools is cleaner than surfacing eight broken ones.
  • Source: KD-0277
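
The two GitHub decisions above can be sketched as a single toolset builder. This is a minimal Python illustration, not the AgentToolFactory code: the tool names and the tools_for_project helper are hypothetical, and the only behaviour taken from the entries is "no installation token means no GitHub tools".

```python
from typing import Optional

# Hypothetical tool names, for illustration only.
BASE_TOOLS = ["search_issues", "read_template"]
GITHUB_TOOLS = ["github_read_file", "github_search_code"]


def tools_for_project(installation_token: Optional[str]) -> list[str]:
    """Build the agent toolset; GitHub tools appear only with a usable token."""
    tools = list(BASE_TOOLS)
    if installation_token:
        # Only the App installation token is consulted, never a user OAuth token.
        tools.extend(GITHUB_TOOLS)
    return tools
```

With no token the agent never sees a GitHub tool, so it has nothing to probe and fail on.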

Story generation pipeline — remove Quick Mode, route everything through the multi-agent path

  • Chose: Single intelligent path (Dedup → Research → Classify → Write).
  • Rejected: Keep Quick Mode as a simple-prompt LLM shortcut alongside Agent Mode.
  • Why: Even short prompts benefit from duplicate detection, intent classification and codebase awareness; the speed gap (~30s vs ~5s) is acceptable for the quality lift.
  • Source: KD-0355

Pipeline shape — 4 narrow agents, not 3 fat ones

  • Chose: Dedup, Research, Classify, Write — each tool-scoped to one job.
  • Rejected: Collapse Classify and Research into one agent that classifies, picks templates, gathers code context, and decides priority.
  • Why: A fat Classify agent recreates the monolithic prompt the rewrite was meant to kill. Each focused agent gets a short prompt and a single responsibility.
  • Source: KD-0355

Templates picked by the agent, no type column on templates

  • Chose: Pass all project templates to the Classify agent and let it choose.
  • Rejected: Add a type column on the templates table for deterministic matching.
  • Why: Hardcoded type matching can't handle custom categories like Spike or Refactor; agent-side selection lets projects define arbitrary template sets without schema work.
  • Source: KD-0355

Always run Research, even when Dedup found duplicates

  • Chose: Research is non-skippable — codebase context flows into Classify on every run.
  • Rejected: Skip Research when Dedup returns duplicates to save one LLM call.
  • Why: Classify needs technical grounding to distinguish "true duplicate" (intent: update) from "similar but distinct" (intent: create_feature). The 5-15s saved isn't worth losing that signal.
  • Source: KD-0355

Add a Validate step (Step 0) before Dedup

  • Chose: Lightweight AI pre-check that short-circuits on obviously vague prompts before the expensive agents fire.
  • Rejected: Heuristic word-count check, OR rely solely on Classify's respond intent path.
  • Why: Heuristics flag valid 3-word prompts ("fix login bug"); leaving it to Classify wastes 10-30s on Dedup+Research API calls for garbage input. A 1-2s validator pays for itself within the first vague request.
  • Source: KD-0399
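
A rough sketch of the resulting pipeline order, assuming the validator is injected as a plain callable (in the real pipeline it is an AI call). The Run class and the step labels are hypothetical; the point is only that a rejected prompt stops before Dedup and Research, and that Research is never skipped once validation passes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Run:
    prompt: str
    steps_run: list[str] = field(default_factory=list)
    rejected: bool = False


def run_pipeline(prompt: str, is_actionable: Callable[[str], bool]) -> Run:
    """Step 0 validates; the expensive agents only fire for actionable prompts."""
    run = Run(prompt)
    run.steps_run.append("validate")
    if not is_actionable(prompt):
        run.rejected = True  # short-circuit: no Dedup or Research cost is paid
        return run
    for step in ("dedup", "research", "classify", "write"):
        run.steps_run.append(step)  # Research always runs, even after Dedup hits
    return run
```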

Pipeline orchestration class is an Action, not a Service

  • Chose: AiGenerationPipelineAction lives in Actions/Ai/ with execute().
  • Rejected: AiGenerationPipeline Service class.
  • Why: Deptrac layer rules forbid Services from depending on Audit (AiOutboundLogger) and Actions (ResolveByokProviderAction). Stretching the "Action" semantic is cheaper than weakening layer boundaries app-wide.
  • Source: KD-0355

Progress events broadcast on the user-private channel, not project-scoped

  • Chose: Reuse the Tenant.{id}.User.{id} private channel.
  • Rejected: New project-scoped channel so the team sees who's generating.
  • Why: AI generation is a personal action — team visibility adds channel-auth and frontend complexity for no clear UX win.
  • Source: KD-0399

Stepper enums are int-backed, not string-backed

  • Chose: Numeric values for AgentStepPhaseEnum / AgentStepStatusEnum.
  • Rejected: Human-readable string values for easier DevTools debugging.
  • Why: Frontend already standardises on int-backed enums (priority, type, intent); WebSocket payloads fire frequently and benefit from compactness.
  • Source: KD-0399

Bot/AiRun removal — clean sweep, single drop migration

  • Chose: One new migration that nullifies FKs, deletes the claude bot user, and drops is_bot, ai_runs, and ai_run_logs. The old migration files are removed.
  • Rejected: Only remove items the issue lists; OR keep old migrations for history.
  • Why: Partial removal leaves orphaned imports and failing arch tests; keeping old migrations alongside drops creates dead code in the migrations directory.
  • Source: KD-0390

AI-key access lookup — dedicated read-side Action, not a service or inline duplication

  • Chose: Final readonly ResolveProjectAiKeyAccessAction returning a project_id → bool map, called once per request from controllers.
  • Rejected: Plain ProjectAiKeyAccessResolver service in app/Services/, or duplicating the TenantAiKey lookup inline at all 6 controller call sites.
  • Why: Action-default is already the codebase's strong convention and inherits the existing arch-test discipline; inventing a new "resolver service" layer for one feature, or duplicating 6×, both lose the "one query per request" invariant we want to assert in tests.
  • Source: KD-0610

ResourceData drift fix — tripwire from() on the subclass, don't widen the abstract base

  • Chose: New fromWithAccess / fromWithViewContext factories on the affected resources; base from(Model) overridden to throw BadMethodCallException.
  • Rejected: Widen the abstract ResourceData::from(Model, ...$context = []) so every subclass takes optional context.
  • Why: Widening the base would touch ~22 subclasses and drift the abstraction toward a generic projection-DTO base; a fail-loud tripwire on just the affected resources turns "forgot the new factory" into an error at the entry point rather than a silent has_ai_key=false regression.
  • Source: KD-0610

ResourceData arch test — substring-scan banned static query patterns

  • Chose: tests/Arch/ResourcesTest.php extended with a broad substring scan for ::query(, ::first(, ::all(, ::find( in every file under app/Http/Resources/**.
  • Rejected: Reflection-walk that only inspects from() / __construct() bodies on each ResourceData subclass.
  • Why: Static entry points are exactly what the rule wants to forbid and the patterns won't false-positive on instance calls ($relation->first()) or constants; reflection lets drift sneak into named static helpers, which is exactly how the lookup leaked into resources in the first place.
  • Source: KD-0610
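
The substring scan is easy to picture in a few lines. The sketch below is Python rather than the project's PHP arch test, and banned_hits is a hypothetical helper; the four patterns are the ones listed above.

```python
BANNED = ("::query(", "::first(", "::all(", "::find(")


def banned_hits(source: str) -> list[str]:
    """Return every banned static-call pattern that occurs in the file contents."""
    return [pattern for pattern in BANNED if pattern in source]
```

Because each pattern starts with `::`, an instance call such as `$relation->first()` does not match. One property of plain substring matching worth noting: a call like `::findOrFail(` would also pass this scan, since it does not contain `::find(`.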

Attachment vision cache — separate attachment_extracted_contexts table, not a column on attachments

  • Chose: New table keyed by (attachment_id, model_version), joined only when the harness runs.
  • Rejected: Nullable extracted_text TEXT column on the existing attachments table.
  • Why: Every hot-path query for attachments (board, attachment grid, MCP list) would haul TEXT bytes 99% of callers don't need; the composite key also lets model upgrades (Sonnet 4.6 → 4.7) create a new row without losing the prior cache for audit.
  • Source: KD-0649

Vision/text extraction placement — new pipeline step between Validate and Duplicate Check

  • Chose: Add a sixth pipeline step ("Extract Context") that runs after Validate and before Duplicate Check.
  • Rejected: Eager extraction at upload time inside CreateAttachmentAction, or lazy extraction inside ResearchAction.
  • Why: Upload-time extraction pays vision cost for attachments that may never feed the harness and couples uploads to AI infra; lazy-inside-Research means the extracted text never reaches Duplicate Check, killing the screenshot-dedup benefit. Putting it before Duplicate Check gives both consumers (dedup + research) the text from a single extraction.
  • Source: KD-0649

Vision provider — reuse the project's BYOK-resolved provider, no dedicated cheap-vision model

  • Chose: Vision call routes through the existing AiGenerationPipelineAction with feature: 'agent_attachment_vision', using whatever provider the project resolved.
  • Rejected: Always platform Anthropic (ignoring BYOK), or a dedicated cheap vision model (e.g. Haiku) with its own BYOK resolution path.
  • Why: Silently using platform credentials when the customer set up BYOK breaks the billing promise; carving out a separate cheap-model path doubles the BYOK resolution surface for a token-cost concern we haven't observed in production yet.
  • Source: KD-0649

Attachment selection — snapshot IDs at click-time, not live-read mid-run

  • Chose: Request payload pins attachment_ids[] at click-time; harness is a pure function of (description, project, attachment_ids, user).
  • Rejected: Re-read the attachment list at extraction time so newly uploaded images "just work" mid-run.
  • Why: Live reads need attachment-locking and time-windowing logic, and would surprise users by extracting (and charging vision tokens for) screenshots uploaded after they clicked Generate. Explicit "click again to include" is predictable and race-free.
  • Source: KD-0649

Per-attachment vision failure — soft-skip with Warning status, not hard-fail

  • Chose: Record per-attachment failure, continue pipeline with the succeeded attachments, mark the Extract-Context step Warning.
  • Rejected: Abort the whole harness on any vision failure, or silently skip without surfacing the failure.
  • Why: A single flaky vision call shouldn't kill the whole flow when the rest of the input is usable, but silent skips leave users wondering why the AI didn't see a screenshot — Warning status surfaces the dropped image without blocking the story.
  • Source: KD-0649
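
A minimal sketch of the soft-skip behaviour, assuming a per-attachment extract callable that raises on failure. The function name, the status strings, and the choice to report Warning even when every attachment fails are assumptions of this sketch, not something the entry specifies.

```python
def extract_all(attachment_ids, extract):
    """Extract each attachment independently; failures are recorded, not raised."""
    contexts, failures = {}, {}
    for attachment_id in attachment_ids:
        try:
            contexts[attachment_id] = extract(attachment_id)
        except Exception as exc:
            failures[attachment_id] = str(exc)  # surfaced to the user, not hidden
    status = "warning" if failures else "completed"
    return contexts, failures, status
```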

Vision cache invalidation — composite (attachment_id, model_version), no TTL

  • Chose: Unique index on (attachment_id, model_version); never invalidate within a model version.
  • Rejected: Time-based TTL (e.g. 30 days) on cache rows, or invalidate on Attachment model update.
  • Why: Attachments are immutable (replace = new row in CreateAttachmentAction), so bytes can't change — the only legitimate invalidator is model upgrade, which the composite key handles cleanly. A TTL just burns tokens re-extracting unchanged bytes.
  • Source: KD-0649
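
The composite-key cache can be sketched with SQLite standing in for the production database. The table name and the two key columns come from the entries above; the extracted_text column, the cached_or_extract helper, and the model version strings in the usage note are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attachment_extracted_contexts (
        attachment_id INTEGER NOT NULL,
        model_version TEXT NOT NULL,
        extracted_text TEXT NOT NULL,  -- column name is an assumption
        UNIQUE (attachment_id, model_version)
    )
    """
)


def cached_or_extract(attachment_id: int, model_version: str, extract) -> str:
    """Return the cached text for this (attachment, model) pair, extracting once."""
    row = conn.execute(
        "SELECT extracted_text FROM attachment_extracted_contexts "
        "WHERE attachment_id = ? AND model_version = ?",
        (attachment_id, model_version),
    ).fetchone()
    if row:
        return row[0]  # no TTL: a hit is valid for the life of the model version
    text = extract()
    conn.execute(
        "INSERT INTO attachment_extracted_contexts VALUES (?, ?, ?)",
        (attachment_id, model_version, text),
    )
    return text
```

Calling it twice with the same pair runs the extractor once; calling it with a new model version adds a second row and leaves the old one in place for audit.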

Generalise "Vision step" to "Attachment context step" (images + text) in v1

  • Chose: Single MIME-dispatched ExtractAttachmentContextAction — image MIMEs hit the vision path with cache; text MIMEs (text/*, application/json, application/xml) read bytes directly from Filesystem with no AI call and no cache row.
  • Rejected: Ship vision-only first and defer text attachments to a follow-up ticket.
  • Why: Baking "vision" into the Action name, table name, feature key, and step label now means renaming all of it later; the text branch reuses everything we already designed (sanitation, selector UX, output tags) and adds zero token cost since it's just a Filesystem::read.
  • Source: KD-0649
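
The MIME dispatch might look roughly like this. The text MIME list is taken from the entry; treating every `image/*` type as a vision candidate and returning "unsupported" for anything else are assumptions of this sketch.

```python
TEXT_MIMES = {"application/json", "application/xml"}


def route_for(mime: str) -> str:
    """Pick the extraction path for an attachment based on its MIME type."""
    if mime.startswith("image/"):
        return "vision"  # AI call, result cached per model version
    if mime.startswith("text/") or mime in TEXT_MIMES:
        return "read_bytes"  # direct filesystem read, no AI call, no cache row
    return "unsupported"  # assumption: other types are not extracted
```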

CMA client shape — protocol primitives only, drop runSessionToCompletion

  • Chose: ManagedAgentsClient exposes exactly three primitives — createSession, getSession, streamSessionEvents — that callers compose for both synchronous and detach-and-resume flows.
  • Rejected: Keep the cherry-picked runSessionToCompletion per-use-case method alongside the new primitives (the original D1 choice before module-shape re-application).
  • Why: The module-shape lens fired on two of three tests — method name was the use case, and the second consumer (Hand-to-Claude) needed new primitives not new parameters. Shipping the per-use-case method alongside the primitives would lock in the shallow-and-suspect shape KD-0650's first draft was meant to retire.
  • Source: KD-0658

Hand-to-Claude agent topology — multiagent v1 (coordinator + planner + implementer + reviewer), not single-agent

  • Chose: Coordinator declares multiagent.type with planner, implementer, and reviewer sub-agents (4 Agent IDs, depth-1, per-role tool surfaces).
  • Rejected: Single-agent v1 mirroring KD-0650, with multiagent deferred to a PLAN-027 follow-up.
  • Why: The CEO accepted the higher provisioning surface (4 Agent IDs, 4 system prompts, 4 Console-Agent records) for the quality ceiling of an "agent reviews its own work mid-session" loop. The plan falls back to single-agent if authoring friction or Anthropic-side multiagent issues block shipping.
  • Source: KD-0658

Hand-to-Claude auth — inline GH App installation token, no vault in v1

  • Chose: Mint a fresh GH App installation token in HandIssueToClaudeAction and inline it in resources[].authorization_token; agent uses gh CLI authed via GH_TOKEN from the cloned credentials.
  • Rejected: GitHub MCP server (api.githubcopilot.com/mcp/) for create_pull_request with a per-tenant static-bearer vault holding the installation token (Anthropic's docs-canonical path).
  • Why: The MCP path forces vault infra into v1 — one vault per tenant, refresh-failed webhook handling, vault model — for the marginal reliability gain over gh pr create. v1 ships with the smallest auth surface; sessions over 1 h fail with a diagnostic comment until rotation is added later.
  • Source: KD-0658

Hand-to-Claude sidebar — separate ClaudeAgentSidebar.vue, not bolted onto IssueSidebar.vue

  • Chose: New sibling component mounted next to the existing sidebar, gated by useFeatureActive('handToClaude').
  • Rejected: Add the Analyse/Hand buttons + criteria-grading card directly to IssueSidebar.vue.
  • Why: Per-criterion grading needs real estate that would crowd the existing sidebar; a separate component lets the whole feature toggle on/off with one v-if and gets redesigned without touching the main sidebar.
  • Source: KD-0658

Eligibility analyzer — cache score on issues table, invalidate on body/AC change

  • Chose: Compute on user click via RunIssueEligibilityCheckAction, persist claude_eligibility_score + claude_eligibility_reason columns on the issue, model observer nulls them when description/prompt is dirty.
  • Rejected: Re-run the analyzer on every button-click, or pre-compute via observer on every issue create/update.
  • Why: No-cache wastes $0.02 per press for an answer that didn't change; observer-on-every-write penalises tenants who mass-edit issues with an AI call per save. Cache-and-invalidate-on-change hits the right cost curve.
  • Source: KD-0658

One in-flight ClaudeSession per issue — DB-level guard via generated column + unique index

  • Chose: Generated in_flight TINYINT derived from status IN (Pending, Running) + unique index on (issue_id, in_flight); concurrent inserts trigger UniqueConstraintViolationException which the Action catches and returns 409.
  • Rejected: Application-level check inside HandIssueToClaudeAction (SELECT then INSERT), or SELECT … FOR UPDATE pessimistic lock in a transaction.
  • Why: App-level checks have a tight race window between the SELECT and the INSERT that two simultaneous page loads will hit; pessimistic locks add DB contention. A unique index is self-documenting and atomic.
  • Source: KD-0658
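
A small demonstration of the guard, with SQLite (3.31 or newer, for generated columns) standing in for the production database. One assumption is labelled in the code: the generated column is NULL rather than 0 for finished sessions, so that finished rows never collide with each other in the unique index. The status strings and the start_session helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE claude_sessions (
        id INTEGER PRIMARY KEY,
        issue_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        -- Assumption: 1 while in flight, NULL otherwise (NULLs never collide).
        in_flight INTEGER GENERATED ALWAYS AS (
            CASE WHEN status IN ('pending', 'running') THEN 1 END
        ) VIRTUAL
    )
    """
)
conn.execute(
    "CREATE UNIQUE INDEX one_in_flight ON claude_sessions (issue_id, in_flight)"
)


def start_session(issue_id: int) -> int:
    """Insert a pending session; the database decides who wins a race."""
    try:
        conn.execute(
            "INSERT INTO claude_sessions (issue_id, status) VALUES (?, 'pending')",
            (issue_id,),
        )
        return 201
    except sqlite3.IntegrityError:
        return 409  # another session for this issue is already in flight
```

There is no SELECT before the INSERT, so there is no window for two requests to both see "nothing in flight".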

GH installation token rotation — defer to follow-up, sessions > 1 h fail with diagnostic comment

  • Chose: No rotation in v1; webhook handler detects token-expiry-shaped failures and posts a "session exceeded 1 h" comment with retry guidance.
  • Rejected: Periodic-rotate scheduled job that PATCHes a fresh token onto every running ClaudeSession every ~50 min, or capping sessions via outcome rubric so they can never exceed 55 min.
  • Why: Rotation has measurable code surface for a failure mode we haven't observed yet; capping via rubric trades a real failure mode for a guess. Defer until production data on actual session durations exists.
  • Source: KD-0658

Anthropic webhook intake — mirror the GitHub webhook trio (Controller + Middleware + Job)

  • Chose: Dedicated AnthropicWebhookController + VerifyAnthropicWebhook middleware + ProcessAnthropicSessionWebhookJob, each cloned in shape from the GH equivalents.
  • Rejected: Reuse GithubWebhookController for the Anthropic events, or handle synchronously without a Job.
  • Why: Different signature scheme, payload shape, and 5-min freshness rule mean smashing both into one controller hurts both; the freshness rule is too tight to risk synchronous handling on heavy work.
  • Source: KD-0658

ClaudeSession history shape — append-only with attempt_number, drop the singleton

  • Chose: Drop the unique partial index on (issue_id, in_flight); add attempt_number int default 1 with unique (issue_id, attempt_number); keep every press as its own row.
  • Rejected: Keep the singleton and wipe the prior row on retry (destructive Redo button), or soft-delete prior rows and insert new.
  • Why: Singleton + destructive retry kills the audit trail the feature exists to preserve; soft-delete bolts deleted-at machinery onto an audit-only table when the codebase's canonical pattern for this shape (ai_outbound_logs, *AuditLog) is append-only.
  • Source: KD-0685

Persist Anthropic vault_id on the session row

  • Chose: New nullable vault_id column on claude_sessions, written alongside anthropic_session_id when the press provisions the vault.
  • Rejected: Enumerate workspace vaults at cleanup time and match by display-name convention ('kendo hand-to-claude session').
  • Why: Cleanup runs hours later from a webhook and a day later from the prune backstop, both with no in-memory link to the minted vault; display-name matching is O(N) per cleanup and not unique-per-session. One extra column write makes cleanup O(1) and avoids brittle name parsing.
  • Source: KD-0685

Atomic attempt_number guarantee — unique index, not SELECT … FOR UPDATE

  • Chose: Unique index on (issue_id, attempt_number); losing race throws UniqueConstraintViolationException which the Action catches and returns 409.
  • Rejected: SELECT … FOR UPDATE on prior sessions inside a transaction, or app-level pre-check with no DB-enforced invariant.
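  • Why: The catch-and-409 shape is already proven from the old singleton path; the unique index is cheaper than DB locks and puts the "gap-free attempts per issue" invariant in the schema, at least its uniqueness half (the index rejects duplicate attempt numbers, while gap-freeness still depends on how the next number is computed), which app-level guards can't.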
  • Source: KD-0685
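
To make the race concrete, here is a sketch in which two presses both compute the same next attempt number and only one insert succeeds. SQLite stands in for the production database; next_attempt and insert_attempt are hypothetical helpers, and computing the number as MAX + 1 is an assumption about how the Action picks it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE claude_sessions (
        issue_id INTEGER NOT NULL,
        attempt_number INTEGER NOT NULL DEFAULT 1,
        UNIQUE (issue_id, attempt_number)
    )
    """
)


def next_attempt(issue_id: int) -> int:
    """Assumed numbering rule: one more than the highest existing attempt."""
    row = conn.execute(
        "SELECT COALESCE(MAX(attempt_number), 0) + 1 FROM claude_sessions "
        "WHERE issue_id = ?",
        (issue_id,),
    ).fetchone()
    return row[0]


def insert_attempt(issue_id: int, attempt_number: int) -> int:
    """The unique index, not the caller, rejects the losing side of a race."""
    try:
        conn.execute(
            "INSERT INTO claude_sessions (issue_id, attempt_number) VALUES (?, ?)",
            (issue_id, attempt_number),
        )
        return 201
    except sqlite3.IntegrityError:
        return 409
```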

Vault cleanup — vaults->archive, not vaults->delete

  • Chose: Wrap vaults->archive() in a new ManagedAgentsClient::archiveVault primitive, sibling to the existing discardVault (which archives orphaned vaults).
  • Rejected: Add a new vaults->delete primitive to honour the issue's literal "delete" wording.
  • Why: Both calls free the secret and the installation token expires in <1h regardless — only the lifecycle row differs, and the rest of the codebase consistently archives for audit. Diverging from issue wording is cheaper than diverging from the codebase pattern.
  • Source: KD-0685

Stuck-session prune — call discardOrphanedSession before updating our DB

  • Chose: Prune backstop detects → discardOrphanedSession($anthropicSessionId) → set kendo status Terminated → run archive cleanup.
  • Rejected: Only update our DB status and rely on Anthropic's own 1h/24h limits to eventually terminate the runaway agent.
  • Why: Without the explicit terminate we keep paying for an agent that's effectively orphaned until Anthropic's natural limits fire; the existing primitive already swallows + logs Anthropic-side errors so the prune never blocks on it.
  • Source: KD-0685

In-flight predicate — Pending + Running only, no new RequiresAction enum case

  • Chose: whereIn('status', [Pending, Running]) as the in-flight set; the issue's requires_action wording is flagged as loose.
  • Rejected: Add a fifth RequiresAction enum case before Completed, with matching webhook handling.
  • Why: The Anthropic SDK doesn't emit a requires_action status today — adding the enum case ships a kendo status with no producer. Wait until Anthropic adds a webhook event for human-approval transitions before modelling one.
  • Source: KD-0685

"No events in last hour" measurement — streamSessionEvents + latest occurredAt, not updated_at

  • Chose: Per-tick SDK call to streamSessionEvents($anthropicSessionId), take latest occurredAt, compare to now() - 1h.
  • Rejected: Use ClaudeSession.updated_at (our DB column) as a proxy for last activity.
  • Why: Our updated_at only moves on webhook events, so an agent working silently between webhook-worthy events would falsely trip the prune. The SDK call is O(in-flight running sessions per tick), bounded and tolerable in v1.
  • Source: KD-0685
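
The staleness test itself is a one-line comparison. In this sketch the event timestamps are passed in as a plain list (in the real flow they would come from streamSessionEvents), and falling back to the session start time when there are no events yet is an assumption, not something the entry states.

```python
from datetime import datetime, timedelta


def is_stuck(
    event_times: list[datetime],
    started_at: datetime,
    now: datetime,
    window: timedelta = timedelta(hours=1),
) -> bool:
    """True when the most recent event is older than the allowed window."""
    # Assumption: a session with no events yet is measured from its start time.
    latest = max(event_times, default=started_at)
    return latest < now - window
```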

Prune tenant iteration — walk the central claude_session_tenants index, not Tenant::all()

  • Chose: ClaudeSessionTenant::query()->select('tenant_id')->distinct()->cursor() then switchTo each in turn.
  • Rejected: Walk every tenant in Tenant::all() and switch through to query claude_sessions on each.
  • Why: The full walk visits tenants that have never used Hand-to-Claude (most of them in early rollout); the central index bounds the scan to tenants that have ever had a press, which is a tiny subset and mirrors the existing ResolveTenantForSessionAction pattern.
  • Source: KD-0685

Branch naming for retry attempts — kendo generates the slug, not the agent

  • Chose: StartNewClaudeSessionAction computes a kebab-slug from the issue title and embeds the full branch name (feat/KD-XXXX-<slug> for attempt 1, feat/KD-XXXX-<slug>-<N> for attempt N≥2) in the kickoff message; the implementer's system prompt is updated to use the provided name verbatim.
  • Rejected: Let the implementer pick the slug itself (the existing convention) and have kendo append -<N> after the fact.
  • Why: There's no "after the fact" — the agent uses the slug directly when opening the PR, so kendo can't append anything. Kendo-side slug computation also makes branch names deterministic, so a human reading the issue can predict the branch.
  • Source: KD-0685
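
The naming rule can be sketched as two small functions. The branch pattern follows the entry; the exact slug rules (lowercase, runs of non-alphanumerics collapsed to a single hyphen) and the issue key used in the usage note are assumptions.

```python
import re


def kebab_slug(title: str) -> str:
    """Lowercase the title and collapse anything non-alphanumeric into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def branch_name(issue_key: str, title: str, attempt: int) -> str:
    """Attempt 1 gets the bare name; later attempts get a -<N> suffix."""
    base = f"feat/{issue_key}-{kebab_slug(title)}"
    return base if attempt == 1 else f"{base}-{attempt}"
```

For a hypothetical issue KD-0001 titled "Fix login bug!", attempt 1 yields feat/KD-0001-fix-login-bug and attempt 3 yields feat/KD-0001-fix-login-bug-3, which a reader can predict from the issue alone.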

Missing-evidence response — format-bounce capped at 1, separate from the 3-cycle review budget

  • Chose: A first missing-evidence reply gets bounced once ("include the tool response"); a second consecutive miss fails the session, all separate from the 3-cycle review cap.
  • Rejected: Consume one of the 3 review cycles per format miss, or hard-fail immediately on the first miss.
  • Why: Counting format mistakes against the review budget lets them starve the substantive review; hard-fail terminates a session that a transient format slip would otherwise have shipped — one bounce mirrors the existing malformed-verdict handling.
  • Source: KD-0694
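
One way to picture the two separate counters, as a sketch only: ReviewState and on_reply are hypothetical names, and enforcement of the 3-cycle review cap is left out so the example stays focused on the format bounce.

```python
from dataclasses import dataclass

MAX_FORMAT_BOUNCES = 1


@dataclass
class ReviewState:
    review_cycles: int = 0
    consecutive_format_misses: int = 0


def on_reply(state: ReviewState, has_evidence: bool) -> str:
    """Format misses are counted apart from substantive review cycles."""
    if not has_evidence:
        state.consecutive_format_misses += 1
        if state.consecutive_format_misses > MAX_FORMAT_BOUNCES:
            return "fail"  # second consecutive miss ends the session
        return "bounce"  # first miss: ask for the tool response, no cycle spent
    state.consecutive_format_misses = 0
    state.review_cycles += 1
    return "review"
```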

Reviewer quote-rule — allow VERIFIED-BY-ABSENCE for negative ACs, capped at 1 per review

  • Chose: Let the reviewer mark a no-regression / "still works" AC with VERIFIED-BY-ABSENCE: <reason> instead of a quoted diff line, capped at one per review.
  • Rejected: Force the reviewer to quote a related context line that demonstrates the unchanged property.
  • Why: A negative assertion has no +/- line by definition, so a strict quote rule forces the model to invent a fake quote or downgrade a legitimate APPROVE; the cap stops it from absence-verifying every AC to short-circuit the rule.
  • Source: KD-0694

Drop the wrong-org URL host-prefix guard — rely on the verbatim tool-response rule instead

  • Chose: Remove the coordinator's hardcoded github.com/script-development/kendo URL check and lean on the requirement that the implementer paste the verbatim create_pull_request/update_pull_request response.
  • Rejected: Keep the hardcoded-org prefix check, or thread the expected repo_full_name into the kickoff message for a dynamic check.
  • Why: A successful PR tool call is bound to the cloned repo by construction, so the verbatim-response rule already guarantees the right URL; hardcoding one org would footgun the moment Hand-to-Claude rolls out to a second tenant, since the repo is resolved per-project.
  • Source: KD-0694

Hand-to-Claude surface — a tab next to Activity, not a widened right rail or a modal

  • Chose: Promote the feature to a full-content-width tab alongside Activity, deleting the cramped right-rail sidebar.
  • Rejected: Widen the right rail to a fixed w-96, or open the whole feature in a button-triggered modal.
  • Why: The rail can't fit per-criterion grading, live progress, and future run/step logs without crowding the description; a modal isn't ambient and has no precedent in the UI, whereas a tab reuses the existing tab-strip pattern and ~triples horizontal space.
  • Source: KD-0702

Delete the old surface in the same PR — no coexistence period behind a sub-flag

  • Chose: Delete the legacy sidebar in the same PR that adds the new tab.
  • Rejected: Keep both mounted behind a sub-flag until adoption is confirmed, then delete the sidebar in a follow-up.
  • Why: Coexistence risks "the old surface still works because we forgot to delete it" follow-up debt and doubles the test surface, whereas a single-move migration lets reviewers compare both surfaces in one diff.
  • Source: KD-0702

Tab content mounts/unmounts on switch — accept the re-fetch cost, don't KeepAlive or hoist

  • Chose: Let the tab destroy and recreate on every switch, re-firing its three GETs and Echo subscribe each activation.
  • Rejected: Hoist the composable to the parent page and pass refs down, or wrap the tab in <KeepAlive>.
  • Why: Tab switching is rare in practice, so a few small GETs per activation is acceptable; the contract is small enough to refactor to a hoisted instance later if measurements ever show switching is hot.
  • Source: KD-0702

Operational telemetry columns stay OUT of the audit hash payload

  • Chose: Write the new outcome_verdict to every row but exclude it from the SHA-256 chain payload.
  • Rejected: Include the verdict in the hash because it's "audit-meaningful."
  • Why: Telemetry that discriminates flavours-of-close (not whether the row was tampered with) must stay out of the payload so rows written before the column existed still verify with the same algorithm — direct precedent set by the cache-token columns.
  • Source: KD-0704
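
The reason old rows keep verifying becomes clear when the hash payload is built from a fixed field list. This is a sketch under stated assumptions: the field names in HASHED_FIELDS and the way the previous hash is combined are hypothetical, and only outcome_verdict and the use of SHA-256 come from the entry.

```python
import hashlib
import json

# Hypothetical payload fields; telemetry columns are deliberately absent.
HASHED_FIELDS = ("id", "feature", "request_body")


def row_hash(row: dict, previous_hash: str) -> str:
    """Hash only the fixed payload fields, chained onto the prior row's hash."""
    payload = {key: row[key] for key in HASHED_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(previous_hash.encode() + encoded).hexdigest()
```

A row written before the column existed and the same row with outcome_verdict set produce the same hash, so verification needs no version switch.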

No backfill of new audit columns — clean cutover, leave historical rows NULL

  • Chose: Drop the backfill entirely; only rows written after the migration carry the verdict, old rows stay NULL.
  • Rejected: A backfill migration (or artisan command, or schema FK) joining old log rows to their source session by a time-window match.
  • Why: Audit-log tables are append-only by invariant, so reaching into already-written rows is exactly the operation the invariant forbids even via raw SQL; the source could recover only 3 of 9 verdicts anyway, and those 3 aren't the ones operations most needs.
  • Source: KD-0704

Board "handed to Claude" signal — denormalised timestamp column, not a per-request withExists bool

  • Chose: Store handed_to_claude_at on issues, set once on first hand-off, backfilled from MIN(session.created_at).
  • Rejected: Compute a has_claude_session bool per request via a correlated withExists subquery on every list fetch.
  • Why: The timestamp is zero per-request cost (column on the parent row), carries when-not-just-if, survives any future session-row cleanup, and is sortable/tooltipable — whereas the subquery pays a unique-index dive per issue on every board load for a single bit.
  • Source: KD-0706

"Has been handed" set-once semantic over a live "in-flight" indicator

  • Chose: The board annotation appears on first hand-off and stays forever, a pure read of handed_to_claude_at IS NOT NULL.
  • Rejected: An in-flight-only indicator that renders while a session is Pending/Running and flips off on terminal status.
  • Why: In-flight needs a relation scope, a terminal-broadcast hook, and reactive wiring, yet forgets the most durable fact (that Claude touched the issue) the moment a session ends — and the live status already lives in the tab, so the board needn't duplicate it.
  • Source: KD-0706

Branch link after hand-off: wait for the push-webhook auto-link, no eager Pending link

  • Chose: Let the existing push-webhook auto-link the branch ~1–2 min after the press; no eager link.
  • Rejected: Create an IssueBranchLink with Pending status immediately at press, flipped to Linked by the webhook.
  • Why: Eager creation buys a 1–2 minute UX win at the cost of orphan-link cleanup on failure, dedup against the auto-link re-fire, and a Pending status badge — three edge cases the existing "handed to Claude" indicator already papers over during the gap.
  • Source: KD-0706

Grader feedback rendered only on failure verdicts, omitted on satisfied

  • Chose: Include the grader's explanation only when the outcome is needs_revision / failed / max_iterations_reached.
  • Rejected: Render the explanation on every outcome where it's non-null, including the "why criteria were met" text on satisfied.
  • Why: On a satisfied verdict the grader's reasoning is redundant with the verdict and the agent's own summary, so surfacing it only on failures keeps the tight "why did it fail" signal the developer actually needs.
  • Source: KD-0707

Seed only feature-state tables, never the hash-chained audit log

  • Chose: Seed claude_sessions + webhook events + the central index, leaving ai_outbound_logs audit rows out of the seeder.
  • Rejected: Seed the audit rows too so a future cost-per-session widget has dev data.
  • Why: Audit rows are hash-chained (each depends on the prior row's hash) and runtime artifacts everywhere else, so re-implementing the chain in a seeder couples seeding to audit infra for data no UI consumes yet.
  • Source: KD-0709

Real demo data via a hardcoded scrubbed snapshot, not a live API-fetch seed command

  • Chose: Capture one real session's stats once during implementation into a constant the seeder anchors on, with a comment naming the refresh recipe.
  • Rejected: A dev:seed-from-cma-session artisan command that fetches the Anthropic API at seed time.
  • Why: A live fetch adds an external HTTP dependency (and an API-key requirement) to migrate:fresh --seed that risks CI flake and pulls anonymisation scope, whereas a one-time captured snapshot gives realistic data with zero runtime dependency.
  • Source: KD-0709

AI billing failures get HTTP 402, not the parent's 502

  • Chose: Give the insufficient-credit exception a 402 Payment Required status, diverging from the sibling provider exception's 502.
  • Rejected: Reuse 502 Bad Gateway for sibling-consistency, or fall back to the 500 default.
  • Why: Kendo's convention is a semantically-precise status per exception, and 402 lets an external observer scanning by code distinguish "the tenant's billing is short" from "the provider is broken" — verified not to collide with the existing 402 plan-limit middleware, which checks body shape.
  • Source: KD-0784

Don't thread previous: on the new exception — match siblings and dodge the Rector self-revert

  • Chose: Throw the exception plain, with no previous: chaining, matching the four existing match-arm siblings.
  • Rejected: Thread previous: $throwable to preserve the stack-trace chain.
  • Why: The throwing file isn't in the Rector skip list, so a preset would rewrite a previous: call into a message-passthrough that the throwable-leak arch test bans — and the original is already captured in ai_outbound_logs.error_message before the match runs, so the chain is recoverable anyway.
  • Source: KD-0784

Hand-to-Claude review — platform Outcomes grader, not a second reviewer agent/session

  • Chose: Route all review through the single implementer session and the platform grader (define_outcome against a rubric encoding the reviewer criteria).
  • Rejected: A separate reviewer agent in its own container/session, kicked off after the implementer idles.
  • Why: A second LLM keeps the reviewer-fabricates-APPROVE failure class alive and adds a ~30–60s container boot plus state-machine code per cycle, whereas the grader's isolated context is the structural fix for that class — its private-repo blindness is workable via a self-packaged artifact.
  • Source: h2c-laravel-orchestrated

Verify the PR exists on GitHub BEFORE sending it to the grader

  • Chose: On idle, extract the claimed PR URL and verify it open via the GitHub App token first; fail fast as pr_fabrication if missing, only then send define_outcome.
  • Rejected: Run the grader first and verify the PR only after a satisfied verdict (or both pre- and post-verify).
  • Why: The grader can't independently check a private-repo URL, so grading first pays grader cost on fabricated URLs and reverses the docs' "rubric must agree with reality" invariant — one extra GitHub call up front closes the fabrication class for free.
  • Source: h2c-laravel-orchestrated
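
The ordering can be sketched with both external calls injected as callables. The on_idle name and its parameters are hypothetical; pr_fabrication is the failure label from the entry.

```python
def on_idle(claimed_pr_url, pr_is_open, grade):
    """Verify the claimed PR on GitHub first; grade only what actually exists."""
    if not claimed_pr_url or not pr_is_open(claimed_pr_url):
        return "pr_fabrication"  # fail fast, no grader cost is paid
    return grade(claimed_pr_url)
```

A fabricated or missing URL never reaches the grader, so the grader's verdict is always about a PR that was confirmed open.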

Spike the load-bearing "grader reads files by path" assumption before any production code

  • Chose: Run a ~$0.05 throwaway CMA session first to confirm the grader uses its read tool against rubric-named paths, before implementing the two-file artifact shape.
  • Rejected: Treat the cookbook example as authoritative and pivot inline if wrong, or build a strategy switch covering both shapes up front.
  • Why: The whole architecture rests on that one non-standard assumption (the file is written before define_outcome), so a cheap spike trades 30 minutes for avoiding a multi-day refactor if it's false, while a dual-shape switch over-engineers for a $0.05 question.
  • Source: h2c-laravel-orchestrated