AI Features Decisions
Distilled from 14 DECISIONS.md files. Each entry is a real fork in the road. Implementation details that aren't a tradeoff are NOT here.
GitHub auth for agent tools — always use App installation token, never user OAuth
- Chose: Always resolve a GitHub App installation token in AgentToolFactory::forProject().
- Rejected: Two-tier fallback that prefers the user's OAuth token and falls back to the App token.
- Why: Agent repo access is a project-level operation, not user-level — coupling it to a single user's GitHub connection is fragile and unnecessary. App token has equal or better access for read operations.
- Source: KD-0277
Exclude GitHub tools when no installation exists
- Chose: Drop GitHub tools entirely from the agent toolset when no installation token is resolvable.
- Rejected: Include the tools and let them fail at call time.
- Why: Agents waste tokens probing tools that are guaranteed to 401; surfacing zero tools is cleaner than surfacing eight broken ones.
- Source: KD-0277
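The two GitHub-tool decisions combine into one small rule for assembling the toolset. Below is a minimal sketch in Python, standing in for the PHP factory. Project, resolve_installation_token and the tool names are hypothetical. Only the behaviour comes from the decisions above: use the App token only, and offer no GitHub tools when there is no installation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tool names; the real toolset is not listed in this document.
BASE_TOOLS = ["search_issues", "read_issue"]
GITHUB_TOOLS = ["github_read_file", "github_search_code"]


@dataclass(frozen=True)
class Project:
    id: int


def tools_for_project(
    project: Project,
    resolve_installation_token: Callable[[Project], Optional[str]],
) -> list[str]:
    """Assemble the agent toolset for one project.

    The token always comes from the project's GitHub App installation,
    never from a user's OAuth connection. When no installation token
    resolves, the GitHub tools are left out rather than offered and
    allowed to fail with 401 at call time.
    """
    token = resolve_installation_token(project)
    if token is None:
        return list(BASE_TOOLS)
    # A real factory would bind `token` into each GitHub tool; this sketch
    # only decides which tools are offered.
    return BASE_TOOLS + GITHUB_TOOLS
```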
Story generation pipeline — remove Quick Mode, route everything through the multi-agent path
- Chose: Single intelligent path (Dedup → Research → Classify → Write).
- Rejected: Keep Quick Mode as a simple-prompt LLM shortcut alongside Agent Mode.
- Why: Even short prompts benefit from duplicate detection, intent classification and codebase awareness; the speed gap (~30s vs ~5s) is acceptable for the quality lift.
- Source: KD-0355
Pipeline shape — 4 narrow agents, not 3 fat ones
- Chose: Dedup, Research, Classify, Write — each tool-scoped to one job.
- Rejected: Collapse Classify and Research into one agent that classifies, picks templates, gathers code context, and decides priority.
- Why: A fat Classify agent recreates the monolithic prompt the rewrite was meant to kill. Each focused agent gets a short prompt and a single responsibility.
- Source: KD-0355
Templates picked by the agent, no type column on templates
- Chose: Pass all project templates to the Classify agent and let it choose.
- Rejected: Add a type column on the templates table for deterministic matching.
- Why: Hardcoded type matching can't handle custom categories like Spike or Refactor; agent-side selection lets projects define arbitrary template sets without schema work.
- Source: KD-0355
Always run Research, even when Dedup found duplicates
- Chose: Research is non-skippable — codebase context flows into Classify on every run.
- Rejected: Skip Research when Dedup returns duplicates to save one LLM call.
- Why: Classify needs technical grounding to distinguish "true duplicate" (intent: update) from "similar but distinct" (intent: create_feature). The 5-15s saved isn't worth losing that signal.
- Source: KD-0355
Add a Validate step (Step 0) before Dedup
- Chose: Lightweight AI pre-check that short-circuits on obviously vague prompts before the expensive agents fire.
- Rejected: Heuristic word-count check, OR rely solely on Classify's respond intent path.
- Why: Heuristics flag valid 3-word prompts ("fix login bug"); leaving it to Classify wastes 10-30s on Dedup+Research API calls for garbage input. A 1-2s validator pays for itself within the first vague request.
- Source: KD-0399
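Below is a Python sketch of the step order these pipeline decisions settle on. It leaves out the later Extract Context step from KD-0649. The step callables, Verdict and the "needs_clarification" status are hypothetical stand-ins for the PHP Action's collaborators.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ok: bool
    reason: str = ""


def run_pipeline(
    prompt: str,
    validate: Callable[[str], Verdict],
    dedup: Callable[[str], list],
    research: Callable[[str], str],
    classify: Callable[[str, list, str], str],
    write: Callable[[str, str, str], str],
) -> dict:
    # Step 0: a cheap validator stops obviously vague prompts before any
    # expensive agent runs.
    verdict = validate(prompt)
    if not verdict.ok:
        return {"status": "needs_clarification", "reason": verdict.reason}
    duplicates = dedup(prompt)
    # Research always runs, duplicates or not: Classify needs codebase context
    # to tell a true duplicate (update) from a similar-but-distinct request.
    context = research(prompt)
    intent = classify(prompt, duplicates, context)
    return {"status": "ok", "intent": intent, "story": write(prompt, intent, context)}
```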
Pipeline orchestration class is an Action, not a Service
- Chose: AiGenerationPipelineAction lives in Actions/Ai/ with execute().
- Rejected: AiGenerationPipelineService class.
- Why: Deptrac layer rules forbid Services from depending on Audit (AiOutboundLogger) and Actions (ResolveByokProviderAction). Stretching the "Action" semantic is cheaper than weakening layer boundaries app-wide.
- Source: KD-0355
Progress events broadcast on the user-private channel, not project-scoped
- Chose: Reuse the Tenant.{id}.User.{id} private channel.
- Rejected: New project-scoped channel so the team sees who's generating.
- Why: AI generation is a personal action — team visibility adds channel-auth and frontend complexity for no clear UX win.
- Source: KD-0399
Stepper enums are int-backed, not string-backed
- Chose: Numeric values for AgentStepPhaseEnum/AgentStepStatusEnum.
- Rejected: Human-readable string values for easier DevTools debugging.
- Why: Frontend already standardises on int-backed enums (priority, type, intent); WebSocket payloads fire frequently and benefit from compactness.
- Source: KD-0399
Bot/AiRun removal — clean sweep, single drop migration
- Chose: One new migration that nullifies FKs, deletes the claude bot user, and drops is_bot, ai_runs, ai_run_logs. Old migration files removed.
- Rejected: Only remove items the issue lists; OR keep old migrations for history.
- Why: Partial removal leaves orphaned imports and failing arch tests; keeping old migrations alongside drops creates dead code in the migrations directory.
- Source: KD-0390
AI-key access lookup — dedicated read-side Action, not a service or inline duplication
- Chose: Final readonly ResolveProjectAiKeyAccessAction returning a project_id → bool map, called once per request from controllers.
- Rejected: Plain ProjectAiKeyAccessResolver service in app/Services/, or duplicating the TenantAiKey lookup inline at all 6 controller call sites.
- Why: Action-default is already the codebase's strong convention and inherits the existing arch-test discipline; inventing a new "resolver service" layer for one feature, or duplicating 6×, both lose the "one query per request" invariant we want to assert in tests.
- Source: KD-0610
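Below is a sketch of the one-query-per-request shape, with Python and sqlite3 standing in for the PHP Action and Eloquent. The tenant_ai_keys table and its project_id column are assumptions. The decision only fixes the output (a project_id → bool map) and the single query.

```python
import sqlite3


def resolve_project_ai_key_access(
    conn: sqlite3.Connection, project_ids: list[int]
) -> dict[int, bool]:
    """Return {project_id: has_key} for every requested project in one query."""
    if not project_ids:
        return {}
    placeholders = ",".join("?" for _ in project_ids)
    rows = conn.execute(
        f"SELECT DISTINCT project_id FROM tenant_ai_keys WHERE project_id IN ({placeholders})",
        project_ids,
    ).fetchall()
    with_key = {row[0] for row in rows}
    # Every requested id gets an entry, so resources never need to query.
    return {pid: pid in with_key for pid in project_ids}
```

A controller would call this once and hand the map to each resource factory. A trace callback on the connection makes the "one query" invariant easy to assert.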
ResourceData drift fix — tripwire from() on the subclass, don't widen the abstract base
- Chose: New fromWithAccess/fromWithViewContext factories on the affected resources; base from(Model) overridden to throw BadMethodCallException.
- Rejected: Widen the abstract ResourceData::from(Model, ...$context = []) so every subclass takes optional context.
- Why: Widening the base would touch ~22 subclasses and drift the abstraction toward a generic projection-DTO base; a fail-loud tripwire on just the affected resources turns "forgot the new factory" into an error at the entry point rather than a silent has_ai_key=false regression.
- Source: KD-0610
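Below is the tripwire transposed into Python as an illustration. NotImplementedError stands in for BadMethodCallException. from_model and from_with_access are hypothetical snake_case analogues of from(Model) and fromWithAccess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    id: int
    name: str


class ResourceData:
    """Abstract base: resources are built from a single model."""

    @classmethod
    def from_model(cls, model):
        raise NotImplementedError


class ProjectResource(ResourceData):
    def __init__(self, id: int, name: str, has_ai_key: bool):
        self.id, self.name, self.has_ai_key = id, name, has_ai_key

    @classmethod
    def from_model(cls, model):
        # Tripwire: the plain factory cannot know has_ai_key and would
        # silently default it to False, so calling it fails loudly instead.
        raise NotImplementedError("use ProjectResource.from_with_access()")

    @classmethod
    def from_with_access(cls, model: Project, access: dict[int, bool]):
        return cls(model.id, model.name, access.get(model.id, False))
```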
ResourceData arch test — substring-scan banned static query patterns
- Chose: tests/Arch/ResourcesTest.php extended with a broad substring scan for "::query(", "::first(", "::all(" and "::find(" in every file under app/Http/Resources/**.
- Rejected: Reflection-walk that only inspects from()/__construct() bodies on each ResourceData subclass.
- Why: Static entry points are exactly what the rule wants to forbid, and the patterns won't false-positive on instance calls ($relation->first()) or constants; reflection lets drift sneak into named static helpers, which is exactly how the lookup leaked into resources in the first place.
- Source: KD-0610
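A plain substring scan is enough, and a small Python scanner shows why. The real check lives in tests/Arch/ResourcesTest.php; this function is only illustrative. Every banned pattern starts with "::" and ends with "(". That means instance calls such as $relation->first() and constant reads such as Status::ACTIVE never match.

```python
from pathlib import Path

# Static query entry points the rule bans (from the decision above).
BANNED_PATTERNS = ("::query(", "::first(", "::all(", "::find(")


def find_banned_calls(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, pattern) for every banned pattern found under root."""
    hits = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        source = path.read_text(encoding="utf-8")
        for pattern in BANNED_PATTERNS:
            if pattern in source:
                hits.append((path.relative_to(root).as_posix(), pattern))
    return hits
```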
Attachment vision cache — separate attachment_extracted_contexts table, not a column on attachments
- Chose: New table keyed by (attachment_id, model_version), joined only when the harness runs.
- Rejected: Nullable extracted_text TEXT column on the existing attachments table.
- Why: Every hot-path query for attachments (board, attachment grid, MCP list) would haul TEXT bytes 99% of callers don't need; the composite key also lets model upgrades (Sonnet 4.6 → 4.7) create a new row without losing the prior cache for audit.
- Source: KD-0649
Vision/text extraction placement — new pipeline step between Validate and Duplicate Check
- Chose: Add a sixth pipeline step ("Extract Context") that runs after Validate and before Duplicate Check.
- Rejected: Eager extraction at upload time inside CreateAttachmentAction, or lazy extraction inside ResearchAction.
- Why: Upload-time extraction pays vision cost for attachments that may never feed the harness and couples uploads to AI infra; lazy-inside-Research means the extracted text never reaches Duplicate Check, killing the screenshot-dedup benefit. Putting it before Duplicate Check gives both consumers (dedup + research) the text from a single extraction.
- Source: KD-0649
Vision provider — reuse the project's BYOK-resolved provider, no dedicated cheap-vision model
- Chose: Vision call routes through the existing AiGenerationPipelineAction with feature: 'agent_attachment_vision', using whatever provider the project resolved.
- Rejected: Always platform Anthropic (ignoring BYOK), or a dedicated cheap vision model (e.g. Haiku) with its own BYOK resolution path.
- Why: Silently using platform credentials when the customer set up BYOK breaks the billing promise; carving out a separate cheap-model path doubles the BYOK resolution surface for a token-cost concern we haven't observed in production yet.
- Source: KD-0649
Attachment selection — snapshot IDs at click-time, not live-read mid-run
- Chose: Request payload pins attachment_ids[] at click-time; harness is a pure function of (description, project, attachment_ids, user).
- Rejected: Re-read the attachment list at extraction time so newly uploaded images "just work" mid-run.
- Why: Live reads need attachment-locking and time-windowing logic, and would surprise users by extracting (and charging vision tokens for) screenshots uploaded after they clicked Generate. Explicit "click again to include" is predictable and race-free.
- Source: KD-0649
Per-attachment vision failure — soft-skip with Warning status, not hard-fail
- Chose: Record per-attachment failure, continue pipeline with the succeeded attachments, mark the Extract-Context step Warning.
- Rejected: Abort the whole harness on any vision failure, or silently skip without surfacing the failure.
- Why: A single flaky vision call shouldn't kill the whole flow when the rest of the input is usable, but silent skips leave users wondering why the AI didn't see a screenshot — Warning status surfaces the dropped image without blocking the story.
- Source: KD-0649
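Below are the soft-skip semantics as a Python sketch. ExtractResult, StepStatus and the extract callable are hypothetical. The sketch shows three things: a failing attachment is recorded, the rest still flow on, and any failure flips the step to Warning.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class StepStatus(Enum):
    # The real AgentStepStatusEnum is int-backed; these values are arbitrary.
    COMPLETED = auto()
    WARNING = auto()


@dataclass
class ExtractResult:
    contexts: dict[int, str] = field(default_factory=dict)
    failures: dict[int, str] = field(default_factory=dict)

    @property
    def status(self) -> StepStatus:
        # Any dropped attachment surfaces as Warning instead of vanishing.
        return StepStatus.WARNING if self.failures else StepStatus.COMPLETED


def extract_all(attachment_ids: list[int], extract: Callable[[int], str]) -> ExtractResult:
    result = ExtractResult()
    for attachment_id in attachment_ids:
        try:
            result.contexts[attachment_id] = extract(attachment_id)
        except Exception as exc:  # one flaky call must not abort the run
            result.failures[attachment_id] = str(exc)
    return result
```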
Vision cache invalidation — composite (attachment_id, model_version), no TTL
- Chose: Unique index on (attachment_id, model_version); never invalidate within a model version.
- Rejected: Time-based TTL (e.g. 30 days) on cache rows, or invalidate on Attachment model update.
- Why: Attachments are immutable (replace = new row in CreateAttachmentAction), so bytes can't change; the only legitimate invalidator is a model upgrade, which the composite key handles cleanly. A TTL just burns tokens re-extracting unchanged bytes.
- Source: KD-0649
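Below is a sketch of the (attachment_id, model_version) cache using sqlite3. The extracted_text column name and the model-version labels are assumptions. Attachments are immutable, so a cache hit is returned as-is. A new model version simply misses and writes its own row beside the old one.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE attachment_extracted_contexts (
    attachment_id INTEGER NOT NULL,
    model_version TEXT NOT NULL,
    extracted_text TEXT NOT NULL,
    UNIQUE (attachment_id, model_version)
)
"""


def cached_extract(
    conn: sqlite3.Connection,
    attachment_id: int,
    model_version: str,
    run_vision: Callable[[int], str],
) -> str:
    row = conn.execute(
        "SELECT extracted_text FROM attachment_extracted_contexts"
        " WHERE attachment_id = ? AND model_version = ?",
        (attachment_id, model_version),
    ).fetchone()
    if row is not None:
        return row[0]  # no TTL: bytes can't change within a model version
    text = run_vision(attachment_id)
    with conn:
        # OR IGNORE: if a concurrent run cached the same key first, keep its row.
        conn.execute(
            "INSERT OR IGNORE INTO attachment_extracted_contexts VALUES (?, ?, ?)",
            (attachment_id, model_version, text),
        )
    return text
```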
Generalise "Vision step" to "Attachment context step" (images + text) in v1
- Chose: Single MIME-dispatched ExtractAttachmentContextAction: image MIMEs hit the vision path with cache; text MIMEs (text/*, application/json, application/xml) read bytes directly from Filesystem with no AI call and no cache row.
- Rejected: Ship vision-only first and defer text attachments to a follow-up ticket.
- Why: Baking "vision" into the Action name, table name, feature key, and step label now means renaming all of it later; the text branch reuses everything we already designed (sanitation, selector UX, output tags) and adds zero token cost since it's just a Filesystem::read.
- Source: KD-0649
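Below is the MIME dispatch as a Python sketch. Two choices here are assumptions: every image/* type counts as an "image MIME", and anything else returns None. The text types are the ones the decision lists. The two callables stand in for the cached vision path and the Filesystem read.

```python
from typing import Callable, Optional

# Text MIME types named in the decision, besides the text/* family.
TEXT_MIME_EXACT = {"application/json", "application/xml"}


def base_mime(mime: str) -> str:
    # Drop parameters such as "; charset=utf-8" and normalise case.
    return mime.split(";", 1)[0].strip().lower()


def extract_attachment_context(
    mime: str,
    read_bytes: Callable[[], bytes],
    vision_with_cache: Callable[[], str],
) -> Optional[str]:
    kind = base_mime(mime)
    if kind.startswith("image/"):
        return vision_with_cache()  # AI call, cached per (attachment_id, model_version)
    if kind.startswith("text/") or kind in TEXT_MIME_EXACT:
        # Plain read: no AI call and no cache row.
        return read_bytes().decode("utf-8", errors="replace")
    return None  # unsupported type (assumed behaviour)
```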
CMA client shape — protocol primitives only, drop runSessionToCompletion
- Chose: ManagedAgentsClient exposes exactly three primitives (createSession, getSession, streamSessionEvents) that callers compose for both synchronous and detach-and-resume flows.
- Rejected: Keep the cherry-picked runSessionToCompletion per-use-case method alongside the new primitives (the original D1 choice before module-shape re-application).
- Why: The module-shape lens fired on two of three tests: the method name was the use case, and the second consumer (Hand-to-Claude) needed new primitives, not new parameters. Shipping the per-use-case method alongside the primitives would lock in the shallow-and-suspect shape KD-0650's first draft was meant to retire.
- Source: KD-0658
Hand-to-Claude agent topology — multiagent v1 (coordinator + planner + implementer + reviewer), not single-agent
- Chose: Coordinator declares multiagent.type with planner, implementer, and reviewer sub-agents (4 Agent IDs, depth-1, per-role tool surfaces).
- Rejected: Single-agent v1 mirroring KD-0650, with multiagent deferred to a PLAN-027 follow-up.
- Why: CEO accepted the higher provisioning surface (4 Agent IDs, 4 system prompts, 4 Console-Agent records) for the quality ceiling of an "agent reviews its own work mid-session" loop; falls back to single-agent if authoring friction or Anthropic-side multiagent issues block ship.
- Source: KD-0658
Hand-to-Claude auth — inline GH App installation token, no vault in v1
- Chose: Mint a fresh GH App installation token in HandIssueToClaudeAction and inline it in resources[].authorization_token; the agent uses the gh CLI, authed via GH_TOKEN from the cloned credentials.
- Rejected: GitHub MCP server (api.githubcopilot.com/mcp/) for create_pull_request with a per-tenant static-bearer vault holding the installation token (Anthropic's docs-canonical path).
- Why: The MCP path forces vault infra into v1 (one vault per tenant, refresh-failed webhook handling, vault model) for the marginal reliability gain over gh pr create. v1 ships with the smallest auth surface; sessions over 1 h fail with a diagnostic comment until rotation is added later.
- Source: KD-0658
Hand-to-Claude sidebar — separate ClaudeAgentSidebar.vue, not bolted onto IssueSidebar.vue
- Chose: New sibling component mounted next to the existing sidebar, gated by useFeatureActive('handToClaude').
- Rejected: Add the Analyse/Hand buttons + criteria-grading card directly to IssueSidebar.vue.
- Why: Per-criterion grading needs real estate that would crowd the existing sidebar; a separate component lets the whole feature toggle on/off with one v-if and be redesigned without touching the main sidebar.
- Source: KD-0658
Eligibility analyzer — cache score on issues table, invalidate on body/AC change
- Chose: Compute on user click via RunIssueEligibilityCheckAction, persist claude_eligibility_score + claude_eligibility_reason columns on the issue; a model observer nulls them when description/prompt is dirty.
- Rejected: Re-run the analyzer on every button-click, or pre-compute via observer on every issue create/update.
- Why: No-cache wastes $0.02 per press for an answer that didn't change; observer-on-every-write penalises tenants who mass-edit issues with an AI call per save. Cache-and-invalidate-on-change hits the right cost curve.
- Source: KD-0658
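Below is cache-and-invalidate as a Python sketch. The update() method plays the role of the model observer, and analyze stands in for RunIssueEligibilityCheckAction. The (int score, str reason) types are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Issue:
    description: str
    prompt: str
    claude_eligibility_score: Optional[int] = None
    claude_eligibility_reason: Optional[str] = None

    def update(self, **changes: object) -> None:
        """Apply changes; like the observer, drop the cached verdict only
        when description or prompt actually changed."""
        dirty = {name for name, value in changes.items() if getattr(self, name) != value}
        for name, value in changes.items():
            setattr(self, name, value)
        if dirty & {"description", "prompt"}:
            self.claude_eligibility_score = None
            self.claude_eligibility_reason = None


def eligibility(issue: Issue, analyze: Callable[[Issue], Tuple[int, str]]) -> Tuple[int, str]:
    """Return the cached verdict; call the paid analyzer only on a cache miss."""
    if issue.claude_eligibility_score is None:
        issue.claude_eligibility_score, issue.claude_eligibility_reason = analyze(issue)
    return issue.claude_eligibility_score, issue.claude_eligibility_reason
```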
One in-flight ClaudeSession per issue — DB-level guard via generated column + unique index
- Chose: Generated in_flight TINYINT derived from status IN (Pending, Running) + unique index on (issue_id, in_flight); concurrent inserts trigger UniqueConstraintViolationException, which the Action catches and returns 409.
- Rejected: Application-level check inside HandIssueToClaudeAction (SELECT then INSERT), or SELECT … FOR UPDATE pessimistic lock in a transaction.
- Why: App-level checks have a tight race window between the SELECT and the INSERT that two simultaneous page loads will hit; pessimistic locks add DB contention. A unique index is self-documenting and atomic.
- Source: KD-0658
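Below is the same guard reproduced in SQLite from Python; generated columns need SQLite 3.31 or later. One assumption is worth calling out. Here in_flight is 1 while a session is Pending/Running and NULL otherwise, not 0, because unique indexes accept repeated NULLs. With 0, an issue's second finished session would collide with its first. The status strings and the 201/409 return values are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE claude_sessions (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    status TEXT NOT NULL,
    -- 1 while pending/running, NULL otherwise, so finished rows never collide
    in_flight INTEGER GENERATED ALWAYS AS (
        CASE WHEN status IN ('pending', 'running') THEN 1 END
    ) VIRTUAL
);
CREATE UNIQUE INDEX one_in_flight_per_issue ON claude_sessions (issue_id, in_flight);
"""


def hand_issue_to_claude(conn: sqlite3.Connection, issue_id: int) -> int:
    """Insert a pending session; a unique-index violation becomes HTTP 409."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO claude_sessions (issue_id, status) VALUES (?, 'pending')",
                (issue_id,),
            )
    except sqlite3.IntegrityError:
        return 409
    return 201
```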
GH installation token rotation — defer to follow-up, sessions > 1 h fail with diagnostic comment
- Chose: No rotation in v1; webhook handler detects token-expiry-shaped failures and posts a "session exceeded 1 h" comment with retry guidance.
- Rejected: Periodic-rotate scheduled job that PATCHes a fresh token onto every running ClaudeSession every ~50 min, or capping sessions via outcome rubric so they can never exceed 55 min.
- Why: Rotation has measurable code surface for a failure mode we haven't observed yet; capping via rubric trades a real failure mode for a guess. Defer until production data on actual session durations exists.
- Source: KD-0658
Anthropic webhook intake — mirror the GitHub webhook trio (Controller + Middleware + Job)
- Chose: Dedicated AnthropicWebhookController + VerifyAnthropicWebhook middleware + ProcessAnthropicSessionWebhookJob, each cloned in shape from the GH equivalents.
- Rejected: Reuse GithubWebhookController for the Anthropic events, or handle synchronously without a Job.
- Why: Different signature scheme, payload shape, and 5-min freshness rule mean smashing both into one controller hurts both; the freshness rule is too tight to risk synchronous handling on heavy work.
- Source: KD-0658
ClaudeSession history shape — append-only with attempt_number, drop the singleton
- Chose: Drop the unique partial index on (issue_id, in_flight); add attempt_number int default 1 with unique (issue_id, attempt_number); keep every press as its own row.
- Rejected: Keep the singleton and wipe the prior row on retry (destructive Redo button), or soft-delete prior rows and insert new.
- Why: Singleton + destructive retry kills the audit trail the feature exists to preserve; soft-delete bolts deleted-at machinery onto an audit-only table when the codebase's canonical pattern for this shape (ai_outbound_logs, *AuditLog) is append-only.
- Source: KD-0685
Persist Anthropic vault_id on the session row
- Chose: New nullable vault_id column on claude_sessions, written alongside anthropic_session_id when the press provisions the vault.
- Rejected: Enumerate workspace vaults at cleanup time and match by display-name convention ('kendo hand-to-claude session').
- Why: Cleanup runs hours later from a webhook and a day later from the prune backstop, both with no in-memory link to the minted vault; display-name matching is O(N) per cleanup and not unique-per-session. One extra column write makes cleanup O(1) and avoids brittle name parsing.
- Source: KD-0685
Atomic attempt_number guarantee — unique index, not SELECT … FOR UPDATE
- Chose: Unique index on (issue_id, attempt_number); the losing race throws UniqueConstraintViolationException, which the Action catches and returns 409.
- Rejected: SELECT … FOR UPDATE on prior sessions inside a transaction, or app-level pre-check with no DB-enforced invariant.
- Why: The catch-and-409 shape is already proven from the old singleton path; the unique index is cheaper than DB locks and self-documents the "gap-free attempts per issue" invariant in schema, which app-level guards can't.
- Source: KD-0685
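Below is the append-only variant in the same style, as a sqlite3 sketch. Two concurrent presses both compute attempt 2, and the unique (issue_id, attempt_number) index rejects the loser. The table shape and the return codes are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE claude_sessions (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    attempt_number INTEGER NOT NULL DEFAULT 1,
    UNIQUE (issue_id, attempt_number)
)
"""


def next_attempt(conn: sqlite3.Connection, issue_id: int) -> int:
    # Plain read, no lock: two requests can see the same value.
    row = conn.execute(
        "SELECT COALESCE(MAX(attempt_number), 0) + 1 FROM claude_sessions WHERE issue_id = ?",
        (issue_id,),
    ).fetchone()
    return row[0]


def start_attempt(conn: sqlite3.Connection, issue_id: int, attempt: int) -> int:
    """Append a new row; the loser of a race on the same number gets 409."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO claude_sessions (issue_id, attempt_number) VALUES (?, ?)",
                (issue_id, attempt),
            )
    except sqlite3.IntegrityError:
        return 409
    return 201
```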
Vault cleanup — vaults->archive, not vaults->delete
- Chose: Wrap vaults->archive() in a new ManagedAgentsClient::archiveVault primitive, sibling to the existing discardVault (which archives orphaned vaults).
- Rejected: Add a new vaults->delete primitive to honour the issue's literal "delete" wording.
- Why: Both calls free the secret, and the installation token expires in <1h regardless; only the lifecycle row differs, and the rest of the codebase consistently archives for audit. Diverging from issue wording is cheaper than diverging from the codebase pattern.
- Source: KD-0685
Stuck-session prune — call discardOrphanedSession before updating our DB
- Chose: Prune backstop detects → discardOrphanedSession($anthropicSessionId) → set kendo status Terminated → run archive cleanup.
- Rejected: Only update our DB status and rely on Anthropic's own 1h/24h limits to eventually terminate the runaway agent.
- Why: Without the explicit terminate we keep paying for an agent that's effectively orphaned until Anthropic's natural limits fire; the existing primitive already swallows + logs Anthropic-side errors so the prune never blocks on it.
- Source: KD-0685
In-flight predicate — Pending + Running only, no new RequiresAction enum case
- Chose: whereIn('status', [Pending, Running]) as the in-flight set; the issue's requires_action wording is flagged as loose.
- Rejected: Add a fifth RequiresAction enum case before Completed, with matching webhook handling.
- Why: The Anthropic SDK doesn't emit a requires_action status today, so adding the enum case ships a kendo status with no producer. Wait until Anthropic adds a webhook event for human-approval transitions before modelling one.
- Source: KD-0685
"No events in last hour" measurement — streamSessionEvents + latest occurredAt, not updated_at
- Chose: Per-tick SDK call to streamSessionEvents($anthropicSessionId), take the latest occurredAt, compare to now() - 1h.
- Rejected: Use ClaudeSession.updated_at (our DB column) as a proxy for last activity.
- Why: Our updated_at only moves on webhook events, so an agent working silently between webhook-worthy events would falsely trip the prune. The SDK call is O(in-flight running sessions per tick), bounded and tolerable in v1.
- Source: KD-0685
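Below is a Python sketch of one prune tick, combining the last two decisions (staleness from the latest event, then discard, mark, archive). The callables stand in for streamSessionEvents, discardOrphanedSession and the archive cleanup. Several details are assumptions: the occurred_at key, the string statuses, and treating a session with no events as not stale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional

STALE_AFTER = timedelta(hours=1)


@dataclass
class ClaudeSession:
    anthropic_session_id: str
    status: str  # hypothetical: "pending", "running", "terminated", ...


def latest_event_at(events: Iterable[dict]) -> Optional[datetime]:
    times = [event["occurred_at"] for event in events]
    return max(times) if times else None


def prune_if_stale(
    session: ClaudeSession,
    stream_session_events: Callable[[str], Iterable[dict]],
    discard_orphaned_session: Callable[[str], None],
    archive_cleanup: Callable[[ClaudeSession], None],
    now: datetime,
) -> bool:
    if session.status != "running":
        return False
    # Activity comes from the provider's event stream, not our updated_at.
    latest = latest_event_at(stream_session_events(session.anthropic_session_id))
    if latest is None or latest >= now - STALE_AFTER:
        return False
    # 1. Terminate provider-side first so we stop paying for the agent;
    #    the real primitive swallows and logs provider errors.
    discard_orphaned_session(session.anthropic_session_id)
    # 2. Record the terminal status locally.
    session.status = "terminated"
    # 3. Archive the session's resources (e.g. its vault).
    archive_cleanup(session)
    return True
```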
Prune tenant iteration — walk the central claude_session_tenants index, not Tenant::all()
- Chose: ClaudeSessionTenant::query()->select('tenant_id')->distinct()->cursor(), then switchTo each in turn.
- Rejected: Walk every tenant in Tenant::all() and switch through to query claude_sessions on each.
- Why: The full walk visits tenants that have never used Hand-to-Claude (most of them in early rollout); the central index bounds the scan to tenants that have ever had a press, which is a tiny subset and mirrors the existing ResolveTenantForSessionAction pattern.
- Source: KD-0685
Branch naming for retry attempts — kendo generates the slug, not the agent
- Chose: StartNewClaudeSessionAction computes a kebab-slug from the issue title and embeds the full branch name (feat/KD-XXXX-<slug> for attempt 1, feat/KD-XXXX-<slug>-<N> for attempt N≥2) in the kickoff message; the implementer's system prompt is updated to use the provided name verbatim.
- Rejected: Let the implementer pick the slug itself (the existing convention) and have kendo append -<N> after the fact.
- Why: There's no "after the fact": the agent uses the slug directly when opening the PR, so kendo can't append anything. Kendo-side slug computation also makes branch names deterministic, so a human reading the issue can predict the branch.
- Source: KD-0685
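Below is branch-name computation as a Python sketch. The decision fixes only the shape: feat/KD-XXXX-<slug>, plus -<N> from attempt 2 onward. The slug rule here is an assumed kebab-case rule (ASCII-fold, lowercase, collapse non-alphanumerics to hyphens). Issue keys used with it are made up.

```python
import re
import unicodedata


def kebab_slug(title: str) -> str:
    # Assumed rule: strip accents, lowercase, non-alphanumeric runs -> "-".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def branch_name(issue_key: str, title: str, attempt: int) -> str:
    """feat/<KEY>-<slug> for attempt 1, feat/<KEY>-<slug>-<N> for N >= 2."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    base = f"feat/{issue_key}-{kebab_slug(title)}"
    return base if attempt == 1 else f"{base}-{attempt}"
```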
Missing-evidence response — format-bounce capped at 1, separate from the 3-cycle review budget
- Chose: A first missing-evidence reply gets bounced once ("include the tool response"); a second consecutive miss fails the session, all separate from the 3-cycle review cap.
- Rejected: Consume one of the 3 review cycles per format miss, or hard-fail immediately on the first miss.
- Why: Counting format mistakes against the review budget lets them starve the substantive review; hard-fail terminates a session that a transient format slip would otherwise have shipped — one bounce mirrors the existing malformed-verdict handling.
- Source: KD-0694
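Below are the two budgets kept apart, as a hypothetical Python state machine. Two rules here are assumptions about how the caps apply: only consecutive format misses count, and the session fails once the third non-approving review lands. The reply labels are made up.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FORMAT_BOUNCES = 1  # one "include the tool response" bounce
MAX_REVIEW_CYCLES = 3   # substantive review budget, counted separately


@dataclass
class ReviewLoop:
    review_cycles: int = 0
    consecutive_format_misses: int = 0

    def on_reply(self, has_evidence: bool, verdict: Optional[str] = None) -> str:
        if not has_evidence:
            # Format misses never touch the review budget.
            self.consecutive_format_misses += 1
            if self.consecutive_format_misses > MAX_FORMAT_BOUNCES:
                return "fail"
            return "bounce"
        self.consecutive_format_misses = 0
        if verdict == "approve":
            return "approved"
        self.review_cycles += 1
        return "fail" if self.review_cycles >= MAX_REVIEW_CYCLES else "revise"
```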
Reviewer quote-rule — allow VERIFIED-BY-ABSENCE for negative ACs, capped at 1 per review
- Chose: Let the reviewer mark a no-regression / "still works" AC with VERIFIED-BY-ABSENCE: <reason> instead of a quoted diff line, capped at one per review.
- Rejected: Force the reviewer to quote a related context line that demonstrates the unchanged property.
- Why: A negative assertion has no +/- line by definition, so a strict quote rule forces the model to invent a fake quote or downgrade a legitimate APPROVE; the cap stops it from absence-verifying every AC to short-circuit the rule.
- Source: KD-0694
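Below is the quote rule expressed as a hypothetical Python validator over a per-AC evidence map; the decision describes a rule for the reviewer, not this function. Each entry must either quote a +/- diff line or carry VERIFIED-BY-ABSENCE: <reason>. At most one entry per review may use the absence form.

```python
ABSENCE_PREFIX = "VERIFIED-BY-ABSENCE:"
MAX_ABSENCE_PER_REVIEW = 1


def check_evidence(evidence_by_ac: dict[str, str]) -> list[str]:
    """Return rule violations for one review; an empty list means it passes."""
    problems = []
    absences = 0
    for ac, evidence in evidence_by_ac.items():
        text = evidence.strip()
        if text.startswith(ABSENCE_PREFIX):
            absences += 1
            if not text[len(ABSENCE_PREFIX):].strip():
                problems.append(f"{ac}: VERIFIED-BY-ABSENCE needs a reason")
        elif not text.startswith(("+", "-")):
            problems.append(f"{ac}: evidence must quote a +/- diff line")
    if absences > MAX_ABSENCE_PER_REVIEW:
        problems.append(f"{absences} absence claims; at most {MAX_ABSENCE_PER_REVIEW} allowed")
    return problems
```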
Drop the wrong-org URL host-prefix guard — rely on the verbatim tool-response rule instead
- Chose: Remove the coordinator's hardcoded github.com/script-development/kendo URL check and lean on the requirement that the implementer paste the verbatim create_pull_request/update_pull_request response.
- Rejected: Keep the hardcoded-org prefix check, or thread the expected repo_full_name into the kickoff message for a dynamic check.
- Why: A successful PR tool call is bound to the cloned repo by construction, so the verbatim-response rule already guarantees the right URL; hardcoding one org would footgun the moment Hand-to-Claude rolls out to a second tenant, since the repo is resolved per-project.
- Source: KD-0694
Hand-to-Claude surface — a tab next to Activity, not a widened right rail or a modal
- Chose: Promote the feature to a full-content-width tab alongside Activity, deleting the cramped right-rail sidebar.
- Rejected: Widen the right rail to a fixed w-96, or open the whole feature in a button-triggered modal.
- Why: The rail can't fit per-criterion grading, live progress, and future run/step logs without crowding the description; a modal isn't ambient and has no precedent in the UI, whereas a tab reuses the existing tab-strip pattern and roughly triples horizontal space.
- Source: KD-0702
Delete the old surface in the same PR — no coexistence period behind a sub-flag
- Chose: Delete the legacy sidebar in the same PR that adds the new tab.
- Rejected: Keep both mounted behind a sub-flag until adoption is confirmed, then delete the sidebar in a follow-up.
- Why: Coexistence risks "the old surface still works because we forgot to delete it" follow-up debt and doubles the test surface, whereas a single-move migration lets reviewers compare both surfaces in one diff.
- Source: KD-0702
Tab content mounts/unmounts on switch — accept the re-fetch cost, don't KeepAlive or hoist
- Chose: Let the tab destroy and recreate on every switch, re-firing its three GETs and Echo subscribe each activation.
- Rejected: Hoist the composable to the parent page and pass refs down, or wrap the tab in <KeepAlive>.
- Why: Tab switching is rare in practice, so a few small GETs per activation is acceptable; the contract is small enough to refactor to a hoisted instance later if measurements ever show switching is hot.
- Source: KD-0702
Operational telemetry columns stay OUT of the audit hash payload
- Chose: Write the new outcome_verdict to every row but exclude it from the SHA-256 chain payload.
- Rejected: Include the verdict in the hash because it's "audit-meaningful."
- Why: Telemetry that discriminates flavours-of-close (not whether the row was tampered with) must stay out of the payload so rows written before the column existed still verify with the same algorithm — direct precedent set by the cache-token columns.
- Source: KD-0704
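Below is a sketch of a SHA-256 chain whose payload leaves out outcome_verdict. Only SHA-256 and the column name come from the decision. The rest are assumptions about the chain's shape: HASHED_FIELDS, the JSON canonicalisation, and prepending the previous hash. Because the verdict is outside the payload, a row hashes identically whether or not the column is set, which is what lets rows written before the column existed keep verifying.

```python
import hashlib
import json

# Columns covered by the chain. outcome_verdict is deliberately absent,
# following the cache-token precedent; the other names are hypothetical.
HASHED_FIELDS = ("id", "feature", "model", "input_tokens", "output_tokens")


def row_hash(row: dict, prev_hash: str) -> str:
    payload = {name: row.get(name) for name in HASHED_FIELDS}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def verify_chain(rows: list[dict], genesis: str = "") -> bool:
    prev = genesis
    for row in rows:
        if row["hash"] != row_hash(row, prev):
            return False
        prev = row["hash"]
    return True
```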
No backfill of new audit columns — clean cutover, leave historical rows NULL
- Chose: Drop the backfill entirely; only rows written after the migration carry the verdict, old rows stay NULL.
- Rejected: A backfill migration (or artisan command, or schema FK) joining old log rows to their source session by a time-window match.
- Why: Audit-log tables are append-only by invariant, so reaching into already-written rows is exactly the operation the invariant forbids even via raw SQL; the source could recover only 3 of 9 verdicts anyway, and those 3 aren't the ones operations most needs.
- Source: KD-0704
Board "handed to Claude" signal — denormalised timestamp column, not a per-request withExists bool
- Chose: Store handed_to_claude_at on issues, set once on first hand-off, backfilled from MIN(session.created_at).
- Rejected: Compute a has_claude_session bool per request via a correlated withExists subquery on every list fetch.
- Why: The timestamp costs nothing per request (it's a column on the parent row), records when and not just whether, survives any future session-row cleanup, and is sortable/tooltipable, whereas the subquery pays a unique-index dive per issue on every board load for a single bit.
- Source: KD-0706
"Has been handed" set-once semantic over a live "in-flight" indicator
- Chose: The board annotation appears on first hand-off and stays forever, a pure read of handed_to_claude_at IS NOT NULL.
- Rejected: An in-flight-only indicator that renders while a session is Pending/Running and flips off on terminal status.
- Why: In-flight needs a relation scope, a terminal-broadcast hook, and reactive wiring, yet forgets the most durable fact (that Claude touched the issue) the moment a session ends — and the live status already lives in the tab, so the board needn't duplicate it.
- Source: KD-0706
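Below are the set-once write, the backfill and the board read, as sqlite3 sketches. The cut-down table shapes and storing timestamps as ISO-8601 text are assumptions. The COALESCE keeps the first hand-off time when an issue is pressed again.

```python
import sqlite3

SCHEMA = """
CREATE TABLE issues (id INTEGER PRIMARY KEY, handed_to_claude_at TEXT);
CREATE TABLE claude_sessions (id INTEGER PRIMARY KEY, issue_id INTEGER NOT NULL, created_at TEXT NOT NULL);
"""

# One-off backfill: earliest session per issue; issues never handed off stay NULL.
BACKFILL = """
UPDATE issues
SET handed_to_claude_at = (
    SELECT MIN(s.created_at) FROM claude_sessions AS s WHERE s.issue_id = issues.id
)
WHERE handed_to_claude_at IS NULL
"""

# Board read: a plain column check, no per-issue subquery.
BOARD_QUERY = "SELECT id, handed_to_claude_at IS NOT NULL AS handed FROM issues ORDER BY id"


def mark_handed(conn: sqlite3.Connection, issue_id: int, at: str) -> None:
    """Set once: later hand-offs keep the first timestamp."""
    conn.execute(
        "UPDATE issues SET handed_to_claude_at = COALESCE(handed_to_claude_at, ?) WHERE id = ?",
        (at, issue_id),
    )
```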
Wait for the existing webhook branch auto-link, don't eager-create at press
- Chose: Let the existing push-webhook auto-link the branch ~1–2 min after the press; no eager link.
- Rejected: Create an IssueBranchLink with Pending status immediately at press, flipped to Linked by the webhook.
- Why: Eager creation buys a 1–2 minute UX win at the cost of orphan-link cleanup on failure, dedup against the auto-link re-fire, and a Pending status badge: three edge cases the existing "handed to Claude" indicator already papers over during the gap.
- Source: KD-0706
Grader feedback rendered only on failure verdicts, omitted on satisfied
- Chose: Include the grader's explanation only when the outcome is needs_revision/failed/max_iterations_reached.
- Rejected: Render the explanation on every outcome where it's non-null, including the "why criteria were met" text on satisfied.
- Why: On a satisfied verdict the grader's reasoning is redundant with the verdict and the agent's own summary, so surfacing it only on failures keeps the tight "why did it fail" signal the developer actually needs.
- Source: KD-0707
Seed only feature-state tables, never the hash-chained audit log
- Chose: Seed claude_sessions + webhook events + the central index, leaving ai_outbound_logs audit rows out of the seeder.
- Rejected: Seed the audit rows too so a future cost-per-session widget has dev data.
- Why: Audit rows are hash-chained (each depends on the prior row's hash) and runtime artifacts everywhere else, so re-implementing the chain in a seeder couples seeding to audit infra for data no UI consumes yet.
- Source: KD-0709
Real demo data via a hardcoded scrubbed snapshot, not a live API-fetch seed command
- Chose: Capture one real session's stats once during implementation into a constant the seeder anchors on, with a comment naming the refresh recipe.
- Rejected: A dev:seed-from-cma-session artisan command that fetches the Anthropic API at seed time.
- Why: A live fetch adds an external HTTP dependency (and an API-key requirement) to migrate:fresh --seed that risks CI flake and pulls in anonymisation scope, whereas a one-time captured snapshot gives realistic data with zero runtime dependency.
- Source: KD-0709
AI billing failures get HTTP 402, not the parent's 502
- Chose: Give the insufficient-credit exception a 402 Payment Required status, diverging from the sibling provider exception's 502.
- Rejected: Reuse 502 Bad Gateway for sibling-consistency, or fall back to the 500 default.
- Why: Kendo's convention is a semantically-precise status per exception, and 402 lets an external observer scanning by code distinguish "the tenant's billing is short" from "the provider is broken" — verified not to collide with the existing 402 plan-limit middleware, which checks body shape.
- Source: KD-0784
Don't thread previous: on the new exception — match siblings and dodge the Rector self-revert
- Chose: Throw the exception plain, with no previous: chaining, matching the four existing match-arm siblings.
- Rejected: Thread previous: $throwable to preserve the stack-trace chain.
- Why: The throwing file isn't in the Rector skip list, so a preset would rewrite a previous: call into a message-passthrough that the throwable-leak arch test bans; and the original is already captured in ai_outbound_logs.error_message before the match runs, so the chain is recoverable anyway.
- Source: KD-0784
Hand-to-Claude review — platform Outcomes grader, not a second reviewer agent/session
- Chose: Route all review through the single implementer session and the platform grader (define_outcome against a rubric encoding the reviewer criteria).
- Rejected: A separate reviewer agent in its own container/session, kicked off after the implementer idles.
- Why: A second LLM keeps the reviewer-fabricates-APPROVE failure class alive and adds a ~30–60s container boot plus state-machine code per cycle, whereas the grader's isolated context is the structural fix for that class — its private-repo blindness is workable via a self-packaged artifact.
- Source: h2c-laravel-orchestrated
Verify the PR exists on GitHub BEFORE sending it to the grader
- Chose: On idle, extract the claimed PR URL and verify it is open via the GitHub App token first; fail fast as pr_fabrication if missing, and only then send define_outcome.
- Rejected: Run the grader first and verify the PR only after a satisfied verdict (or both pre- and post-verify).
- Why: The grader can't independently check a private-repo URL, so grading first pays grader cost on fabricated URLs and reverses the docs' "rubric must agree with reality" invariant; one extra GitHub call up front closes the fabrication class for free.
- Source: h2c-laravel-orchestrated
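Below is verify-then-grade as a Python sketch. In practice pr_is_open would wrap the GitHub App token and GitHub's REST "Get a pull request" endpoint (GET /repos/{owner}/{repo}/pulls/{pull_number}), checking that state is "open". Here both it and define_outcome are injected callables. The URL regex and the "grading" label are assumptions.

```python
import re
from typing import Callable, Optional

PR_URL = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/pull/(\d+)")

PullRef = tuple[str, str, int]


def extract_pr(message: str) -> Optional[PullRef]:
    match = PR_URL.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def on_idle(
    message: str,
    pr_is_open: Callable[[str, str, int], bool],
    define_outcome: Callable[[PullRef], None],
) -> str:
    pr = extract_pr(message)
    # Fail fast before any grader spend: the grader cannot see a private
    # repo, so a made-up URL has to be caught here.
    if pr is None or not pr_is_open(*pr):
        return "pr_fabrication"
    define_outcome(pr)
    return "grading"
```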
Spike the load-bearing "grader reads files by path" assumption before any production code
- Chose: Run a ~$0.05 throwaway CMA session first to confirm the grader uses its read tool against rubric-named paths, before implementing the two-file artifact shape.
- Rejected: Treat the cookbook example as authoritative and pivot inline if wrong, or build a strategy switch covering both shapes up front.
- Why: The whole architecture rests on that one non-standard assumption (the file is written before define_outcome), so a cheap spike trades 30 minutes for avoiding a multi-day refactor if it's false, while a dual-shape switch over-engineers for a $0.05 question.
- Source: h2c-laravel-orchestrated