Postmortems

Production-affecting bugs and edge cases that taught us something. Newest first.

The fix lives in code; what's preserved here is the root cause and the generalizable lesson — the part that disappears if you only read commits.

KD-1050 — ReportCard clamp gated on a flag only the clamp itself could set

Severity: low
Symptom: The "Show more"/"Show less" truncation on report descriptions never activated, no matter how long the description. The clamp, the fade, and the toggle button silently never appeared — long descriptions rendered at full height with no affordance.
Root cause: The clamp class (max-h-40 overflow-hidden) was applied only once descriptionOverflows was already true, but descriptionOverflows was computed as scrollHeight > clientHeight measured on the unclamped div — and an unconstrained block-level div always has scrollHeight === clientHeight because it grows to fit. Circular and dead on arrival: the flag needed the clamp, the clamp needed the flag.
Fix: Dropped && descriptionOverflows from the :class ternary so the div clamps unconditionally while collapsed; the first measurement then sees a real capped clientHeight. checkOverflow() and the fade/toggle v-ifs were already correct — they just never received a true.
Lesson: Every existing spec set wrapper.vm.descriptionOverflows = true by hand, which pinned the render branches while skipping the only measurement that could ever produce true — a test that forces the state under test proves the branch renders, not that anything can reach it. Any feature whose visibility depends on measuring itself needs a test driving real scrollHeight/clientHeight.
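
The circularity can be shown with a toy layout model. This is only a sketch: `measure` and `CLAMP_PX` are hypothetical stand-ins for the browser's layout, and 160px assumes max-h-40 resolves to 10rem at a 16px root font size.

```typescript
// Assumption: max-h-40 is 10rem, i.e. 160px at a 16px root font size.
const CLAMP_PX = 160;

interface Measured {
  scrollHeight: number;
  clientHeight: number;
}

// Toy model of block layout: an unclamped div grows to fit its content,
// so its clientHeight always equals its scrollHeight.
function measure(contentHeight: number, clamped: boolean): Measured {
  return {
    scrollHeight: contentHeight,
    clientHeight: clamped ? Math.min(contentHeight, CLAMP_PX) : contentHeight,
  };
}

const overflows = (m: Measured): boolean => m.scrollHeight > m.clientHeight;

// Before the fix: the clamp waits for the flag, so the measurement runs unclamped.
const beforeFix = overflows(measure(600, false));
// After the fix: the div clamps while collapsed, so the measurement can see overflow.
const afterFix = overflows(measure(600, true));

console.log({ beforeFix, afterFix });
```

In this model the unclamped measurement can never report overflow, whatever the content height, which is the dead-on-arrival behaviour described above.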

KD-1049 — Reverb launched with no preflight for the PHP `redis` extension

Severity: low
Symptom: /startup reported the dev servers running, but Reverb had crashed instantly with Class "Redis" not found on any machine whose PHP lacked the phpredis extension. Frontend and backend were fine, so realtime features silently didn't work with no message naming the cause.
Root cause: The startup skill's final step launched php artisan reverb:start unconditionally as a background command. An extension check did exist (backend.md Step 1) but ran inside a different, earlier background agent as advisory prose — its result never reached the step that starts Reverb, so it had no gating effect and Laravel's PhpRedisConnector::createClient() threw at boot instead.
Fix: Inline php -m | grep -qi '^redis$' preflight immediately before the Reverb launch. On a miss, Reverb is skipped and the step reports it with per-OS install instructions and the exact re-run command; frontend and backend start unconditionally either way.
Lesson: A check whose result can't reach the step that would act on it is documentation, not a gate. Preflight belongs at the exact point of failure and in the same execution context as the command it guards — and a background command that dies on boot needs its failure surfaced, not just its launch reported.

KD-1048 — `notifications.message` VARCHAR(255) too short for an interpolated issue title

Severity: medium
Symptom: php artisan dev:reset intermittently died mid-seed with SQLSTATE[22001] … Data too long for column 'message', leaving the tenant database partially seeded — NotificationSeeder failed before the Claude eligibility and session seeders ever ran. Intermittent because the seeder picks its issue at random.
Root cause: notifications.message was declared $table->string('message') — implicit VARCHAR(255). NotificationFactory::forIssue() interpolates the full issue->title (itself up to 255 chars) plus template text and an actor name into that column via sprintf, so a realistic worst case comfortably exceeds 255, and MySQL strict mode rejects rather than truncates.
Fix: Append-only migration widening message to VARCHAR(512), mirroring the increase_avatar_url_length precedent. Kept as VARCHAR rather than text — it's a bounded, system-generated display string with a computable worst case, not arbitrary user content.
Lesson: A column that interpolates another VARCHAR(255) column needs a length derived from the worst case, not the framework default — string() with no length is a silent 255 cap. And the arithmetic only bites under MySQL strict mode: the test suite runs on SQLite, which ignores VARCHAR length entirely, so no runtime test can reproduce error 1406. The regression test pins the declared width statically instead, which is the only thing that can hold for MySQL-backed environments.
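
The worst-case arithmetic can be written down explicitly. The sketch below is illustrative only: the template string and the actor-name limit are assumptions, since the real sprintf format in NotificationFactory::forIssue() is not reproduced here. Only the 255-character title width and the 512-character target come from the entry above.

```typescript
// Hypothetical template; the real format string is not shown in this postmortem.
const TEMPLATE = '%s commented on issue "%s"';
const MAX_TITLE = 255; // issue title column width, per the root cause
const MAX_ACTOR_NAME = 100; // assumed limit, for illustration only

// Worst case = literal template characters + the maximum width of each argument.
function worstCaseLength(template: string, argWidths: number[]): number {
  const literalChars = template.replace(/%s/g, '').length;
  return literalChars + argWidths.reduce((sum, width) => sum + width, 0);
}

const worstCase = worstCaseLength(TEMPLATE, [MAX_ACTOR_NAME, MAX_TITLE]);

console.log({
  worstCase,
  fitsDefault255: worstCase <= 255,
  fits512: worstCase <= 512,
});
```

Under these assumed widths the worst case overflows the implicit 255 cap but fits in 512, which is the shape of check a static regression test on the declared width can encode.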

KD-1047 — Pipeline stepper keeps spinning after story generation fails

Severity: low
Symptom: When AI story generation failed — terminal agent-story-failed event, or the client guard timing out with no terminal event at all — the step that was STARTED at that moment kept its spinner indefinitely. The Generate button correctly reverted to idle, so the UI showed a finished run with a step still in progress, until the next generate() call.
Root cause: useStoryGeneration.ts's onFailed handler and the guard-timeout callback both called surfaceError() + settle(null) but never reset steps. settle() doesn't touch steps by design — the success path already overwrites steps.value with the authoritative mergePipeline(data.pipeline) before settling. The failure paths had no equivalent, so whatever partial state handleProgressEvent last wrote survived forever.
Fix: steps.value = buildInitialSteps() in both failure exits, right after surfaceError() and before settle(null). Success path untouched, so a genuine completion still shows the authoritative pipeline.
Lesson: When the happy path cleans up as a side effect of doing its real work, the error paths inherit no cleanup at all — settle() looked like shared teardown, but only the success branch happened to reset state before calling it. Every terminal exit of a stateful async flow needs its own reset, and the tests have to assert the post-failure state, not just that the error was surfaced.
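
A minimal model of the flow, assuming hypothetical step keys and a simplified `createFlow` wrapper (the real composable is useStoryGeneration.ts and is not reproduced here). It shows the reset living in the failure exit itself rather than in the shared `settle()`.

```typescript
type StepStatus = 'pending' | 'started' | 'done';

interface Step {
  key: string;
  status: StepStatus;
}

// Hypothetical step keys, for illustration only.
const buildInitialSteps = (): Step[] =>
  ['analyse', 'draft', 'review'].map((key): Step => ({ key, status: 'pending' }));

function createFlow() {
  let steps = buildInitialSteps();
  let running = false;
  let error: string | null = null;

  // Shared teardown: deliberately does not touch steps.
  const settle = (): void => {
    running = false;
  };

  return {
    start(): void {
      running = true;
      error = null;
      steps = buildInitialSteps();
    },
    progress(key: string): void {
      steps = steps.map((s): Step => (s.key === key ? { ...s, status: 'started' } : s));
    },
    fail(message: string): void {
      error = message;
      steps = buildInitialSteps(); // the reset the failure exits were missing
      settle();
    },
    snapshot: () => ({ steps, running, error }),
  };
}

const flow = createFlow();
flow.start();
flow.progress('draft');
flow.fail('generation failed');
console.log(flow.snapshot());
```

A regression test in this style asserts on `snapshot().steps` after `fail()`, not only on the surfaced error.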

KD-1045 — `.env.example` promised a `TENANT_ADMIN_DB_*` fallback the config deliberately refuses

Severity: low
Symptom: Provisioning a brand-new tenant on a local setup (a secondary worktree, or any fresh /startup) failed the MySQL auth handshake with an opaque [2054] The server requested authentication method unknown to the client. Invisible to anyone working against the already-provisioned seeded tenant, since only a fresh CREATE DATABASE exercises the path.
Root cause: Documentation drift. config/database.php's tenant_admin connection has no fallback from TENANT_ADMIN_DB_* to DB_*/CENTRAL_DB_* — deliberately, per KD-0793, because such a fallback caused a real production bug where signup's CREATE DATABASE ran under the steady-state credential. But .env.example's comment asserted the opposite ("defaults fall through to the same MySQL root credential"), and the /startup env reference never set the vars, so TENANT_ADMIN_DB_PASSWORD resolved to '' against Docker's root.
Fix: Rewrote the .env.example comment to describe the real no-fallback design, and added explicit TENANT_ADMIN_DB_USERNAME=root / TENANT_ADMIN_DB_PASSWORD=root steps to /startup's env reference alongside the existing DB_* local defaults. config/database.php untouched, per scope.
Lesson: When a config deliberately omits a convenience fallback for a safety reason, the comment describing it is load-bearing and drifts silently — nothing tests prose, and the next reader trusts it. The gap only bites the first-run path (provisioning a new tenant) that the everyday seeded setup never touches, so the local-setup docs have to set explicitly whatever the code refuses to infer.

KD-1028 — Tiptap serializes nested lists at 2 spaces; `marked` needs 3+

Severity: medium
Symptom: Editing an epic description containing a nested numbered list and saving flattened it — sub-items lost their indentation and the whole list renumbered as one continuous 1..N sequence on render. The same editor and renderer power issue descriptions and comments, so all three surfaces were affected identically.
Root cause: Two independently-configured markdown implementations disagreeing. RichTextArea.vue registered the Tiptap Markdown extension with no indentation option, defaulting to 2 spaces per level on serialize. The render side (markdown.ts) uses marked (CommonMark), which requires a sublist under an N. marker to be indented to the parent marker's content column — ≥3 spaces for 1. , ≥4 for two-digit markers. At 2 spaces marked doesn't recognise a sublist at all and flattens it into a single <ol>.
Fix: Markdown.configure({indentation: {style: 'space', size: 4}}) — one line, covering the epic, issue, and comment editors since they share the component.
Lesson: Tiptap's own tokenizer round-trips its 2-space output fine, so every editor-side test passed; the defect exists only in the seam between the serializer and a different renderer. Round-trip coverage has to run the real save→render pipeline, not the editor's own parse. The deeper smell is structural: two markdown implementations with no shared dialect contract will diverge again on nested blockquotes, code blocks inside list items, and anything else where CommonMark has an indent rule.
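
The indent rule reduces to a small calculation. This sketch assumes exactly one space after the list marker, which matches the "1. " and two-digit cases in the root cause; `minSublistIndent` and `nestsUnder` are hypothetical helper names, not part of Tiptap or marked.

```typescript
// Content column of a list item = marker width + the single space after it.
// A sublist must be indented at least that far to nest under the marker.
function minSublistIndent(marker: string): number {
  return marker.length + 1;
}

// Would a serializer emitting `indentSize` spaces per level nest under `marker`?
function nestsUnder(marker: string, indentSize: number): boolean {
  return indentSize >= minSublistIndent(marker);
}

console.log({
  twoSpacesUnderSingleDigit: nestsUnder('1.', 2),
  fourSpacesUnderSingleDigit: nestsUnder('1.', 4),
  fourSpacesUnderDoubleDigit: nestsUnder('10.', 4),
});
```

This is only the arithmetic; it does not replace a test that runs the real save→render pipeline.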

KD-1025 — Promote Actions check "already done?" outside the transaction (TOCTOU)

Severity: high
Symptom: Concurrent promotes — a double-click, a retry, two users — both passed the guard and both did the work. PromoteErrorGroupToIssueAction created two Bug issues, one orphaned, with last-write-wins on error_groups.linked_issue_id; PromoteReportsAction created duplicate issues, one carrying copied attachments that never got linked.
Root cause: Check-then-act with no atomicity between check and persist. The guard read an in-memory property outside any transaction, then the slow side-effecting work (CreateIssueAction, CopyAttachmentsAction) ran, and only afterwards was the flag the guard checked written. Nothing at the DB level made the claim atomic, so both racers observed "not done" and both acted. Windows were :35→:54 and :41→:79.
Fix: Lock-and-hold, reusing LinkProjectGithubRepoAction's existing pattern — open the transaction first, lockForUpdate() the guarded rows as the first statement, re-assert the guard column under the lock, and relocate the side-effecting work inside the lock window. The reports re-fetch adds orderBy('id') to fix lock order so overlapping sets can't deadlock. Losers throw the pre-existing 422 exceptions, so the HTTP response is unchanged.
Lesson: A guard is only a guard if the read and the write that satisfies it are atomic — an in-memory property check before slow work is a suggestion. Note what the shipped tests still don't prove: they mock the locked re-fetch to return the winner's committed value, which pins the code path but not real lock contention, so this class stays unproven under load until a MySQL-backed race harness exists. Site #3 (StartNewClaudeSessionAction) was split out because claiming before the paid Anthropic call needs a schema migration — the unique index prevents a duplicate row but not a wasted paid session.

KD-1015 — Full-payload preference PUT lets two sessions clobber each other

Severity: medium
Symptom: Two independent sessions (two tabs, two devices) each changing a different notification preference around the same time: whichever request committed last silently reverted the other session's already-committed change.
Root cause: The wire contract had no way to express "this field is unchanged." All 10 fields were required in the FormRequest and non-nullable bool on the DTO, and the Action unconditionally assigned all 10 from $data, so every request was a full-state overwrite built from the sending session's snapshot. $user->refresh() made each individual write retry-safe and read current DB state, but didn't help — the Action then blindly re-applied all 10 values from the request anyway.
Fix: Partial-update contract — fields become ?bool, rules go required → sometimes (extracted via the canonical extractNullableBoolean() helper), and each assignment is guarded by !== null. The frontend toggle() sends {[key]: value} instead of spreading the full snapshot.
Lesson: KD-1001 fixed the same clobber within one session by serializing writes client-side, and that fix could never reach across sessions — because the real defect was the full-state wire contract, not the client's queueing. When a payload asserts fields the user didn't touch, last-write-wins is guaranteed the moment a second writer exists; silence about a field is the only safe way to say "leave it alone." Incidentally exposed that notify_error_activity_email had been required on the backend since KD-0894 while the frontend never sent it — every preferences save on development was 422-ing.
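
The difference between the two contracts can be sketched with plain objects. The preference keys below are hypothetical, and `applyFull` / `applyPatch` are illustrative stand-ins for the backend Action, not its actual code.

```typescript
type Preferences = Record<string, boolean>;
type PreferencePatch = Partial<Record<string, boolean | null>>;

// Old contract: every request carries the sender's full snapshot.
function applyFull(_current: Preferences, payload: Preferences): Preferences {
  return { ...payload };
}

// New contract: absent or null fields mean "leave it alone".
function applyPatch(current: Preferences, patch: PreferencePatch): Preferences {
  const next = { ...current };
  for (const [key, value] of Object.entries(patch)) {
    if (value !== null && value !== undefined) next[key] = value;
  }
  return next;
}

// Hypothetical keys. Both sessions loaded this snapshot before either wrote.
const initial: Preferences = { notify_mentions_email: true, notify_comments_email: true };

// Session A turns mentions off, then session B turns comments off.
let full = applyFull(initial, { ...initial, notify_mentions_email: false });
full = applyFull(full, { ...initial, notify_comments_email: false }); // built from B's stale snapshot

let partial = applyPatch(initial, { notify_mentions_email: false });
partial = applyPatch(partial, { notify_comments_email: false });

console.log({ full, partial });
```

With full payloads, session B's write silently restores the value session A changed; with partial payloads both changes survive.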

KD-1005 — EpicBoard scroll-anchor test rides the real clock, red for one day a year

Severity: low
Symptom: EpicBoard.spec.ts → "should re-anchor today at 25% from left edge on granularity change" went red on 2026-07-02 with AssertionError: expected 900 not to be 900, blocking an otherwise-green unrelated PR, then self-healed overnight.
Root cause: The component anchors "today" via new Date() and the test never froze the clock. Both grids for the clamped full-2026 range are odd-length and symmetric about the range midpoint (DAY: 365 cols, centre 183; WEEK: 53 cols, centre 27), so on the midpoint day both granularities resolve todayFraction to 0.5 and produce the identical scrollLeft of 900 — making the test's .not.toBe(scrollAfterMount) assertion false. Production code was correct.
Fix: vi.useFakeTimers({toFake: ['Date']}) + vi.setSystemTime(new Date('2026-03-15')) in the two clock-reading scroll tests, restored with vi.useRealTimers(). 2026-03-15 sits well off the midpoint so the WEEK-mount and DAY anchors genuinely differ and the assertion still tests real re-computation. Faking only Date leaves the synchronous requestAnimationFrame stub intact.
Lesson: Any test reaching production code that calls new Date() is a calendar-dependent flake waiting for its date — and the window can be a single day per year, so "it's been green for months" is no evidence at all. Freeze the clock at a date chosen to make the assertion discriminating, not merely deterministic: pinning the midpoint date would have been just as broken, only permanently.
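
The midpoint coincidence can be reproduced with a simplified model. This is an assumption-laden sketch: it anchors "today" at the centre of its column and starts week columns on 1 January, which may not match EpicBoard's actual grid maths. Only the column counts (365 and 53) and the two dates come from the entry above.

```typescript
const MS_PER_DAY = 86400000;

// Zero-based day offset from the start of the range, computed in UTC.
function dayIndex(date: Date, rangeStart: Date): number {
  return Math.floor((date.getTime() - rangeStart.getTime()) / MS_PER_DAY);
}

// Assumed model: the anchor sits at the centre of the column containing the date.
function todayFraction(date: Date, rangeStart: Date, daysPerColumn: number, columns: number): number {
  const column = Math.floor(dayIndex(date, rangeStart) / daysPerColumn);
  return (column + 0.5) / columns;
}

const rangeStart = new Date(Date.UTC(2026, 0, 1));
const midpointDay = new Date(Date.UTC(2026, 6, 2)); // the day the test went red
const frozenDay = new Date(Date.UTC(2026, 2, 15)); // the date chosen in the fix

const fractions = (d: Date) => ({
  day: todayFraction(d, rangeStart, 1, 365),
  week: todayFraction(d, rangeStart, 7, 53),
});

console.log({ midpoint: fractions(midpointDay), frozen: fractions(frozenDay) });
```

In this model both granularities land on the same fraction on the midpoint day and on different fractions on the frozen date, which is what makes the frozen date discriminating rather than merely deterministic.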

KD-1002 — Markdown links open in the same tab, unloading the SPA

Severity: medium
Symptom: Clicking a link inside a rendered issue description or comment navigated the current tab to the target instead of opening a new one. Returning via the browser back button showed the user as logged out.
Root cause: createMarkdownRenderer registers a custom mention extension but never overrode the default link renderer, and marked v18 emits bare <a href="…"> with no target or rel. The apparent logout is a downstream consequence: same-tab navigation unloads the SPA, and on back-navigation an API call fires before the session re-validates, 401s, and handleSessionExpired redirects to login.
Fix: A renderer.link override appending target="_blank" rel="noopener noreferrer" to every anchor (link text rendered through this.parser.parseInline(token.tokens) so nested inline formatting survives), plus {ADD_ATTR: ['target']} on DOMPurify.sanitize() — DOMPurify strips target by default as an open-redirect precaution.
Lesson: The sanitizer happily preserves an attribute the renderer never emitted, so "DOMPurify keeps target" was true and useless — in a two-stage HTML pipeline the assertion has to be written against the final output, not either stage. And a symptom that reads like an auth bug ("the back button logs me out") can be the second-order effect of a rendering default; investigating the session layer would have found nothing wrong there.

KD-1001 — Rapid notification toggles silently revert each other

Severity: medium
Symptom: On Account Settings, flipping two notification toggles in quick succession made the first one visibly bounce back to its old value, and persisted the wrong preference.
Root cause: toggle() spread preferences.value — a computed derived from live auth state — into a full-payload PUT. Auth state only updates when a request responds, so a second toggle fired before the first response landed built its payload from the first key's stale value. The two PUTs raced and each response overwrote authService.user wholesale, last-write-wins. ToggleButton uses defineModel and flips optimistically, which is why the revert was visible as a bounce when the stale response landed.
Fix: Optimistic local state plus serialized writes — bind the toggles to a local reactive mirror seeded once from the logged-in user (instant UI, cumulative payloads), and chain each PUT through a component-scoped promise queue so the backend's last write is always the final cumulative state.
Lesson: Deriving a request payload from state that only advances on response makes every in-flight request a stale-read window. The regression tests are the model for this class: hold the first request open with a never-resolving promise and fire the second — a test that awaits each call in turn cannot express the race at all. And this only closed the same-session case; the cross-session version was still wide open and became KD-1015.

Severity: low
Symptom: Visiting the app while logged out — a bookmarked board URL, or the login page directly — showed a toast with the raw backend string "Unauthenticated." on top of the login screen. The redirect itself was expected; the toast was noise on the first screen a logged-out user sees.
Root cause: toastMiddleware's 401 carve-out keyed off window.location.pathname === '/', but the login route is mounted at /auth/login, so the carve-out never fired for the guest flow it was written to cover. The middleware never consulted authService.isUserLoggedIn — unlike registerUnauthorizedMiddleware, which already used exactly that signal to distinguish an anonymous 401 from a real session expiry.
Fix: Both the tenant and central toastMiddleware now suppress on !authService.isUserLoggedIn.value instead of the pathname. A mid-session 401 still toasts, since isUserLoggedIn only flips after the session-expiry middleware clears user, which runs after the toast middleware for the same response.
Lesson: A URL string used as a proxy for auth state is a guess that stops matching the moment a route moves, and does so silently — the guard kept "working," it just never matched anything. The correct signal already existed one file over, used by a sibling guard to answer the same question. When two middlewares both decide "is this 401 expected?", they must read one source, not two proxies for it.

KD-0998 — Issue-create form loses typed input on an empty-title submit

Severity: medium
Symptom: Submitting the issue-create page without a title replaced the form with a fatal "Could not load page" screen; clicking "Go back" landed the user on the board with everything they had typed gone.
Root cause: submitIssue awaited newIssue.create() and let the rejection propagate. The 422 rejection escaped the handler and reached the async error boundary, which unmounted the form — and the user's input with it. Feedback was never the problem: errorServiceMiddleware and toastMiddleware fire synchronously in the HTTP response pipeline and had already populated the inline errors before the promise settled. The rethrow was what destroyed the form.
Fix: newIssue.create().catch(() => null) with an early if (!created) return guard, so the handler stops rethrowing and the form stays mounted with inline errors visible.
Lesson: An unhandled rejection from a routine validation failure hands a recoverable form to the fatal error boundary, which cannot distinguish "this page failed to load" from "the user forgot a field." A boundary that renders instead of the form has to be reserved for failures that genuinely make the page unusable — and the user-visible cost of getting that wrong isn't the error screen, it's the lost input.

KD-0997 — Token CTA gated on the wrong permission, three times running

Severity: medium
Symptom: On the project error-tracking empty state, "Create a project token" was shown to every project member. For a member without settings access, clicking it did nothing — the router silently redirected with no toast and no visible state change, so the button read as broken.
Root cause: The CTA rendered unconditionally and leaned on the destination route's PROJECTS.UPDATE gate, which authMiddleware enforces by silent redirect. The first fix gated it on PROJECT_TOKENS.CREATE — the ability the backend policy actually requires to mint a token — but those two permissions are independent, so a principal holding one without the other still hit the same silent no-op. The second fix ANDed in canInProject(PROJECTS, UPDATE, projectOwnerId), which carries an owner-bypass the router's plain can(PROJECTS, UPDATE) does not, so a Member-role project owner still saw the CTA and was still bounced.
Fix: canReachTokenSettings now calls the identical can(PROJECTS, UPDATE) the router calls, ANDed with the token-creation ability. Chosen over widening the route's meta.permissions, which would have admitted token creators to the entire settings page (teams, lanes, labels, AI keys) since no child section gates itself.
Lesson: A visibility gate that predicts a route guard must call the same function with the same arguments — two post-merge review rounds were spent on near-miss variants of the right permission. The spec was complicit: canInProject was mocked with a single blanket mockReturnValue, which made "has one permission but not the other" literally inexpressible, and that is exactly the gap that let both broken versions ship. Mock each permission check independently or the tests cannot see this class of bug.
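
The near-miss can be sketched with simplified predicates. The signatures here are hypothetical (the real can and canInProject take different arguments); the point is only that the fixed gate calls the same predicate as the route guard, and that each permission can be varied independently.

```typescript
type Permission = 'PROJECTS.UPDATE' | 'PROJECT_TOKENS.CREATE';

interface Principal {
  permissions: Set<Permission>;
  ownsProject: boolean;
}

// Plain check, as the router uses.
const can = (p: Principal, perm: Permission): boolean => p.permissions.has(perm);

// Project-scoped check with an owner bypass the router does not have.
const canInProject = (p: Principal, perm: Permission): boolean => p.ownsProject || can(p, perm);

const routeGuard = (p: Principal): boolean => can(p, 'PROJECTS.UPDATE');

// Second fix: right permissions, wrong function.
const ctaVisibleNearMiss = (p: Principal): boolean =>
  can(p, 'PROJECT_TOKENS.CREATE') && canInProject(p, 'PROJECTS.UPDATE');

// Final fix: the gate reuses the guard's own predicate.
const ctaVisibleFixed = (p: Principal): boolean =>
  can(p, 'PROJECT_TOKENS.CREATE') && routeGuard(p);

// A Member-role project owner who can mint tokens but lacks PROJECTS.UPDATE.
const memberOwner: Principal = {
  permissions: new Set<Permission>(['PROJECT_TOKENS.CREATE']),
  ownsProject: true,
};

console.log({
  routeAllows: routeGuard(memberOwner),
  nearMissShowsCta: ctaVisibleNearMiss(memberOwner),
  fixedShowsCta: ctaVisibleFixed(memberOwner),
});
```

A spec that mocks both checks with one blanket return value cannot construct `memberOwner` at all.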

KD-0994 — `splice(index)` with no `deleteCount` wipes the whole modal stack

Severity: high
Symptom: Two-factor authentication could not be enabled and recovery codes could not be regenerated at all — after confirming the password prompt, the TwoFactorSetup / RecoveryCodes modal never appeared.
Root cause: The modal service's onClose handler called modals.value.splice(index) with no deleteCount. Array.prototype.splice(start) removes every element from start to the end of the array. PasswordConfirmModal sat at index 0 and pushed the child modal at index 1; when the parent closed, splice(0) wiped both entries, destroying the child immediately after it was added.
Fix: modals.value.splice(index, 1) — a single-character change.
Lesson: The bug shipped with the modal service in March 2025 and was harmless for eleven months because every caller opened exactly one modal. It went live in February 2026 when the password-confirmation requirement introduced the codebase's first nested createModal, then sat unreported for four more months. A latent API misuse becomes a defect when a new usage pattern arrives, not when the code changes — so a single test for "closing one modal in a stack leaves the others mounted" would have caught it at any point across those fifteen months, and no amount of reviewing the February diff would have.
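
The one-character difference is easy to demonstrate in isolation. The helper names below are hypothetical and the stack is modelled as an array of modal names rather than the real modal service entries.

```typescript
// splice(start) with no deleteCount removes everything from start to the end.
function closeBroken(stack: string[], index: number): string[] {
  const copy = [...stack];
  copy.splice(index);
  return copy;
}

// splice(start, 1) removes exactly the one entry being closed.
function closeFixed(stack: string[], index: number): string[] {
  const copy = [...stack];
  copy.splice(index, 1);
  return copy;
}

// Parent at index 0 has just pushed its child at index 1, then closes itself.
const stack = ['PasswordConfirmModal', 'TwoFactorSetup'];

console.log({
  broken: closeBroken(stack, 0),
  fixed: closeFixed(stack, 0),
});
```

With a single-entry stack the two helpers behave identically, which is why the misuse stayed latent until the first nested modal.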

KD-0993 — Promote broadcast excludes the promoting user's own connection

Severity: medium
Symptom: After promoting a report to an issue, the promoter's own browser kept showing it as Pending until a manual refresh. Every other user watching the project saw it leave Pending immediately.
Root cause: ReportBroadcaster::updated() always called ->toOthers(), which uses the requester's socket ID to exclude their own Echo/Reverb connection. PromoteReportsAction relied solely on that broadcast to propagate promoted state into the frontend store — a deliberate KD-0526 decision, pinned by a frontend spec asserting promote() does not refetch — and never applied the POST response locally. DismissReportAction shares the same toOthers()-scoped broadcast and has no bug only because the frontend dismiss() applies the response via setById as a backstop.
Fix: updated() gained an opt-out bool $toOthers = true; PromoteReportsAction passes toOthers: false so the initiator's own connection receives the event. No frontend change — the store's existing onUpdate handler applies it.
Lesson: "The broadcast handles state sync" and "the broadcast excludes the initiator" are each reasonable and jointly a bug — and both were pinned by passing tests, one on each side of the stack, so the contradiction was invisible from either side. When a mutation's only state-sync path is a broadcast, the initiator needs either a local apply or an inclusive broadcast. The sibling action that happened to get this right (dismiss) is precisely what kept the asymmetry from being noticed.

KD-0991 — `items-start` shrink-to-fits the Settings card below `lg`, clipping controls off-screen

Severity: medium
Symptom: Below 1024px the Settings/Profile page overflowed horizontally with the right-hand portion clipped and unreachable — the copy button on a revealed API token, the Cancel button on the create-token form, and "Disable 2FA" were all sliced off with no way to scroll to them.
Root cause: The page wrapper carried unconditional items-start alongside flex-col-reverse lg:flex-row. Below lg the container is in column mode, where the cross axis is horizontal — so items-start stopped main-card from stretching to full width and made it shrink-to-fit instead. min-w-0 flex-1 on the card governs only the main axis (height, in column mode) and did nothing. The token table's intrinsic minimum width then propagated up the auto-width chain and pushed the card past the viewport, where the app shell's overflow-x-hidden clipped it silently rather than offering scroll. At ≥1024px the cross axis becomes vertical and items-start merely top-aligns, so desktop was unaffected.
Fix: Moved items-start behind the breakpoint (lg:items-start), so the stacked layout falls back to the flex default stretch and the table's own overflow-x-auto contains the overflow.
Lesson: align-items swaps which axis it governs when flex-direction changes, so an unconditional alignment class on a responsive container means two different things above and below the breakpoint, and the sizing helpers that look like they should save you (min-w-0, flex-1) are main-axis only. JSDOM computes no layout, so the 204-test auth suite could not have caught this and cannot catch a re-break; a browser click-through is the only evidence that counts. The same pattern still sits unfixed in Workspace.vue, teams/Show.vue, and projects/Edit.vue.

KD-0985 — Pagination bar prints the computed-ref variable name as the entity label

Severity: low
Symptom: The comments pagination bar on an issue read "Showing 1 - 10 of 12 paginatedComments" — the camelCase identifier leaked verbatim into user-facing copy.
Root cause: CommentSection.vue passed entity-name="paginatedComments" (the computed ref's variable name) and PaginationBar renders the prop verbatim. The string was never updated when the variable was named.
Fix: entity-name="comments", with the spec assertion updated to match.
Lesson: The spec asserted 'paginatedComments' and therefore locked the bug in — a test that pins whatever the code currently does converts a typo into a specification, and every subsequent run confirms it. PaginationBar also has no constraint that entityName be human-readable, so any string passes through; a prop whose value lands in a rendered sentence needs either a typed set of allowed values or a review habit of reading the sentence out loud.

KD-0977 — Arch test re-`resolve()`s an already-normalized path, false-positiving on Windows

Severity: medium
Symptom: The full frontend suite failed locally for Windows developers while staying green on Linux CI, with the import-boundaries arch test flagging a legitimate intra-projects import. It was filed as a seed-dependent inter-file mock-pollution flake with a rotating victim spec.
Root cause: getDomainFromFilePath computed filePath.replace(`${resolve(TENANT_DOMAINS_DIR)}/`, ''). TENANT_DOMAINS_DIR was already normalizePath(resolve(…)), which yields forward slashes, and re-wrapping it in resolve() reintroduced platform separators, so on Windows the replace matched nothing, the full path survived, and split('/')[0] returned "C:". That garbage domain never equalled the imported domain, which disabled the same-top-level-domain skip and flagged the import. On Linux resolve() returns forward slashes, so CI stayed green.
Fix: Strip the already-normalized constant directly (drop the redundant resolve()), sweeping sibling arch specs for the same pattern. Separately pinned isolate: true explicitly in the root Vitest config to make the protective invariant regression-proof.
Lesson: The filed diagnosis was wrong and had to be disproven by experiment: two crafted specs polluting and then reading a shared __mocks__ singleton showed no leak under the default isolate: true, so the mock-pollution class is architecturally prevented — and the blanket mockReset the report implied would have been a no-op for the symptom and would have wiped the module-load defaults many specs depend on. "A different victim each run" reads like a flake and was in fact one deterministic, platform-specific failure. Separately: normalize a path once and pass the constant around; re-normalizing is exactly where the separators come back.
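
The Windows-only behaviour can be reproduced on any platform with Node's path.win32 variant. This is a sketch: the directory and file paths are hypothetical, and `domainBroken` / `domainFixed` only approximate getDomainFromFilePath.

```typescript
import { win32 } from 'node:path';

// Hypothetical constant, already normalized to forward slashes.
const TENANT_DOMAINS_DIR = 'C:/repo/src/domains';

// Re-resolving converts the separators back to backslashes on Windows,
// so the prefix no longer matches the forward-slash file path.
function domainBroken(filePath: string): string {
  return filePath.replace(`${win32.resolve(TENANT_DOMAINS_DIR)}/`, '').split('/')[0];
}

// Strip the already-normalized constant directly.
function domainFixed(filePath: string): string {
  return filePath.replace(`${TENANT_DOMAINS_DIR}/`, '').split('/')[0];
}

const importer = 'C:/repo/src/domains/projects/components/Board.vue';

console.log({
  reResolved: win32.resolve(TENANT_DOMAINS_DIR),
  broken: domainBroken(importer),
  fixed: domainFixed(importer),
});
```

Using path.win32 explicitly is also one way to pin this class of bug in a spec that runs on Linux CI.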

KD-0953 — Avatar URL never changes, so the browser never refetches after upload

Severity: medium
Symptom: After uploading a new profile picture, the old avatar persisted on every surface — issue sidebar, activity log, member lists — until a hard browser reload.
Root cause: Two compounding layers. ProfilePictureController::show() served avatars with Cache-Control: max-age=604800, public, so browsers held them for a week without revalidating. And ProfilePictureUrlsData::forUser() built URLs keyed only on user id, so the fresh ProfileResource returned after upload contained byte-identical URL strings — Vue saw an unchanged src and never triggered a fetch. Even under no-cache the browser never asks, because the <img> src never changes.
Fix: Append ?v=<user->profile_picture> (the baseName that rotates on every upload) to both URLs, and switch the route middleware to cache.headers:no_cache;public so the browser revalidates; the pre-existing weak ETag makes that a cheap 304 when unchanged. No frontend change — the version token fixes every in-session client, not just the uploader.
Lesson: Cache headers cannot help when the client never asks — for mutable content behind a stable URL, the src binding is the cache key, and content-addressable or version-tokened URLs are the only reliable invalidation under a reactive framework. Note the version token had to reach three separate URL-building sites, two of them inlined inside GetIssueActivitiesAction rather than going through the resource — duplicated URL construction is what makes this kind of fix leaky.
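
A version-tokened URL builder might look like the following. The route shape and the `avatarUrl` helper are assumptions for illustration; the real construction lives in ProfilePictureUrlsData::forUser() on the backend.

```typescript
// Hypothetical route shape. The token is the stored file name, which rotates on upload.
function avatarUrl(userId: number, profilePicture: string | null): string {
  const base = `/api/users/${userId}/profile-picture`;
  return profilePicture ? `${base}?v=${encodeURIComponent(profilePicture)}` : base;
}

const beforeUpload = avatarUrl(7, 'a1b2c3.png');
const afterUpload = avatarUrl(7, 'd4e5f6.png');

// A reactive framework only refetches when the bound string changes.
console.log({ beforeUpload, afterUpload, srcChanged: beforeUpload !== afterUpload });
```

Routing every call site through one builder like this is what keeps the token from being missed at an inlined construction site.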

KD-0951 — Attachment grid scrolls horizontally behind an auto-hiding scrollbar

Severity: low
Symptom: On the issue page the attachment grid rendered as a single horizontal strip; files past the container width sat behind a macOS auto-hiding scrollbar, so users had no signal that more attachments existed. Separately, the "Attachments" section header showed no count, unlike the Comments and Activity tabs.
Root cause: AttachmentGrid.vue carried flex overflow-x-auto as unconditional static UnoCSS attributes, so the compact context's horizontal strip was also applied in the non-compact issue-page context where wrapping was wanted. The count was already computed in AttachmentSection.vue but never rendered into the SectionHeader slot.
Fix: Moved overflow-x-auto out of the static attributes into the dynamic :class behind compact, and gave the non-compact path flex-wrap; added a count badge span reusing the tab strip's styling.
Lesson: A component with a compact prop that applies its compact-only layout statically has one variant that is silently wrong at every call site — the prop implies a branch the class list never actually makes. And overflow hidden behind an auto-hiding scrollbar is invisible by construction on macOS, so this class of defect can't produce a report that describes what's missing; the user doesn't know anything is.

KD-0931 — AI-key-missing failure collapses into a generic "try again"

Severity: medium
Symptom: A tenant with no Anthropic API key configured triggered story generation and saw only "Story generation failed. Please try again." Neither the cause (no key configured) nor the remedy (where to add one) ever reached the user.
Root cause: Two independent gaps composing. GenerateStoryJob::failed() logged the exception message but passed only $requestId and $userId to BroadcastStoryGenerationFailureAction, which always dispatched a hardcoded generic string — the exception type was discarded before the broadcast. And on the frontend, onFailed received event.message from the broadcast and ignored it, calling surfaceError() with no argument and always rendering the same hardcoded string.
Fix: Threaded the message through the stack — optional $message on the broadcast action defaulting to the generic string, a ternary in failed() selecting "No AI key configured for this project. Add one in Project Settings → AI Keys." for AiKeyNotConfiguredException, and surfaceError(event.message || undefined) on the client so an empty string still falls back.
Lesson: Two layers each defaulting to a generic message means the specific one can never arrive, and neither layer looks broken in isolation — the backend "handles" the failure, the frontend "shows" it. Any error channel carrying a message field needs at least one test asserting that a specific message survives the whole trip; a test that the generic fallback renders proves only that the fallback works.
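
The client half of the fix can be sketched as follows. The function bodies are simplified stand-ins (the real `surfaceError` renders UI rather than returning a string), and the `FailedEvent` type is an assumption.

```typescript
const GENERIC_FAILURE = "Story generation failed. Please try again.";

// Stand-in for the client-side handler: the default applies only when nothing specific arrived.
function surfaceError(message: string = GENERIC_FAILURE): string {
  return message;
}

// Assumed shape of the broadcast payload.
interface FailedEvent {
  message?: string;
}

function onFailed(event: FailedEvent): string {
  // `|| undefined` turns an empty string into "no argument", so the default still applies.
  return surfaceError(event.message || undefined);
}
```

A test in the spirit of the lesson asserts that a specific message passed to `onFailed` comes out unchanged, in addition to checking the fallback.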

KD-0928 — Lane reorder invisible until refresh: the cache-hash protocol never reached the client

Severity: medium
Symptom: Reordering lanes in Project Settings didn't change the board until a full page refresh.
Root cause: Two backend deficiencies in the cached-store invalidation protocol (ADR-0032), each already diagnosed separately. config/cors.php had exposed_headers => [], so the browser stripped x-fs-cache-hashes on every cross-origin request — which is every local dev install (SPA on :3000, API on :8000) — and the store wrapper saw null on every response (KD-0920). And StampCacheHashesMiddleware was mounted on only five routes, none of them the navigation requests an SPA actually fires, making steady-state invalidation circular (KD-0919).
Fix: This ticket shipped the missing HTTP-level regression test — PUT /api/projects/{id} with a lane reorder must stamp the fresh lanes_hash in x-fs-cache-hashes, and that hash must differ from the pre-mutation value. The two defects themselves were fixed by KD-0920 and KD-0919.
Lesson: The existing protocol test covered the Action bumping lanes_hash in the DB and the resource endpoint stamping the header — but never the mutation endpoint the settings page actually calls, which is the single link the whole invalidation chain hangs from. Testing each end of a protocol independently leaves the seam untested, and both defects lived precisely in that seam. A protocol needs at least one test that walks it end to end at the HTTP layer.

KD-0911 — Interrupted navigation renders a stale `ProjectLayout`, throwing on a missing route param

Severity: medium
Symptom: Navigating from a list view (Inbox, My Issues, Error tracking, All projects, Roles) into a detail page and away again before it finished loading produced an unhandled Missing required param "parentId" — a brief flash of the error screen, or a full "Could not load page" on the wrong route.
Root cause: ProjectLayout.vue is an async component and Vue's Suspense does not cancel a pending setup. When the setup resolved after the user had navigated away, Vue rendered the stale instance against the new reactive route. MenuTabs' tab to objects carry only {name}, so the custom RouterLink fell through to resolveParentId, which reads currentRoute.params.parentId — undefined on a non-project route — and router.resolve threw. MenuTabs sits above the AsyncErrorBoundary in the template and nothing wraps ProjectLayout in the routing tree, so it surfaced as an uncaught promise rejection.
Fix: Optional parentId prop on MenuTabs; ProjectLayout passes projectId captured at mount time, before any await, so it is immune to reactive route changes. Tab links never consult currentRoute again.
Lesson: A route-derived implicit fallback — resolveParentId silently reading currentRoute when no param is passed — is a hidden dependency on "the route hasn't changed since this component mounted," an assumption async setup breaks by construction. Any to object omitting a required param for a project-scoped route will throw the moment it renders on a non-project route, so the whole call-site class is suspect, not just this one. Also worth noting the error boundary was positioned below the component that throws, so it caught nothing.
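
The difference between reading the route at call time and capturing the id at mount can be sketched without any framework. Everything here is hypothetical: `RouteLike` is a minimal stand-in for a reactive route, and the `/projects/<id>/<tab>` path shape is invented for the example.

```typescript
// Minimal stand-in for a reactive route object (an assumption, not the router's real type).
interface RouteLike {
  params: { [name: string]: string | undefined };
}

// Fragile: reads the route at call time, so it depends on the route not having changed.
function liveTabPath(route: RouteLike, tab: string): string {
  const parentId = route.params.parentId;
  if (parentId === undefined) throw new Error('Missing required param "parentId"');
  return "/projects/" + parentId + "/" + tab;
}

// Robust: the id is captured once, synchronously, and closed over.
function makeTabPath(capturedParentId: string): (tab: string) => string {
  return (tab) => "/projects/" + capturedParentId + "/" + tab;
}

const route: RouteLike = { params: { parentId: "42" } };
const tabPath = makeTabPath(route.params.parentId as string); // captured "at mount"

// The user navigates to a non-project route while async setup is still pending.
route.params = {};

const boardPath = tabPath("board"); // still resolves against project 42
// liveTabPath(route, "board") would throw at this point.
```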

KD-0886 — Branch unlink hidden behind a GitHub OAuth check it doesn't need

Severity: low
Symptom: On the issue Git panel, the per-branch "Remove" button was hidden for any user without personal GitHub OAuth connected — so a user whose branch was auto-linked by pushing a branch name containing the issue key could not unlink it from the UI without connecting GitHub first.
Root cause: BranchLinkList.vue gated the Remove button with v-if="isConnected", reflecting personal OAuth status from GET /github/status. But unlink calls a kendo-local endpoint that never touches GitHub, and the backend gates it only on IssueBranchLinkPolicy::delete() (project membership). The frontend gate was strictly stricter than what the backend enforced — create-branch and link-branch genuinely need OAuth, remove does not.
Fix: Dropped the v-if, the now-unused isConnected prop, and its call-site binding. The create and link controls stay gated.
Lesson: One connection flag gating a whole panel is the easy default and it over-gates every operation in the panel that doesn't need the connection — the right granularity is per-operation, matched to what the backend actually enforces. The existing spec asserted the button was absent when disconnected, so the over-gate was pinned as intended behaviour until someone flipped the polarity.
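
Per-operation gating might look like the sketch below. The type and function names are hypothetical, and treating project membership as a precondition for all three controls is an assumption made for the example; the source only states it for unlink.

```typescript
type GitPanelOperation = "create-branch" | "link-branch" | "unlink-branch";

interface Viewer {
  isProjectMember: boolean;
  hasGitHubConnection: boolean;
}

// Only operations that actually call GitHub require the personal connection.
function needsGitHubConnection(op: GitPanelOperation): boolean {
  return op !== "unlink-branch";
}

function canShowControl(op: GitPanelOperation, viewer: Viewer): boolean {
  // Assumption: membership gates everything in the panel.
  if (!viewer.isProjectMember) return false;
  return needsGitHubConnection(op) ? viewer.hasGitHubConnection : true;
}
```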

KD-0859 — Resend-verification keyed on `tenant.database` as a proxy for the domain

Severity: low
Symptom: None in production — the gate is structurally wrong but currently unreachable, since operator-created tenants receive a WelcomeNewTenant email rather than entering the verification flow. Were it reachable, the action would silently no-send: the endpoint returns a uniform 204 either way for anti-enumeration, with no log line.
Root cause: Two independent sites used tenant->database as a proxy for the user-facing subdomain. SignupAction sets database and domain from the same $data->subdomain, so the invariant holds for signup tenants; CreateTenantAction sets them from separate DTO fields an operator can diverge. Once they diverge, the tenant lookup misses and the domain-readiness gate resolves to null.
Fix: Step 1 matches the inbound subdomain against domain.domain via whereHas('domains', …); Step 2 re-keys the readiness exists() off the same $data->subdomain. The domain filter is retained deliberately — dropping it would gate on any Active domain the tenant owns, which opens the gate prematurely for a multi-domain tenant whose target is Pending and whose sibling is Active.
Lesson: An invariant enforced by one creation path and not another is a proxy waiting to break, and an anti-enumeration uniform response guarantees the break is silent — no user signal, no operator signal, so a log line on the held-send path is the only observability that exists. Worth recording that the first shape of this fix over-corrected by dropping the domain filter entirely; the multi-domain regression test that caught it only appeared in a second review round.

KD-0835 — Cached SPA shell references chunk hashes the deploy already replaced

Severity: high
Symptom: On the first visit each morning after an overnight deploy, users got a 404 or a blank page. A manual refresh resolved it every time.
Root cause: The Laravel SPA fallback route returned a bare view('app') with no explicit Cache-Control. Symfony's default no-cache, private permits storage with conditional revalidation, and Cloudflare in front of Fly.io may treat the response as cacheable too. A cached shell references tenant-<oldhash>.js, which the deploy replaced and nginx correctly 404s. Because the entry module 404s before JavaScript starts, the existing vite:preloadError / router.onError recovery never runs.
Fix: Cache-Control: no-store on the SPA HTML response. Hashed assets in /assets/ keep expires 7d — they're content-addressable and safe to cache indefinitely.
Lesson: A recovery handler that lives inside the bundle cannot recover a failure to load the bundle — the stale-build machinery had been in place for months and was structurally incapable of covering the entry-chunk case, which is the one users actually hit. The shell that names hashed assets must never be cached; only the content-addressable assets may be. Tradeoff worth remembering: no-store also evicts the page from the bfcache in Chrome and Firefox, so every back-navigation becomes a full document reload; no-cache closes the same defect more cheaply if that turns out to be perceptible.
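
The caching split can be expressed as a single policy function. This is a sketch under assumptions: `cacheControlFor` is a hypothetical name, every non-asset path is treated as the SPA shell, and the 7-day lifetime is written as the equivalent max-age in seconds.

```typescript
// Hypothetical policy: hashed assets are cacheable, the shell that names them is not.
function cacheControlFor(path: string): string {
  // 7 days = 7 * 24 * 3600 = 604800 seconds.
  if (path.indexOf("/assets/") === 0) return "max-age=604800";
  // Everything else is assumed to fall through to the SPA HTML shell.
  return "no-store";
}
```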

KD-0794 — `composer test` OOMs at 512M because the whole suite runs in one process

Severity: low
Symptom: composer test fatal-OOMed mid-Feature with Allowed memory size of 536870912 bytes exhausted, crashing inside a vendor MIME-type detector that was merely where cumulative allocation tipped over the cap. Some developers reproduced it, others couldn't.
Root cause: The test script runs ~686 files (Unit + Feature + Performance + Concurrency + PrivilegeSeparation) serially in one PHP process with no --parallel, unlike test:unit, so Zend memory accumulates across the whole run. Measured peak is ~715 MB — already ~200 MB over the 512 MB cap. The intuitive suspect was measured and ruled out: the peak is mode-independent within ~16 MB across xdebug.mode off/coverage/develop, because xdebug's coverage data lives in its own malloc outside the Zend allocator and doesn't count against memory_limit at all.
Fix: Raised the test script's memory_limit to 1024M (~310 MB headroom over the measured peak), with the rationale recorded in composer.json's scripts-descriptions so it's visible from composer run-script --list. test:arch left at 512M.
Lesson: memory_get_peak_usage(true) measures what memory_limit actually gates — OS RSS and extension-owned allocations don't — which is why the xdebug hypothesis had to be disproven by measuring at memory_limit=-1 rather than argued about. The reason nobody caught this earlier is structural: CI never invokes composer test (it runs per-suite jobs), so this is a DX-only failure with no CI signal, and the same blind spot is currently hiding a Performance-suite query-count regression (18/16 queries against guards of 14/12). The real fix — parallelising Unit+Feature, since Concurrency and PrivilegeSeparation aren't parallel-safe — is still parked.

KD-0924 — Subdomain availability check ignores the `domains` table

Severity: medium
Symptom: CheckSubdomainAvailabilityAction reported a subdomain as available when it existed in domains.domain but not in tenants.database. The subsequent signup insert then 500'd on the domains_domain_unique constraint. Reproduced in prod.
Root cause: The action only queried tenants.database, but Domain is the canonical owner of the subdomain (the unique index lives on domains.domain). When tenants.database diverged from domains.domain — possible after a rolled-back/diverged provisioning run or operator edit — the check passed falsely while the insert collided.
Fix: Injected Domain alongside Tenant; execute() short-circuits to available: false on a tenants.database hit, then also checks domains.domain.
Lesson: An availability check must consult every table that owns the uniqueness invariant it's predicting — checking one of two tables that can diverge guarantees a false "available" the moment they drift. The DB unique index is the real contract; the pre-check has to query the same column(s) the index covers.
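
The shape of the corrected check, reduced to in-memory data, is sketched below. The arrays stand in for database queries, and the interface and function names are hypothetical.

```typescript
interface SubdomainSources {
  tenantDatabases: string[]; // stand-in for values of tenants.database
  domainNames: string[];     // stand-in for values of domains.domain (the uniquely indexed column)
}

function isSubdomainAvailable(subdomain: string, sources: SubdomainSources): boolean {
  // Taken if either owner of the name already has it.
  if (sources.tenantDatabases.indexOf(subdomain) !== -1) return false;
  return sources.domainNames.indexOf(subdomain) === -1;
}

// Drifted state: "globex" exists in domains but not in tenants.
const drifted: SubdomainSources = {
  tenantDatabases: ["acme"],
  domainNames: ["acme", "globex"],
};
const globexAvailable = isSubdomainAvailable("globex", drifted); // false; a tenants-only check would say true
```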

KD-0920 — `x-fs-cache-hashes` header not CORS-exposed, killing cross-origin cache invalidation

Severity: medium
Symptom: On any cross-origin setup (every local dev install: script.localhost:3000 → :8000), the browser never handed the x-fs-cache-hashes response header to JS, so the cached-store wrapper saw no invalidation signal and lanes/labels/sprints stayed stale until a full refresh. Invisible failure — the wrapper degrades to null silently.
Root cause: config/cors.php set 'exposed_headers' => []. Browsers expose only the seven CORS-safelisted response headers to cross-origin JS; a custom header is readable only if the server lists it in Access-Control-Expose-Headers, which Laravel's HandleCors emits only when exposed_headers is non-empty. Backend stamping and the SPA wrapper were each correct in isolation — the bug was the missing exposure entry between them.
Fix: Added 'x-fs-cache-hashes' to exposed_headers (a config constant, not an env knob — the header is a fixed non-sensitive protocol value).
Lesson: Stamping a custom response header does nothing for cross-origin JS unless the server also CORS-exposes it — the two are separate steps and a header present on the wire is still invisible to headers.get() without the expose list. Same-origin prod hid it; the first witness was every dev install. When a protocol depends on a custom header, the CORS expose entry is load-bearing, not optional.
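
The visibility rule can be modelled in a few lines. This is a simplified sketch, not browser code: `readableCrossOrigin` is a hypothetical name, and the `*` wildcard form of Access-Control-Expose-Headers is deliberately left out.

```typescript
// The seven CORS-safelisted response header names, lowercased.
const SAFELISTED_RESPONSE_HEADERS = [
  "cache-control",
  "content-language",
  "content-length",
  "content-type",
  "expires",
  "last-modified",
  "pragma",
];

// Simplified model of which response headers cross-origin JS may read.
function readableCrossOrigin(headerName: string, exposeHeaders: string[]): boolean {
  const name = headerName.toLowerCase();
  const exposed = exposeHeaders.map((h) => h.trim().toLowerCase());
  return SAFELISTED_RESPONSE_HEADERS.indexOf(name) !== -1 || exposed.indexOf(name) !== -1;
}

const withEmptyList = readableCrossOrigin("x-fs-cache-hashes", []);                 // false
const withExposure = readableCrossOrigin("x-fs-cache-hashes", ["x-fs-cache-hashes"]); // true
```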

KD-0919 — Cache-hash header stamped on too few routes to ever reach the SPA

Severity: medium
Symptom: The cached-store protocol's steady-state invalidation never fired during normal navigation. Client A mutated a sprint/epic/lane/label, the backend bumped the project's *_hash, but an open tab on client B kept serving the stale list — the refetch signal never arrived.
Root cause: Registration coverage, not logic. StampCacheHashesMiddleware was mounted on only five narrow routes (project show + the four cached-resource groups). The requests an SPA actually fires while navigating (board, backlog, issue show, comments) were in none of those groups, so the response carried no header. The signal was circular: the only responses announcing "sprints changed" were the sprint requests the wrapper had already decided to suppress.
Fix: Hoisted the middleware to the Route::prefix('projects') group (one registration covers every project-scoped route, current and future) and removed the five redundant inline mounts. The middleware already self-guards, leaving index/store responses header-free.
Lesson: A change-notification header is only useful on the responses the client actually requests in steady state — stamping it solely on the resource's own endpoints is circular, because those are exactly the requests the cache suppresses. Mount the signal across the whole navigation surface (group-level), and lean on the middleware's self-guards rather than narrow per-route registration that drifts as routes are added.

KD-0918 — Memoized cached stores go deaf to broadcasts after the first page unmount

Severity: high
Symptom: Sprints/epics/lanes/labels created or changed by another client stopped appearing live after the user's first in-app navigation — they surfaced only after a full refresh. Per-page live data (board, comments, time entries) kept updating fine; only the four project-scoped cached stores went deaf.
Root cause: subscribeWithAutoCleanup unconditionally called onScopeDispose(stop) (correct for per-page subscriptions). But the four cached stores are memoized per project and subscribe exactly once, at store-creation time — which runs inside the setup() of whichever component first calls the make…Store factory. When that component unmounted, the scope disposed, stop() fired, and the listener was removed; the memoized store stayed cached but never re-subscribed (fs-adapter-store subscribes once, at construction). Introduced by KD-0680's onScopeDispose auto-cleanup, which optimised per-page teardown without accounting for the memoized-singleton lifetime.
Fix: Persistent subscribe + evict-on-leave: a scope-free subscribeProjectChannelPersistent plus a per-project onLeaveProject registry. The four stores subscribe persistently and register an eviction callback that drops their memoized instance on leaveAllProjectChannels/resetEcho, so a revisit rebuilds and re-subscribes. Per-page subscriptions keep scope-bound teardown.
Lesson: A subscription's lifetime must match the lifetime of the thing it feeds — scope-bound (onScopeDispose) cleanup is correct for per-component data but wrong for a memoized singleton whose listener should live as long as the cache entry. When you add an auto-cleanup optimisation, audit every consumer whose lifetime is not the mounting component's, or the optimisation silently kills long-lived subscriptions on the first unmount.
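
The "persistent subscribe plus evict-on-leave" idea can be sketched with a toy channel. All names here (`Channel`, `getStore`, `leaveProject`) are hypothetical stand-ins, not the project's API; the point is only that the listener is owned by the cache entry rather than by whichever component asked first.

```typescript
type Stop = () => void;
type Listener = (payload: string) => void;

// Toy channel standing in for the real broadcast client.
class Channel {
  private listeners: Listener[] = [];
  subscribe(listener: Listener): Stop {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
  emit(payload: string): void {
    this.listeners.slice().forEach((l) => l(payload));
  }
}

interface CachedStore {
  items: string[];
  dispose: Stop;
}

const storeCache: { [projectId: string]: CachedStore } = {};

// Memoized factory: subscribes once, and the subscription lives as long as the cache entry.
function getStore(projectId: string, channel: Channel): CachedStore {
  const existing = storeCache[projectId];
  if (existing) return existing;
  const items: string[] = [];
  const stop = channel.subscribe((payload) => {
    items.push(payload);
  });
  const store: CachedStore = { items, dispose: stop };
  storeCache[projectId] = store;
  return store;
}

// Leaving the project stops the listener and evicts the entry together,
// so a revisit rebuilds the store and re-subscribes.
function leaveProject(projectId: string): void {
  const store = storeCache[projectId];
  if (!store) return;
  store.dispose();
  delete storeCache[projectId];
}
```

Because teardown and eviction happen in the same place, there is no state in which a cached store exists without a live listener.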

KD-0889 — FilterBar search input has no inline clear (✕) button

Severity: low
Symptom: The issues-tab filter-bar search input had no inline ✕ to clear the term — users had to select-and-delete or use the separate global "Clear all." The reports page's search already had one, so the two bars felt inconsistent.
Root cause: FilterBar.vue's search <input> was bound to the model with no per-input clear control. The inline-clear pattern existed only in the sibling SearchFilter.vue and was never carried into FilterBar.
Fix: Added a v-if="searchTerm" ✕ button mirroring SearchFilter's clear control; clicking empties the model, which both hides the button and clears the filter (the search term is the filter). Lands across all 9 pages that mount the shared bar.
Lesson: When two sibling components present the same affordance, a pattern added to one but not the other reads as a regression — shared UI affordances should be lifted to the shared component or mirrored deliberately, not implemented per-page.

KD-0882 — ProfileSidebar spec leaks a post-teardown dynamic import, flaking CI

Severity: low
Symptom: The Test tenant-core frontend CI job intermittently exited code 1 even though all assertions passed, blocking PRs until a manual rerun. The failure was an unhandled EnvironmentTeardownError.
Root cause: The spec called the defineAsyncComponent factory (() => import('ProfilePictureForm.vue'), which transitively pulls the browser-only browser-image-compression module) without awaiting the returned promise. The dynamic import raced Vitest's environment teardown; when teardown won, the late-resolving import was recorded as an unhandled rejection.
Fix: await the factory call so the import resolves within the test's lifetime.
Lesson: An un-awaited promise in a test — especially a dynamic import() — races the runner's environment teardown and surfaces as an intermittent, assertion-passing CI failure. Any async work a test triggers must complete before the test ends, or it leaks into teardown as a flake.

KD-0878 — Epic Board badge stops updating on remote drags (stale `positions` payload shape)

Severity: medium
Symptom: The Epic Board's per-epic open/closed badge stopped updating live when other users moved issues; it stayed stale until a full reload. The three issue tabs handled the same broadcast correctly.
Root cause: Since KD-0789 the backend IssuePositionEvent broadcasts a single {position} payload on the positions event. The Epic Board's handler still destructured the pre-KD-0789 array shape {positions} and looped it, so positions was undefined and for (const position of positions) threw before any state applied. The issue tabs had been migrated to the single-{position} shape (centralised in useIssueLiveSync); this call site was missed because the Epic Board hand-rolled the same five-event protocol inline instead of consuming the composable. The spec hid it by pinning the old array shape.
Fix: Adopted useIssueLiveSync on the Epic Board (widened generic over item type), replacing the five inline Echo registrations with one composable call so the broadcast contract lives in exactly one place.
Lesson: A wire-shape change ripples into every consumer that re-implements the protocol by hand — the call site that hand-rolls what a shared composable already does is the one that gets missed when the shape changes. Centralising the contract in one composable is the fix and the prevention. And a test that pins the old shape keeps a broken consumer green while prod throws.

KD-0872 — Bearer-token auth failures return a 302 redirect instead of 401 JSON

Severity: medium
Symptom: API/MCP token clients (CLI, VS Code extension, feedback button) carrying a revoked/expired/unknown Bearer token got a 302 redirect to the login page instead of a machine-readable 401. The client had no actionable signal to re-authenticate — failures were silent. This was the delivery path for the 2026-05-29 feedback-loss incident (the balloon followed the redirect into a fake 201).
Root cause: Passport's TokenGuard catches the OAuthServerException internally and returns null, so auth:api throws AuthenticationException. Laravel's default renderer checks $request->expectsJson(); token clients don't send Accept: application/json, so the check fails and the request is redirected — correct for browser navigation, wrong for machine clients on /api/* and /mcp/*. (The issue's premise that OAuthServerException reaches the handler was wrong; the guard swallows it.)
Fix: Added a render() for AuthenticationException returning {code: 'TOKEN_INVALID', ...} 401 JSON for api/* / mcp/* paths that don't already expect JSON; passes through (returns null) otherwise so SPA/JSON-client behaviour is unchanged. Mirrors KD-0739's AccessDeniedHttpException expectsJson() scoping.
Lesson: The framework default of "redirect non-JSON auth failures to login" is correct only for browsers — machine clients on API/MCP paths need a structured 401, and they don't send Accept: application/json. Path-scope the override (api/* or mcp/*, plus !expectsJson()) so browser flows keep redirecting. Diagnose where the exception is actually thrown, not where the issue claims (the guard caught the OAuth exception two layers up).
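
The scoping decision can be sketched as a pure function. The real handler is PHP; this TypeScript version uses hypothetical names and models "pass through to the framework default" as `null`, mirroring the description above.

```typescript
interface AuthFailureRequest {
  path: string;
  expectsJson: boolean;
}

// null means "fall through to the framework's default rendering".
type AuthFailureResponse = { status: 401; body: { code: string } } | null;

function renderAuthFailure(req: AuthFailureRequest): AuthFailureResponse {
  // Only api/* and mcp/* paths are machine-client surfaces.
  const isMachinePath = /^\/?(api|mcp)\//.test(req.path);
  if (isMachinePath && !req.expectsJson) {
    return { status: 401, body: { code: "TOKEN_INVALID" } };
  }
  return null;
}
```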

KD-0870 — `project_tokens.active` not synced when the backing PAT is revoked

Severity: medium
Symptom: Bulk-revoking a user's personal access tokens (e.g. on 2FA enablement) flipped oauth_access_tokens.revoked = 1 but left the matching project_tokens.active = true. The Project Tokens UI reads active, so it kept showing dead tokens as live; operators couldn't tell a working token from a revoked one without querying the DB. Confirmed in prod after the 2026-05-29 sweep.
Root cause: RevokeUserTokensAction had no dependency on ProjectToken — it knew only oauth_access_tokens. The other revocation path, DeleteProjectTokenAction, kept both tables in sync; only this path desynced, because the two paths were written independently and only the delete path was built for cross-table consistency.
Fix: Injected ProjectToken; after the revocation loop, set active = false on any project_tokens rows whose token_id is in the revoked set, inside the same transaction.
Lesson: When two code paths both invalidate the same entity, both must maintain every derived/mirrored column the entity owns — one path keeping a denormalised flag in sync while the sibling forgets it guarantees the UI shows a state that contradicts the source of truth. New invalidation paths must be checked against the full set of side-effects the canonical path performs.

KD-0858 — Board card can't be moved: deterministic rank collision dead-ends MoveIssueAction

Severity: high
Symptom: Certain board cards refused to move — every drag snapped back, no matter how many retries. The backend returned 409 (RankCollisionException) and the FE reverted. Surfaced in prod (Nightwatch #168). Two coupled defects: the 409 toast showed an empty body ({"message":""}), and the user-facing copy was hardcoded per-HTTP-status in the FE rather than coming from the backend exception.
Root cause: Rank::between is fully deterministic (base-26 midpoint, no randomness). When a project's rank space is degraded (zero-width gap, stale neighbour ids) so the midpoint lands on an existing (project_id, rank) UNIQUE value, the 3-attempt retry loop recomputes the same value every time and collides identically — the loop only resolves transient concurrent collisions, never deterministic ones. KD-0808's respread recovery was wired in execute() for RankTooLongException only, so overflow self-healed but a stuck gap dead-ended. The empty body: CustomException subclasses declare copy via a protected $message default, but PHP's Exception::__construct overwrites that default with '' the instant it runs with any argument — including previous: only — and the leak-safe new X(previous: $cause) throw (mandated to avoid leaking the MySQL duplicate-key string) was therefore silently incompatible with the property-default-message pattern.
Fix: (1) Widened the reactive-recovery catch in MoveIssueAction and BulkMoveAction to RankTooLongException | RankCollisionException → respread + one bounded retry. (2) Gave CustomException a constructor falling back to $this->message when no explicit message is passed. (3) dontReportWhen filter dropping handled <500 CustomExceptions from the monitor. (4) FE renders the backend exception's data.message instead of a hardcoded status→string map.
Lesson: A retry loop only helps when the inputs change between attempts — retrying a deterministic computation reproduces the same collision forever; deterministic exhaustion needs a state-changing recovery (respread), not a retry. Separately: PHP's Exception::__construct clobbers a subclass's protected $message default to '' whenever called with any named/positional argument, so a previous:-only throw silently ships an empty message — override the constructor to restore the declared default. And the backend that owns what failed should own the words the user sees; duplicating copy in the SPA per status code drifts.
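
Why the retry could never succeed is easy to see with a toy model. The sketch below swaps the base-26 string ranks for plain integers, which is a simplification made for the example, and all function names are hypothetical.

```typescript
// Deterministic midpoint over integer ranks (a simplification of the base-26 scheme).
function between(lo: number, hi: number): number {
  return Math.floor((lo + hi) / 2);
}

// Retrying a deterministic computation: every attempt yields the same candidate.
function retryCandidates(lo: number, hi: number, attempts: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < attempts; i++) out.push(between(lo, hi));
  return out;
}

// State-changing recovery: respace all ranks evenly so the gaps reopen.
function respread(count: number, step: number): number[] {
  const out: number[] = [];
  for (let i = 1; i <= count; i++) out.push(i * step);
  return out;
}

const taken = [10, 11];                     // zero-width gap between neighbours
const tries = retryCandidates(10, 11, 3);   // [10, 10, 10], each colliding with rank 10
const fresh = respread(taken.length, 1000); // [1000, 2000]
const moved = between(fresh[0], fresh[1]);  // 1500, which no longer collides
```

The retry loop stays useful for transient concurrent collisions, where another writer changes the inputs between attempts; the respread covers the case where nothing else will.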

KD-0852 — My Issues badge shows a stale lane after an agent (MCP) lane change

Severity: medium
Symptom: When an agent moved an issue to a different lane via the MCP path (UpdateIssueTool, or start-work-on-issue), the My Issues page kept showing the old lane in its Status badge until a manual refresh. The broadcast fired and the row otherwise updated — only lane_title/lane_color were stale. The same move through the web UI updated correctly.
Root cause: UpdateIssueAction assigns the new lane_id, saves, then broadcasts via resources that read lane_title/lane_color off the in-memory lane relation, hydrating with loadMissing. UpdateIssueTool resolves the issue with lane already eager-loaded. Changing the lane_id FK does not refresh an already-loaded belongsTo relation, and loadMissing is a no-op when the (now stale) relation is present — so the payload carried the new lane_id but the old lane_title/lane_color. The web path route-model-binds without preloading lane, so loadMissing fetched it fresh — hence agent-specific.
Fix: Added a single $issue->refresh() after the audit-log write and before the two broadcast calls, dropping stale relation caches so the resources re-read the new lane. Rejected per-relation unsetRelation() (mock churn, only covers listed relations) and tool-level eager-load removal (leaves the root cause for other callers).
Lesson: Mutating a foreign key does not refresh an already-loaded belongsTo relation, and loadMissing won't re-fetch what's already (stalely) present — so a serializer that reads through the relation emits stale nested data whenever a caller preloaded it. A broadcasting Action that mutates FKs must refresh() (or unset the affected relations) before serializing. The bug is caller-dependent: it only appears for the path that preloads, which is why the web UI looked fine while the agent path broke.

KD-0848 — Filter bar hijacks Cmd/Ctrl+F, blocking the browser's native find

Severity: low
Symptom: On any page rendering the shared FilterBar (9 call sites), pressing Cmd/Ctrl+F opened the Kendo filter popover instead of the browser's native find-in-page. The handler called preventDefault(), so users lost the universal browser find shortcut everywhere the bar appeared.
Root cause: FilterBar.vue's isFilterShortcut matched (metaKey || ctrlKey) && key === 'f' and the keydown handler preventDefault()'d before opening the popover. Cmd/Ctrl+F is the browser's universal find accelerator.
Fix: Rebound the shortcut to a bare / (the conventional in-app search key), freeing Cmd/Ctrl+F to fall through untouched. The existing form-control guard already suppresses it while typing.
Lesson: Don't bind app shortcuts to the browser's reserved accelerators (Cmd/Ctrl+F/N/T/W…) — preventDefault'ing them strips a universal capability on every page the component mounts. The bare / is the conventional in-app find/search key; reserved-modifier combos belong to the browser.
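
A predicate for the rebound shortcut might look like this. `KeyLike` is a hypothetical subset of a keyboard event, used so the check can be exercised without a DOM; excluding the Alt key is an assumption added for the example.

```typescript
// Hypothetical subset of a keyboard event.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
}

// Matches a bare "/" only; any modifier combination is left to the browser.
function isFilterShortcut(e: KeyLike): boolean {
  return e.key === "/" && !e.metaKey && !e.ctrlKey && !e.altKey;
}
```

Since the predicate returns false for every modifier combination, the handler never reaches `preventDefault()` for Cmd/Ctrl+F.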

KD-0845 — Comment-editor links have no underline and navigate on click

Severity: low
Symptom: In the RichTextArea comment/description editor, links rendered with no underline and clicking one navigated away instead of letting the user select/edit it. Read-only rendered prose was correct.
Root cause: Two config/styling gaps. TipTap v3 StarterKit bundles extension-link, whose Link mark defaults to openOnClick: true — RichTextArea registered StarterKit with no config. Separately, no .tiptap a CSS rule existed (editor content styles live in markdown.css), while read-only prose was underlined via .description-prose a.
Fix: StarterKit.configure({link: {openOnClick: false}}) so editor clicks edit rather than navigate; added a .tiptap a rule mirroring .description-prose a. CSS-only, not a link HTMLAttributes class, so the rendered <a> markup is unchanged and the existing real-render assertion stayed green.
Lesson: A bundled extension's defaults are inherited silently when you register the kit with no config — TipTap StarterKit's Link defaults openOnClick: true. And the editor surface (.tiptap) and read-only surface (.description-prose) are separate style scopes; a prose rule does not cover the editor. Check both the behaviour defaults and the per-surface CSS coverage.

KD-0841 — Logging time for a past date takes too many interactions

Severity: low
Symptom: The "Log Time" modal's "Started At" field opened empty (a bare native <input type="datetime-local">, default startedAt: null), so logging against a recent past day forced the user to hand-type every segment or click backward through the native calendar — described as "absurdly long."
Root cause: The field had no shortcut affordance and opened empty because new entries default startedAt: null. The friction was purely the date-selection affordance — nothing about calculation or storage.
Fix: Prefill "Started At" with now() when the create modal opens (logging against today is now zero interactions; a past date is reached by editing a populated field). The independent "auto-calculate start time" checkbox (back-dates start to now() − duration) was briefly removed as redundant, then restored once it became clear the removal had misread its distinct purpose, and renamed for clarity. Day-preset buttons from an earlier iteration were dropped.
Lesson: An empty input is the worst default for an "edit a near value" task — prefilling the common case (now) removes the build-from-empty friction more simply than adding preset buttons. Watch for two controls with overlapping-but-distinct purpose (prefill vs back-date-on-duration): removing one as "redundant" can quietly delete a different behaviour.

KD-0839 — PDF attachment preview renders blank (iframe sandbox blocks JS)

Severity: medium
Symptom: Clicking a PDF attachment opened the preview modal but the PDF never rendered — the iframe loaded the blob URL yet stayed blank. Images previewed fine (they use <img>, not an iframe).
Root cause: The PDF <iframe> had sandbox="allow-same-origin". A sandbox attribute without allow-scripts blocks all JavaScript inside the iframe, and both Chrome's native PDF viewer and Firefox's PDF.js need JS to initialise — so the blob loaded but rendered nothing.
Fix: Removed the sandbox attribute. Blob URLs from URL.createObjectURL() are ephemeral and tab-local and the content is our own server's — no cross-origin surface to sandbox.
Lesson: sandbox without allow-scripts silently disables the JS that built-in PDF viewers depend on — a sandbox tight enough to block scripts blocks the very feature you're embedding. Don't sandbox an iframe whose source is a same-origin/tab-local blob you produced; there's nothing to isolate.

KD-0838 — AsyncErrorBoundary shows a fatal "Could not load page" for transient/user-action failures

Severity: medium
Symptom: The shared AsyncErrorBoundary rendered a full-page fatal "Could not load page." for any captured error except EntryNotFoundError. Two everyday events tripped it: a failed comment submission, and clicking an issue then hitting Back before it loaded (the fatal screen then persisted on the /board route that had loaded fine).
Root cause: Two defects. (1) Sticky boundary state — onErrorCaptured set hasError = true and never cleared it, and ProjectLayout keeps one boundary instance alive across in-project tab switches (the RouterView child swaps, the boundary doesn't), so an error raised loading one route latched and poisoned the next. (2) submitComment awaited create() with no try/catch, so a rejected POST propagated through onErrorCaptured into the boundary. There is no request-cancellation infra in the FE, so the "aborted load" was the reporter's mental model, not a literal CanceledError.
Fix: Initially: reset hasError on navigation via a resetKey prop (the boundary can't read the route — the app skips app.use(router)) plus a local try/catch on submitComment. Per PR review, replaced the per-handler guard with a systemic discrimination: onErrorCaptured now inspects Vue's info arg — 'native event handler'/'component event handler' errors propagate untouched (toast middleware surfaces them, user stays on the page), while async-setup/render/lifecycle/watcher errors still latch the fatal screen. resetKey made required so a future consumer can't forget it.
Lesson: An error boundary must distinguish a page-load failure (fatal screen is right) from a user-action failure (stay on the page, toast it) — onErrorCaptured fires for every descendant error including event handlers, so a boundary that treats all errors as fatal turns any failed submit into a full-page crash. Discriminate by Vue's info source. And latched boundary state must clear on navigation, or one route's error poisons its siblings. Prefer a required prop over an optional one for a guard that prevents a known bug (a future consumer can't silently forget it).
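
The discrimination and the reset can be sketched as plain functions, independent of the component. Everything here is hypothetical except the two info strings, which are the ones named in the fix. Vue's documentation notes that production builds pass a shortened code as info instead of the full string, so a real implementation would need to account for that representation too; how the codebase handles it is not stated in this entry.

```typescript
// Hypothetical sketch; only the two info strings come from the entry above.
const USER_ACTION_SOURCES: string[] = ["native event handler", "component event handler"];

// True when the captured error came from a user action rather than a page load.
function isUserActionError(info: string): boolean {
  return USER_ACTION_SOURCES.indexOf(info) !== -1;
}

interface BoundaryState {
  hasError: boolean;
  resetKey: string;
}

interface CaptureDecision {
  state: BoundaryState;
  propagate: boolean;
}

// Mirrors the decision an onErrorCaptured hook would make: latch the fatal
// screen only for load-time errors, and let user-action errors keep propagating.
function captureError(state: BoundaryState, info: string): CaptureDecision {
  if (isUserActionError(info)) {
    return { state: state, propagate: true };
  }
  return { state: { hasError: true, resetKey: state.resetKey }, propagate: false };
}

// A changed reset key (for example, on navigation) clears any latched error.
function resetOnNavigation(state: BoundaryState, nextKey: string): BoundaryState {
  return nextKey === state.resetKey ? state : { hasError: false, resetKey: nextKey };
}
```

Keeping the decision in a pure function makes both halves of the lesson testable without mounting a component: the classification, and the guarantee that a latched error does not survive a key change.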

KD-0837 — Bar Color picker dims unselected swatches, distorting their hue

Severity: low
Symptom: In the epic "Bar Color" picker, selecting a colour appeared to change the colour of the other swatches (olive→yellow, orange→brown), as if the picker applied the wrong colour. Both report screenshots were of the picker itself, differing only in which swatch was at full opacity.
Root cause: Unselected swatches were dimmed with op-50 (whole-element opacity) over a saturated fill. CSS opacity composites the fill against whatever is behind it — the page background, which differs per theme — so a saturated colour at 50% over a dark page mixes toward black and shifts perceived hue (not just brightness). The stored value was always correct; the defect was purely the picker's render.
Fix: Replaced the bespoke opacity-dimmed swatch grid with the SingleSelect colour dropdown already used for lanes/labels, which renders colour names (text) and never a dimmed swatch — sidestepping the root cause entirely.
Lesson: Whole-element opacity on a coloured fill is theme-dependent by construction — it blends with the page background, so a "dimmed" selection indicator shifts the perceived hue differently in light vs dark mode. Indicate selection without touching the fill's alpha (a border, a check, or text-based selection), or the colour the user sees is a lie.
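
The theme dependence falls straight out of the compositing arithmetic. The sketch below blends one fill at 50% opacity over two backgrounds; pure white and pure black stand in for the light and dark page colours, which is an assumption since the real theme values are not given here.

```typescript
type Rgb = [number, number, number];

// Source-over compositing of a fill at the given opacity onto an opaque
// background, per channel, in the same 0-255 space CSS colours are written in.
function compositeOver(fill: Rgb, background: Rgb, opacity: number): Rgb {
  const mix = (f: number, b: number): number => Math.round(f * opacity + b * (1 - opacity));
  return [mix(fill[0], background[0]), mix(fill[1], background[1]), mix(fill[2], background[2])];
}

const olive: Rgb = [128, 128, 0];
const onLight = compositeOver(olive, [255, 255, 255], 0.5); // [192, 192, 128]
const onDark = compositeOver(olive, [0, 0, 0], 0.5);        // [64, 64, 0]
```

The same stored colour produces two visibly different swatches, and neither matches the full-opacity fill, which is why the picker looked as if it had applied the wrong colour.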

KD-0836 — Time-log summary cards bucket by logging date, not work date

Severity: medium
Symptom: On the Time Entries page the Today/Yesterday/Avg-per-day summary cards didn't reconcile with the filtered table — the cards counted hours on days that had no visible rows. Same dataset, different date axis.
Root cause: The summary helpers (filterByPeriod, getUniqueDaysCount) bucketed each entry on createdAt (when the entry was recorded) while the table presented each entry under startedAt ?? createdAt (when the work happened). When time is logged after the fact the two axes diverge, so the per-day cards counted hours on days the table never showed. The reporter's "summary uses a different dataset" hypothesis was wrong — same filtered dataset, wrong field.
Fix: Bucket the summary helpers on startedAt ?? createdAt, matching the table's date column. Frontend-only; the backend range filter still uses created_at (a broader product question parked).
Lesson: Two views over the same dataset must bucket on the same field, or they'll disagree without either being "wrong" — a summary and its table reconciling depends on a shared date axis (work date vs logging date), not just a shared filter. When numbers don't add up, suspect the axis before the dataset.
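
A minimal sketch of a shared date axis, assuming ISO timestamp strings and a hypothetical TimeEntry shape (only the startedAt and createdAt field names come from the entry above). Slicing the date off the string keeps the example independent of time zones; a real implementation would likely convert to the user's local day first.

```typescript
// Hypothetical entry shape; field names follow the entry above.
interface TimeEntry {
  createdAt: string;        // ISO timestamp of when the entry was recorded
  startedAt: string | null; // ISO timestamp of when the work happened, if known
  hours: number;
}

// The single date axis that both the table and the summary should use.
function workDay(entry: TimeEntry): string {
  const stamp = entry.startedAt ?? entry.createdAt;
  return stamp.slice(0, 10); // YYYY-MM-DD
}

// Total hours per day, keyed on the shared axis.
function hoursByDay(entries: TimeEntry[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const entry of entries) {
    const day = workDay(entry);
    totals[day] = (totals[day] ?? 0) + entry.hours;
  }
  return totals;
}
```

Routing every per-day calculation through one workDay function means the table and the cards cannot drift onto different fields again without someone changing that function.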

KD-0817 — Issue deletion fails on FK constraint for Hand-to-Claude tables

Severity: high
Symptom: Deleting an issue threw a 1451 FK-constraint violation whenever any Hand-to-Claude row referenced it (claude_issue_eligibilities, its criteria children, or claude_sessions). The same gap hit issue_label, and DeleteProjectAction was additionally missing issue_watchers cleanup and the reports.promoted_issue_id nullify the single/bulk paths already did.
Root cause: KD-0658 shipped three tenant-DB tables with restrictOnDelete() FKs to issues but didn't update DeleteIssueAction, BulkDeleteIssuesAction, or DeleteProjectAction. The arch gate that would catch this (CascadeRelationsTest) only walks HasMany/HasOne/MorphMany relations declared on the model — and Issue had no claudeSessions()/eligibility() relation, so the gate had nothing to enforce against. KD-0803 added issue_label 16 days later with the same omission; KD-0709 seeding the tables made it reproduce in dev.
Fix: Added Issue::claudeSessions() HasMany + Issue::eligibility() HasOne and listed both in cascadeRelations() so the existing arch gate now catches future regressions. The three delete Actions guard on in-flight sessions (409), archive terminal-uncleaned sessions via a queued job capturing scalar IDs, delegate eligibility cleanup, and detach labels. DeleteProjectAction gained the missing issue_watchers + report-nullify steps.
Lesson: An arch gate that enforces "every relation is cleaned on delete" is blind to relations the model never declares — a new table with a restrictOnDelete FK is invisible to the audit until someone adds the corresponding relation. The structural fix is to declare the relation and list it in cascadeRelations() so the gate has teeth. The gate still doesn't cover BelongsToMany pivots (labels), so pivot omissions remain a known blind spot. (Same hand-maintained-cascade-list drift class as KD-0738.)

Severity: medium
Symptom: Two positioning defects in the @-mention menu. (1) Typing @ flashed the menu at the far-left (~0,0) for one frame, then it snapped to the caret on the next character. (2) Once open, the menu didn't follow the caret when the page/editor scrolled.
Root cause: (1) mountMentionList appended the element before the async updatePosition applied coordinates — for one frame it had position: static and flowed to its container's top-left; on later keystrokes it already carried position: absolute, hence first-keystroke-only. (2) floating-ui's computePosition is one-shot; position was computed on open/update only, so the body-appended menu drifted from the caret inside its scroll container.
Fix: (1) Mount hidden+positioned: set position: absolute + visibility: hidden before appendChild, reveal once the first computed coords are applied (visibility: hidden, not display: none, keeps it measurable). (2) Position via floating-ui autoUpdate (runs immediately and on every scroll/resize) and return its cleanup, run on close/Escape so no listeners leak.
Lesson: An element positioned by an async callback flashes at its static-flow origin for the frames before coordinates land — mount it hidden-but-measurable and reveal only after the first compute. And one-shot computePosition doesn't track scroll; use autoUpdate (with a cleanup wired to close) when a floating element must stay glued to a moving anchor.

KD-0752 — Markdown (.md) attachments have no in-app preview

Severity: low
Symptom: Clicking a .md attachment thumbnail did nothing — it rendered as a generic file icon with only a download button. Upload and MCP fetch already worked; only the web preview failed.
Root cause: Markdown was absent from the frontend previewability path. isPreviewableMimeType returned true only for images + PDFs, so the thumbnail click never emitted preview for a .md, and the preview modal had no markdown branch (it would fall through to a raw <iframe>). Stored MIME for .md is unreliable (text/plain from finfo), so detection has to key off the .md/.markdown filename extension, not MIME.
Fix: Added isMarkdownFilename (extension-based), made the thumbnail previewable for markdown, and added a preview-modal branch that fetches the bytes through the auth'd download endpoint, reads the blob as text, and renders via the app's existing renderMarkdown → DescriptionProse stack. Frontend-only.
Lesson: A previewability check keyed on MIME alone misses file types whose stored MIME is generic (.md → text/plain) — detect by extension when the MIME is unreliable. And when an app already owns a renderer for a content type, wire the preview path into it rather than falling through to a raw iframe.
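
A sketch of extension-based detection layered on top of the MIME check. isMarkdownFilename is the helper named in the fix, but its body here is an assumed implementation (including the case-insensitive match), and isPreviewable is a hypothetical stand-in for the combined check.

```typescript
// Name taken from the fix above; the implementation is an assumption.
function isMarkdownFilename(filename: string): boolean {
  return /\.(md|markdown)$/i.test(filename);
}

// Hypothetical combined check: trust MIME where it is reliable (images, PDFs),
// and fall back to the extension for markdown, whose stored MIME is often text/plain.
function isPreviewable(filename: string, mimeType: string): boolean {
  if (mimeType.indexOf("image/") === 0 || mimeType === "application/pdf") {
    return true;
  }
  return isMarkdownFilename(filename);
}
```

With this shape a file named notes.md stored as text/plain is previewable, while notes.txt with the same MIME is not.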

KD-0644 — Empty toast container stays rendered on every page after toasts dismiss

Severity: low
Symptom: After the fs-toast 0.2.0 migration, the <div popover="manual"> toast container never disappeared once all toasts dismissed — it lingered as an empty fixed box on every route (pointer-events-none, so it didn't block clicks, but always present).
Root cause: fs-toast hides the closed container by calling el.hidePopover() and relying on the UA rule [popover]:not(:popover-open){display:none}. But App.vue applied a bare flex utility (→ display: flex) directly to that element. Author-origin display: flex always beats the UA-origin display: none in the cascade regardless of specificity, so the container never collapsed when closed. The flex/fixed/z-1050 attrs predated the migration and were harmless until the popover-based hide began depending on display.
Fix: Gated the display to the open state: replaced the unconditional flex with class="popover-open:flex", so display: flex applies only while :popover-open, and the UA rule hides the empty container.
Lesson: An author-origin display declaration unconditionally beats a user-agent display: none rule — so any styling that hard-sets display on a [popover] element defeats the Popover API's own hide. Scope display utilities to :popover-open when the hide is delegated to the UA rule. A previously-inert utility can become load-bearing the moment a dependency starts relying on the property it sets.

KD-0624 — CascadeRelationsTest skips Tenant and misses trait-provided relations

Severity: medium
Symptom: Audit-coverage defect, not a runtime crash: CascadeRelationsTest (the ADR-0002 gate ensuring every tenant model enumerates its cascade relations) was green precisely because two blind spots let it skip the cases it should catch. Un-blinding it surfaced four previously-invisible relations (Tenant::githubInstallations + three Passport relations on User).
Root cause: Two intentional skips. (1) Tenant was hardcoded into a $centralModels exclusion list justified as "never deleted via application logic" — false, since DeleteTenantAction cascades real relations. (2) The "all relations listed" test filtered out any relation contributed by a trait; but PHP flattens trait methods onto the using class (reflection reports getDeclaringClass() === Tenant/User), so those relations are reachable and were being thrown away — for every model, not just Tenant.
Fix: Removed the $centralModels exclusion and the trait-method filter, and replaced them with an explicit $nonCascadeRelations allowlist (each entry justified inline) so every discovered relation must be either in cascadeRelations() or acknowledged here. Verified by transiently dropping an entry and confirming the test then fails.
Lesson: A test exclusion justified by a comment ("never deleted", "trait-provided, skip") is a place bugs hide — the green suite was an artifact of the audit skipping its hardest cases. An audit should never silently drop a category; require every discovered item to be explicitly handled or explicitly acknowledged, so the skip list itself is reviewable. (Same exclusion-comment-rot theme as KD-0786.)

Severity: medium
Symptom: Four regressions from the "bundle tooltips into components" pivot, caught in PR review. Tooltip.vue's wrapper <div> participated in layout, breaking call sites relying on flex/absolute positioning (ml-auto watch button, report-card copy button). IconButton/ReportListItem sniffed $attrs['aria-label'], producing empty tooltips and inaccessible buttons when callers used title= or omitted the label. DragElement re-introduced a trailing-space assignee label for single-name users.
Root cause: Tooltip.vue wrapped its slot in an inline-block <div> that the parent's flex/grid algorithm laid out as a real item, so layout-affecting attrs (ml-auto, absolute) landed on the inner <button>, not the layout participant. The $attrs['aria-label'] sniff was brittle — any caller using title= or omitting aria-label silently got an empty tooltip and a button with no accessible name.
Fix: Tooltip.vue wrapper switched to display: contents (anchoring floating-ui to firstElementChild) so it no longer participates in layout; IconButton got an explicit required label: string prop (sweeping ~25 call sites off the $attrs sniff); DragElement label changed to [firstName, lastName].filter(Boolean).join(' ').
Lesson: A wrapper element silently changes its children's layout context — a tooltip/HOC wrapper that isn't display: contents becomes a real flex/grid item and misplaces positioning utilities meant for the wrapped element. And sniffing a value off $attrs is a brittle implicit contract: make it an explicit, required prop so a missing label is a type error at the call site, not an empty tooltip + inaccessible button at runtime.
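
The label fix is small enough to show in full. The function names below are hypothetical; the filter-and-join expression is the one quoted in the fix, shown next to the naive concatenation it replaced (the naive form is an assumption about what the regression looked like).

```typescript
// Naive concatenation: leaves a trailing space when lastName is empty.
function naiveAssigneeLabel(firstName: string, lastName: string): string {
  return firstName + " " + lastName;
}

// Filter out missing parts before joining, so no stray separator survives.
function assigneeLabel(firstName: string | null, lastName: string | null): string {
  return [firstName, lastName].filter(Boolean).join(" ");
}
```

For a single-name user, naiveAssigneeLabel("Ada", "") yields "Ada " with a trailing space, while assigneeLabel("Ada", "") yields "Ada".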

KD-0807 — Multi-select drag in Backlog persists only one issue's move

Severity: medium
Symptom: Multi-selecting N issues in the Backlog and dragging one across sprint sections moved all N cards visually, but only ONE issue's change persisted. The other N−1 silently reverted on the next sync/refresh. The BulkActionBar "Move to" dropdown hit the same bug.
Root cause: Regression from KD-0789's fractional-rank drag rewrite. The legacy bulk-update shape posted N updates in one request; the new single-issue moveIssueForProject(issueId, payload) posts one move at a time, and the drag store's diff helper (findLaneChangedItem) returned on the first lane-changed item. So exactly one move request fired regardless of how many cards the user moved. The single-issue endpoint cannot carry N issues atomically.
Fix: Dedicated BulkMoveAction + POST /api/projects/{project}/issues/bulk-move (mirroring the precedent BulkAssignEpicAction), with a sprint-only {issue_ids, target_sprint_id, position} payload — each issue keeps its lane + epic, only sprint_id + rank change. N ranks spread logarithmically via Rank::spread. FE drag store gained a bulkUpdate() path that collects every lane-changed item.
Lesson: When you replace a bulk endpoint with a single-item one, audit every multi-select code path that fed the old shape: a diff helper that "returns the first changed item" silently drops the rest. Bulk operations also need a dedicated atomic endpoint; you can't synthesize N-item atomicity by looping a single-item call. Chaining Rank::between calls across N cards also grows rank length by roughly one character per four cards, so spread balanced midpoints instead.
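
The difference between "first changed item" and "every changed item" can be sketched with two snapshots of the board. BoardItem and both function names are hypothetical; the first function only imitates the early-return behaviour described for findLaneChangedItem.

```typescript
// Hypothetical item shape: laneId is whichever container the card sits in.
interface BoardItem {
  id: number;
  laneId: number;
}

function previousLanes(before: BoardItem[]): Record<number, number> {
  const lanes: Record<number, number> = {};
  for (const item of before) lanes[item.id] = item.laneId;
  return lanes;
}

// Early return: reports one moved card no matter how many actually moved.
function findFirstLaneChangedItem(before: BoardItem[], after: BoardItem[]): BoardItem | null {
  const lanes = previousLanes(before);
  for (const item of after) {
    if (item.id in lanes && lanes[item.id] !== item.laneId) return item;
  }
  return null;
}

// Collects every moved card, which is what a bulk request needs.
function findLaneChangedItems(before: BoardItem[], after: BoardItem[]): BoardItem[] {
  const lanes = previousLanes(before);
  return after.filter((item) => item.id in lanes && lanes[item.id] !== item.laneId);
}
```

The collected list would then feed a single bulk request rather than a loop of single-item calls, so the move succeeds or fails as a unit.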

KD-0798 — VS Code extension shows no issues after API shape change

Severity: high
Symptom: Opening a project in the VS Code extension showed no issues and fired an "API error" notification. Assignee avatars rendered [object Object] as the image src.
Root cause: KD-0774 changed IssueResourceData to return branch_links as full nested objects ({id, branch_name, branch_url, status}) instead of a flat status array. The extension was only partially updated — it kept the field name but typed each link as {status}, so it still fired N secondary GET .../branch-links calls to fetch data already present in the initial response. Separately, profile_picture was typed string | null but the API now returns {avif, webp} | null, so the raw object was passed through as an image URL.
Fix: Widened the extension's branch_links type to the full shape and derived branch names/URLs directly from the initial response (dropping the N secondary calls). Fixed profile_picture type and extracted avif ?? webp ?? null.
Lesson: A backend resource shape change ripples into every client that wasn't updated in lockstep — the extension is a separate consumer with no shared type contract, so a partial update left it making redundant calls AND mis-rendering. When a response field changes from scalar to object, every client's type and every place it's interpolated (especially as a URL src) must be revisited.
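
A sketch of the widened client types and the two derivations. The interfaces are assumptions modelled on the shapes quoted in the root cause (whether avif or webp can individually be null is not stated); the function names are hypothetical.

```typescript
// Assumed client-side types mirroring the response shapes described above.
interface ProfilePicture {
  avif: string | null;
  webp: string | null;
}

interface BranchLink {
  id: number;
  branch_name: string;
  branch_url: string;
  status: string;
}

// Pick a usable image URL, or null when there is no picture, so an object
// is never interpolated into an image src.
function avatarUrl(picture: ProfilePicture | null): string | null {
  return picture?.avif ?? picture?.webp ?? null;
}

// Derive branch names from the links already present in the issue response,
// with no secondary request.
function branchNames(links: BranchLink[]): string[] {
  return links.map((link) => link.branch_name);
}
```

Typing the field as an object forces every use site through avatarUrl, which is where a string | null annotation previously let the raw object slip through.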

KD-0788 — Central-binding arch test over-detects: only 5 of 14 flagged Actions were true gaps

Severity: medium
Symptom: The KD-0783 broader arch test listed 14 Actions in centralBindingKnownGaps(), framed in the issue as "13 quick-wins, just add the binding." Latent/low-volume — the binding gaps would surface as audit-transaction crashes only when the affected central paths ran in prod.
Root cause: The arch test is a structural detector ("Action injects a central model AND has ->transaction(") but the canonical bug is behavioural ("the outer transaction opens on the wrong connection"). Of the 14: 5 were true gaps ($this->db->transaction wrapping central writes); 5 were already correct (transaction opened via $model->getConnection()->transaction(), which the test couldn't distinguish); and 4 were tenant-primary Actions where binding $this->db to central would invert the bug — opening a central transaction while tenant writes committed unsynchronised.
Fix: Bound the 5 true gaps in AppServiceProvider. Refined the arch test to require both ConnectionInterface injection AND ->transaction( (dropping the 5 model-getConnection Actions naturally). Added inline @central-binding-exempt: markers to the 4 tenant-primary Actions and taught the test to honour them. Emptied centralBindingKnownGaps() to [].
Lesson: A structural arch heuristic and the behavioural bug it targets coincide for the obvious cases and diverge for the rest — a "known gaps" list taken at face value will misclassify. Reading each flagged site beats trusting the count: blindly "adding the binding" to all 14 would have broken 4 working Actions. When a heuristic over-detects, the fix is to tighten the heuristic AND provide an auditable escape-hatch marker, not to suppress with a grandfather list.

KD-0787 — RollbackProvisioningAction unbound + inline DROP DATABASE on the bound connection

Severity: medium
Symptom: Same latent shape as KD-0783. Dormant in prod (DOMAIN_PROVISIONING_ENABLED=false). The moment the rollback path ran with provisioning enabled, the central audit-write would throw AuditLogWriter must be called within a database transaction because the outer $this->db->transaction(...) opened on the default connection, not central.
Root cause: Two coupled defects. (1) The Action injected a generic ConnectionInterface and wasn't contextually bound to central, so the audit-log model's hardcoded central connection saw transactionLevel() === 0. (2) An inline $this->db->statement('DROP DATABASE...') would route through whatever connection got injected — once bound to central, central's MySQL user may lack DROP privileges. (1) couldn't land without (2): binding to central while the inline DDL remained would make rollback DDL fail in any locked-down environment.
Fix: Injected the already-tenant-bound DropTenantDatabaseAction for the DDL, added RollbackProvisioningAction to the central ConnectionInterface binding, and promoted its arch-test entry from centralBindingKnownGaps() to centralActionsRequiringBinding() (turning the gate into a regression test).
Lesson: Binding an Action's connection and extracting its cross-connection DDL are coupled changes — you can't safely flip the binding while inline statements still inherit it. DDL that needs different privileges (DROP DATABASE on tenant) must be delegated to a connection-explicit sibling Action before the parent's connection is rebound.

KD-0786 — ProvisionDomainAction unbound from central despite being central-only

Severity: medium
Symptom: Same crash shape as KD-0783, dormant behind DOMAIN_PROVISIONING_ENABLED=false. Would fire LogicException on every provisioning state transition that emits an audit row the moment the flag flipped on.
Root cause: The Action's outer transaction resolved ConnectionInterface to the default connection because it wasn't contextually bound to central. It was excluded from the binding with a stale comment claiming it "also does tenant-DB work" — but a post-KD-0580 state-machine refactor had made the Action central-only (Domain model is central, audit is central, the advance() branches call external providers not the DB, no TenantSwitcher). The exclusion rationale had outlived its truth.
Fix: Added ProvisionDomainAction to the central ConnectionInterface binding and moved its arch-test entry from gaps to required. No change to the Action — only container wiring was missing.
Lesson: Exclusion comments rot. A "we can't bind this because it does X" note must be re-validated against the current source before it's trusted — a later refactor can remove X and leave the stale exclusion silently masking a bug-in-waiting. The arch test's sentinel ("an audit-writing Action must be in exactly one of bound-list or gaps-list") is what forced the decision instead of letting it sit undecided.

KD-0785 — CreateTenantAction crashes on first prod central-admin invite

Severity: high
Symptom: Same latent KD-0783 shape. The central-admin "invite a tenant" flow (POST /api/central/tenants) worked in dev/test (where DB_CONNECTION falls through to central) but the first invitation in prod would throw the audit-transaction assertion because the outer transaction opened on the default mysql connection while writes hit central models.
Root cause: The Action wasn't in the central ConnectionInterface binding because its private createAdminUser opened an inner transaction on the same $this->db after a TenantSwitcher::switchTo() — and those inner writes target the tenant DB (admin User row + role pivot). Binding the whole Action to central would route the inner tenant transaction to central too. Same shape SignupAction had pre-KD-0783.
Fix: Mirrored KD-0783's extraction exactly — pulled the tenant-DB work into a new sibling CreateInvitedTenantAdminUserAction (injecting the default tenant ConnectionInterface and owning the switchTo/reset lifecycle), then bound CreateTenantAction to central and promoted its arch-test entry to required.
Lesson: An Action that opens transactions on two different connections cannot be contextually bound to either — the only clean fix is to split the second-connection work into its own Action with its own binding. The "extract the tenant-DB half into a sibling Action" pattern is now the canonical resolution for this entire class (KD-0783 → KD-0785 → KD-0787).

Severity: high
Symptom: Every public signup at central.kendo.dev/signup returned HTTP 500 with RuntimeException: AuditLogWriter must be called within a database transaction for hash chain integrity. Production-only — the exact path passed every CI test and worked locally. A partial central row (Tenant insert) could commit before the audit assertion fired.
Root cause: SignupAction and CreateDomainAction injected ConnectionInterface with no connection name, so the container resolved the default connection. Prod's .fly/config/prod.toml sets DB_CONNECTION=mysql, so $this->db->transaction(...) opened on mysql. But the audit-log models hardcode $connection = 'central', and assertWithinTransaction checks central's transaction level — found 0 (the open transaction was on mysql) and threw. Tests didn't catch it because DB_CONNECTION is unset in tests, so both connections resolved to the same instance. The same defect was latent in every central audit-writing Action.
Fix: Contextually bound ConnectionInterface to central in AppServiceProvider for all 10 central audit-writing Actions. Extracted SignupAction's tenant-DB createAdminUser into CreateTenantAdminUserAction (default connection) and its inline DROP DATABASE into DropTenantDatabaseAction (tenant connection). Added an arch test forcing every newly-discovered central audit-writing Action into either the bound list or a known-gaps list.
Lesson: Injecting an unqualified ConnectionInterface silently binds to whatever DB_CONNECTION resolves to — fine until a model hardcodes a different connection, at which point transaction-scoped invariants check the wrong connection. Tests that leave DB_CONNECTION unset collapse distinct connections into one instance and hide the entire class; an arch test that asserts the binding decision is the only reliable gate. (Also surfaced two side findings: DB_CONNECTION=mysql references a connection that doesn't exist in config — it only resolves via Laravel's framework-default merge — and DB_PASSWORD was visible in plaintext via fly ssh console env.)

KD-0761 — Avatar initials low-contrast, undersized, off-center

Severity: low
Symptom: Fallback avatar initials were hard to read — in dark mode, white text on bright tint backgrounds (e.g. #4ade80) hit contrast ratios as low as 1.3:1, far below WCAG AA's 4.5:1. Initials also rendered too small and sat slightly high.
Root cause: ProfilePicture.vue hardcoded c-white for initials regardless of theme; the dark-mode tint palette uses high-luminance backgrounds where white text fails contrast. Font sizes were ~35-40% of the container, weight was too light, and no optical baseline compensation was applied for uppercase glyphs.
Fix: Added a theme-aware --avatar-initials CSS variable (near-black in dark mode, white in light), bumped font sizes and weight to 700, added items-center + a 0.5px translateY optical nudge.
Lesson: Hardcoding c-white for text-on-color assumes dark backgrounds, and a bright tint palette inverts that assumption. Contrast-sensitive colors must be theme-aware tokens, not literals. (The work also surfaced a recurring trap: stale test assertions pinning the pre-fix font sizes blocked the first verification.)
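
The contrast failure can be checked numerically with the WCAG 2.x relative-luminance and contrast-ratio formulas. This is a sketch for six-digit #rrggbb colours only; the function names are hypothetical and nothing here is taken from ProfilePicture.vue.

```typescript
// Linearise one sRGB channel read from a #rrggbb string (WCAG 2.x definition).
function linearChannel(hex: string, offset: number): number {
  const c = parseInt(hex.slice(offset, offset + 2), 16) / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  return (
    0.2126 * linearChannel(hex, 1) +
    0.7152 * linearChannel(hex, 3) +
    0.0722 * linearChannel(hex, 5)
  );
}

// Contrast ratio between two colours, from 1 (identical) up to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// True when the pair meets the WCAG AA threshold for normal-size text.
function meetsAA(text: string, background: string): boolean {
  return contrastRatio(text, background) >= 4.5;
}
```

Run against the tint quoted in the symptom, meetsAA("#ffffff", "#4ade80") is false while meetsAA("#000000", "#4ade80") is true, which matches the direction of the fix (dark initials on bright tints).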

KD-0760 — Delete confirmation dialog shows "Submit" instead of "Delete"

Severity: low
Symptom: Destructive confirmation modals (delete attachment, delete issue, delete tenant, etc.) showed a generic "Submit" confirm button instead of a destructive verb — ambiguous, and visually identical to a benign save, raising accidental-confirmation risk on irreversible actions.
Root cause: confirmModal defaults confirmButtonText to 'Submit'. Nine destructive call sites omitted the third positional argument and inherited the generic label. (~23 other call sites already passed an explicit verb, so the misbehaviour was purely the default kicking in on incomplete calls.)
Fix: Passed an explicit 'Delete' (or equivalent verb) at all 9 destructive call sites. Left the helper's 'Submit' default in place but now unreachable from any destructive flow.
Lesson: A permissive default on a shared destructive helper is a latent footgun — every incomplete call site silently inherits the wrong label. For destructive actions the safer design is no default (force the caller to name the verb), but at minimum every call site must be audited when a default is too generic to be safe.

KD-0759 — File-upload drop zones too short to hit reliably

Severity: low
Symptom: Drop zones (attachment uploaders, profile picture modal) rendered short — standard ~100px, compact ~40px, profile ~80px. Files released slightly above/below the dashed border missed the drop and landed on the page.
Root cause: None of the three dropzone surfaces set a min-h-*, so the affordance collapsed to its icon + text height. The @drop handler fires on the same <div> that draws the border, so the visible affordance IS the hit area — a short visual yields a short hit target. The compact variant fell below the WCAG 2.5.5 44px floor.
Fix: Added min-h-32 (128px) to standard + profile dropzones, min-h-14 (56px) to compact, plus flex centering. No padding/border/copy changes.
Lesson: When the visual affordance and the event target are the same element, the visual size directly determines the hit area — sizing it for "looks fine" isn't the same as sizing it for "easy to hit." Set an explicit minimum height against a real interaction target (WCAG 2.5.5's 44px floor as the baseline).

KD-0756 — Invite form not reset after a successful invite

Severity: low
Symptom: After inviting a user, re-opening the invite modal showed the previous person's data still populated in every field. Users had to refresh to get a clean form, risking re-submitting the same details under a different email.
Root cause: newInvite is a module-scoped ref passed by reference to the modal each time it opens; the form mutates that object in place via v-model. The onSubmit success path posted, refreshed, toasted, and closed the modal — but never reset newInvite.value, so stale data persisted across openings.
Fix: One line — reassign newInvite.value to empty defaults after closeModal() in onSubmit. The toast captures the invitee's name before the reset; a failed invite throws before the reset, keeping the form populated for retry.
Lesson: A reused, mutated-in-place form model needs an explicit reset on the success path — close-and-reopen does not clear it because the same object reference is handed back. Reset after success, but only after success, so failures keep the user's input for retry.

Severity: low
Symptom: Tabbing from a form's title field into the RichTextArea description forced keyboard users through ~8 formatting toolbar buttons first (H1/H2/H3/Bold/Italic/UL/OL/Raw). Reproduced everywhere RichTextArea is consumed (comments, issue templates, AI story prompt, epic form).
Root cause: FormatButton.vue was a plain <button> with default tabindex="0", and the toolbar <div> precedes the editor content in DOM order — so every button became a tab stop before the editor. No role="toolbar" or roving-tabindex consolidated them into one stop.
Fix: Applied the WAI-ARIA toolbar pattern (roving tabindex): role="toolbar" + aria-label, exactly one button at tabindex="0" and the rest at -1, arrow keys for internal navigation. Format actions remain reachable via Tiptap shortcuts; the raw-mode toggle stays keyboard-reachable (which a flat tabindex="-1" strategy would have lost).
Lesson: A group of related controls should be a single tab stop with internal arrow-key navigation (the WAI-ARIA toolbar pattern), not N sequential tab stops. A shared component is the right fix point — wiring roving tabindex once protects every consumer.
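
The roving-tabindex bookkeeping reduces to two small functions. This is a sketch with hypothetical names; wrapping at the ends and Home/End support are choices the WAI-ARIA toolbar pattern allows, and whether the real component makes them is not stated here.

```typescript
// tabindex value for each toolbar button: exactly one button is tabbable.
function rovingTabindexes(buttonCount: number, activeIndex: number): number[] {
  const result: number[] = [];
  for (let i = 0; i < buttonCount; i++) {
    result.push(i === activeIndex ? 0 : -1);
  }
  return result;
}

// Next active index for a key press inside the toolbar.
// Arrow keys wrap around; Home and End jump to the first and last button.
function nextActiveIndex(current: number, key: string, buttonCount: number): number {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % buttonCount;
    case "ArrowLeft":
      return (current - 1 + buttonCount) % buttonCount;
    case "Home":
      return 0;
    case "End":
      return buttonCount - 1;
    default:
      return current;
  }
}
```

With eight buttons, Tab lands on one of them and the next Tab leaves the toolbar; the arrow keys move the single tabbable slot among the rest.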

KD-0753 — LinkBranchTool rethrows raw exceptions as JSON-RPC -32603

Severity: medium
Symptom: Any failure in the MCP LinkBranchTool rethrew the raw Throwable, which the MCP framework mapped to a generic -32603 internal error. Callers couldn't distinguish a duplicate branch link from a cross-project mismatch, a deadlock, or a broadcast failure. Under parallel agent fan-out the error was consistent and unactionable.
Root cause: The top-level catch captured the exception for the audit log but then rethrew it raw (throw $throwable), violating the documented Exception-Leak Discipline ("MCP tools must never rethrow raw Throwables from their top-level catch"). Concurrent calls could throw deadlock/unique-constraint exceptions from InnoDB gap locks, all flattened to -32603.
Fix: Replaced the rethrow with three ordered catch blocks: BranchAlreadyLinkedException and CrossProjectException return specific structured messages; remaining Throwable is logged with scoped context and returned as a generic structured error instead of rethrown.
Lesson: At a protocol boundary (MCP/JSON-RPC), a raw rethrow collapses every distinct failure into one opaque code — the caller (often another agent) loses all ability to act. Top-level catches at such boundaries must map known exceptions to structured errors and log-then-wrap the unknown, never rethrow raw.
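
The map-known, log-then-wrap-unknown shape can be sketched in TypeScript (the real tool is PHP and uses three ordered catch blocks). LinkError, ToolResult, runLinkBranch, and the result codes are hypothetical names for illustration, not the tool's actual API.

```typescript
type LinkFailure = "branch_already_linked" | "cross_project";

// Hypothetical error type standing in for the two known PHP exception classes.
class LinkError extends Error {
  kind: LinkFailure;
  constructor(kind: LinkFailure, message: string) {
    super(message);
    this.kind = kind;
  }
}

interface ToolResult {
  ok: boolean;
  code: LinkFailure | "linked" | "internal_error";
  message: string;
}

function isLinkError(error: unknown): error is LinkError {
  return error instanceof Error && "kind" in error;
}

// Top-level boundary: nothing thrown by `action` escapes as a raw exception.
function runLinkBranch(action: () => void, log: (error: unknown) => void): ToolResult {
  try {
    action();
    return { ok: true, code: "linked", message: "Branch linked." };
  } catch (error) {
    if (isLinkError(error)) {
      // Known failure: the caller gets a code it can act on.
      return { ok: false, code: error.kind, message: error.message };
    }
    // Unknown failure: keep the detail in the log, return a generic structured error.
    log(error);
    return { ok: false, code: "internal_error", message: "Unexpected failure; details are in the server log." };
  }
}
```

The caller always receives a structured result, and only the unknown branch hides its detail behind the log.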

KD-0738 — Project deletion 500s on unhandled RESTRICT foreign keys

Severity: high
Symptom: DELETE /api/projects/{project} returned 500 for any project that had been used (triggered a Claude session, been watched, linked to a tenant AI key, etc.) with SQLSTATE[23000]: Integrity constraint violation.
Root cause: DeleteProjectAction walks a hand-maintained list of descendant tables to delete before the parents. That list was last extended at a 2026-02-18 cascade-to-restrict audit. Six tables with restrictOnDelete() FKs into the project subtree have landed since (claude_sessions, issue_watchers, attachment_extracted_contexts, claude_issue_eligibilities, claude_issue_eligibility_criteria, tenant_ai_key_project) and none were cleaned up, so MySQL blocked the parent delete.
Fix: Added six raw db->table(...)->whereIn(...)->delete() calls inside the existing transaction, in FK dependency order (criteria before eligibility, contexts before attachments, pivot before project).
Lesson: A hand-maintained cascade-delete list is guaranteed to drift — every new table with a RESTRICT FK into the subtree must be manually added, and nothing fails until a populated project is deleted in prod. This bug class has no automated check today; an arch test that diffs schema FKs into project/issues against the Actions that delete them would close it. (Related drift: BulkDeleteIssuesAction already deleted issue_watchers but DeleteProjectAction didn't — the two tear-down paths had diverged.)

Severity: medium
Symptom: Two defects. (1) The Complete Sprint modal always showed "What should we do with incomplete issues?" even when every issue was already Done. (2) After completing a sprint, the board kept showing the completed sprint until a manual reload.
Root cause: (1) hasNoIssues checked issuesCount === 0 (total issues) instead of incomplete-issues count — and the backend never exposed an incomplete count, so there was nothing else to check. (2) CompleteSprintAction was the only mutating sprint Action that never called SprintBroadcaster->updated(), so the reactive store was never notified. A follow-up surfaced a third issue: makeSprintStoreForProject wasn't memoized, so the modal's retrieveAll() refreshed a different store instance than Backlog used — and since the broadcast ships with ->toOthers(), the originator's UI had no refresh path at all.
Fix: Added lazily-computed incomplete_issues_count to SprintResourceData; switched the modal to check it. Injected SprintBroadcaster into CompleteSprintAction. Memoized the sprint store by projectId and derived hasNoIssues from the freshly-retrieved store value rather than the stale prop snapshot.
Lesson: "No issues to move" is a count of incomplete issues, not total — modeling a domain question with the nearest-available field produces noise. And a mutating Action that skips the broadcast its siblings all fire is a silent realtime gap. The deeper trap: ->toOthers() excludes the originator, so the person who triggered the action depends entirely on the HTTP response refreshing the same store instance — an unmemoized store factory quietly breaks that for the one user who most expects to see the result.
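
A minimal sketch of memoizing a store factory by projectId, so every caller shares one instance. The SprintStore shape is an assumption; only the factory name comes from the entry above.

```typescript
// Hypothetical store shape; the real store holds reactive state.
interface SprintStore {
  projectId: number;
  sprints: string[];
}

const sprintStores = new Map<number, SprintStore>();

// Returns the same instance for the same project, so a refresh triggered by the
// modal is visible to every other consumer of that project's store.
function makeSprintStoreForProject(projectId: number): SprintStore {
  let store = sprintStores.get(projectId);
  if (store === undefined) {
    store = { projectId, sprints: [] };
    sprintStores.set(projectId, store);
  }
  return store;
}
```

Without the cache, each call would hand back a fresh object and a refresh on one would never reach the other.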

KD-0733 — Markdown tables render as unstyled plain text

Severity: low
Symptom: GFM tables in issue descriptions showed header and cell text with no borders, no row separation, no padding — indistinguishable from two lines of plain text.
Root cause: marked correctly emitted <table>/<thead>/<tr>/<th> and DOMPurify preserved them, but markdown.css defined .description-prose styles for every other prose element and had no rules for table elements. The browser's default table rendering has zero borders.
Fix: Added .description-prose table styles (border-collapse, borders via var(--border), header background, alternating row background, padding) matching the file's existing visual language.
Lesson: A prose stylesheet is only complete for the elements it explicitly targets — when a markdown renderer can emit an element type (tables) that the prose CSS never styled, it falls back to unstyled UA defaults. Cross-check the renderer's full output tag set against the prose stylesheet's coverage.

Severity: medium
Symptom: Large shared modals (1360px design width) overflowed the viewport across the entire 1024–1359px range, i.e. the band from half of a 1920px monitor up through small desktops. The right edge was clipped, the close button was hidden, and a horizontal scrollbar appeared.
Root cause: BaseFormModal/BaseShowModal switched to fixed pixel widths (lg:w-100/220/340) at the lg breakpoint with no viewport guard. Below lg the w-90vw fallback was already viewport-capped, so the bug only triggered in the lg+ band where the fixed width exceeded the viewport.
Fix: Added max-w-95vw to every entry in both size maps, so resolved width became min(design-width, 95vw).
Lesson: A fixed pixel width above a breakpoint assumes the viewport is always wider than the design width — false for half-screen and mid-desktop widths. Any fixed-width element needs a viewport-relative max-width cap. An arch test scanning for lg:w-<n> on a <dialog> child without a matching max-w-<n>vw would prevent the regression class.
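
As a worked example of min(design-width, 95vw), here is the arithmetic as a small function. The 4px-per-unit scale is an assumption inferred from w-340 corresponding to the 1360px design width; the function name is hypothetical.

```typescript
// Assumption: one width unit is 4px, so w-340 resolves to 1360px.
const UNIT_PX = 4;

// Mirrors CSS `width: <design>; max-width: 95vw` for a given viewport width.
function resolvedModalWidth(widthUnits: number, viewportPx: number): number {
  const designPx = widthUnits * UNIT_PX;
  const capPx = viewportPx * 0.95;
  return Math.min(designPx, capPx);
}
```

At a 1920px viewport the large modal keeps its 1360px design width; at 1024px the cap takes over and the modal resolves to about 972.8px instead of overflowing.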

KD-0700 — Hand-to-Claude grader verdict read from a key Anthropic never sends

Severity: high
Symptom: Every graded Hand-to-Claude session was recorded as Failed regardless of the actual grader verdict — the UI said "Claude could not finish the issue" even when the grader explicitly satisfied the rubric. Confirmed in prod on a session whose Anthropic events API showed result: "satisfied" but kendo stored status = Failed.
Root cause: handleOutcome read rawPayload['outcome']['result'] from the webhook, but Anthropic's outcome_evaluation_ended webhook is a notification only — it carries no outcome key at any depth. The real verdict lives on the session's outcomeEvaluations[] list, already returned by the getSession retrieve call — but aggregateEvents walked the event stream for tokens/iterations only and never surfaced it. A secondary defect: triggerCleanup archived the session unconditionally on the first status_idled webhook (fired between implementer end_turn and grader start, while Anthropic had flipped back to running), 400ing on every run.
Fix: Added outcomeResult to SessionResultData, populated from the latest outcomeEvaluations entry in getSession, and read the verdict from there instead of the webhook payload. Gated triggerCleanup on the session reporting idle/terminated rather than running.
Lesson: Reading state from a webhook payload that the provider documents as a notification-only event guarantees a wrong answer — the authoritative state must come from the retrieve call. The test suite hid it because the test helper fabricated the outcome key the provider never sends: a fixture builder that synthesizes a shape no real API produces will keep a bug green forever.

KD-0699 — PR-evidence parser rejects every verbatim MCP response

Severity: high
Symptom: Every Hand-to-Claude implementer session that successfully opened a PR was terminated as Failed/missing_pr_evidence before the grader ran — burning the full token spend with no verdict and permanently marking the issue Failed. The implementer pasted the MCP tool response verbatim as instructed.
Root cause: extractPullRequestUrlFromMessage read $decoded['html_url'], but the GitHub MCP create_pull_request tool returns {id, url} with no html_url. Both the kendo parser AND the implementer system prompt's "good output looks like this" example encoded the GitHub REST API shape rather than the MCP tool's actual shape — so the prompt-mandated verbatim quoting was structurally guaranteed to fail the parse. The test suite passed because the test helper generated the same wrong shape the prompt documented.
Fix: Made the parser accept url ?? html_url (MCP shape first, REST shape as forward-compatible fallback), both still validated through the github.com PR-URL regex. Updated the prompt example and test helper to the real MCP shape. Downstream verifyPullRequestOpen still confirms the PR exists, so the looser key set didn't weaken fabrication defence.
Lesson: When a prompt instructs the model to quote a tool's output verbatim, the parser must match what the tool actually emits, not what an API doc says — and the prompt example, the parser, and the test fixture must all agree on the real shape. Three places encoded the same imagined shape; prod was the first witness. Audit fixture builders for synthesized-vs-real payloads. (Side note: the diagnosed session cost ~$31 on Opus for well-trodden work — flagged a model-tier question.)
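
The tolerant key lookup can be sketched in TypeScript (the real parser is PHP). The regex below is an assumed approximation of the github.com PR-URL check, and the function name is shortened for illustration.

```typescript
// Assumed pattern; the real PR-URL regex is not shown in this log.
const PR_URL_PATTERN = /^https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/pull\/\d+$/;

// Accepts the MCP shape ({ id, url }) first, then the REST shape ({ html_url }).
function extractPullRequestUrl(message: string): string | null {
  let decoded: unknown;
  try {
    decoded = JSON.parse(message);
  } catch {
    return null; // not JSON, so there is no evidence to extract
  }
  if (typeof decoded !== "object" || decoded === null) return null;
  const record = decoded as Record<string, unknown>;
  const candidate = record["url"] ?? record["html_url"];
  // Either key must still pass the same URL validation.
  return typeof candidate === "string" && PR_URL_PATTERN.test(candidate) ? candidate : null;
}
```

A fixture for this parser should be copied from a real tool response rather than typed from an API doc.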

KD-0693 — Anthropic session cleanup 400s archiving the primary thread

Severity: high
Symptom: Every terminal Hand-to-Claude session hit 400 invalid_request_error: "The primary thread cannot be archived; archive the session instead." Because the exception threw before cleaned_up_at was stamped, the webhook job retried forever and the hourly prune re-hit the same failure every tick (9 occurrences in two hours post-deploy).
Root cause: ArchiveAnthropicSessionResourcesAction walked every thread via streamThreadsForSession and called archiveThread on each — including the primary thread, which Anthropic rejects. The streamer yielded the primary thread (parentThreadID === null) despite its contract implying only archivable threads. Compounding it, the Action never called sessions->archive($sessionId) at all, so sessions were never archived on the Anthropic side even before the 400 surfaced.
Fix: Filtered the primary thread (parentThreadID === null) out of streamThreadsForSession, and added an archiveSession primitive called once after the child-thread and vault archives, before stamping cleaned_up_at.
Lesson: When a cleanup step throws before its idempotency marker is set, it retries forever and turns a single failure into a recurring incident — cleanup loops must either tolerate the rejecting case or stamp progress before the fragile call. And a streamer whose contract says "things you can archive" must actually filter to that set, or every caller inherits the exception.

KD-0691 — `session.status_idled` webhook events silently dropped

Severity: high
Symptom: When a Hand-to-Claude session completed naturally, Anthropic emitted session.status_idled — the dedup row was written (so processed_at looked healthy) but no status update, no completion comment, no audit row, no broadcast, and no cleanup ran. From kendo's POV the session was permanently in flight; Anthropic-side resources sat until the 30-day TTL.
Root cause: All three match ($data->eventType) blocks in HandleSessionWebhookAction only enumerated outcome_evaluation_ended and status_terminated — session.status_idled fell through to default => null. The dedup row was written before the inner match, which is exactly what made the failure silent: the webhook-events table looked processed while the session stayed Pending.
Fix: Added session.status_idled to all three match blocks and a handleIdled method branching on stopReason (end_turn → Completed, anything else → Failed), mirroring handleTerminated. Defensive early-return-with-warning if the aggregated result is null.
Lesson: A match with a silent default => null arm is a trap for event-type handling — a new (or unhandled) event type produces no error, just missing work. And writing a dedup/processed marker before the work means the marker lies when the work is skipped: record "processed" only after the handler actually runs, or unhandled events masquerade as healthy.
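
Both halves of the lesson can be sketched in TypeScript (the real handler is PHP): a default arm that fails loudly, and a processed marker written only after the handler ran. The event names are shortened and the processed array is a stand-in for the dedup table; all names here are hypothetical.

```typescript
type EventType = "outcome_evaluation_ended" | "status_terminated" | "status_idled";

// Compile-time: adding a member to EventType without a case is a type error here.
// Runtime: an unexpected string from the wire throws instead of doing nothing.
function assertNever(value: never): never {
  throw new Error(`Unhandled event type: ${String(value)}`);
}

function handleEvent(eventType: EventType): string {
  switch (eventType) {
    case "outcome_evaluation_ended":
      return "graded";
    case "status_terminated":
      return "terminated";
    case "status_idled":
      return "idled";
    default:
      return assertNever(eventType);
  }
}

// Stand-in for the webhook-events table.
const processed: string[] = [];

function processWebhook(eventId: string, rawType: string): string {
  const outcome = handleEvent(rawType as EventType); // throws on unknown types
  processed.push(eventId); // recorded only after the handler actually ran
  return outcome;
}
```

An unhandled event now leaves no processed row behind, so the table stops reporting skipped work as healthy.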

KD-0634 — Filter state leaks across projects, blanking Backlog/Board

Severity: medium
Symptom: Project-scoped issue filters (selected lanes/epics/creators/sprints) persisted across navigation between projects and across reloads. Because lane/epic/sprint IDs are per-project auto-increment keys, a filter from Project A matched nothing in Project B — the middle pane rendered empty with a stale filter chip showing. Affected Backlog, Board, and Overview.
Root cause: filters.ts declared module-level singleton refs persisted under global localStorage keys. Four held project-scoped IDs; the matchers did strict ID equality with no project-membership check, so a previous project's IDs filtered out every issue in the current one.
Fix: Per-slot storage keys — each project gets its own slot (issue-filters.{projectId}.selectedLanes), hydrated on setFilterProject(projectId). Cross-project MyIssues routes through a fixed myissues slot. Rejected the simpler "single global key + reset on project change" because localStorage is shared across tabs, so a Ctrl+Click into Project B would silently wipe Project A's filter in another tab.
Lesson: State persisted under a global key but holding scope-specific identifiers will leak across scopes — and the simpler "reset on change" fix breaks under multi-tab because localStorage is origin-shared. Per-scope storage slots sidestep both the leak and the cross-tab race. (Also: selectedSprints was dormant — declared and cleared but wired into no page; clearing it for hygiene future-proofs whoever wires it up.)
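
A minimal sketch of per-slot storage keys, using the issue-filters.{projectId}.selectedLanes key format from the fix. The KeyValueStorage interface, the helper names, and the in-memory storage are assumptions so the sketch runs outside a browser; window.localStorage satisfies the same interface.

```typescript
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// A project id, or the fixed slot used by the cross-project MyIssues page.
type FilterSlot = number | "myissues";

function slotKey(slot: FilterSlot, name: string): string {
  return `issue-filters.${slot}.${name}`;
}

function saveSelectedLanes(storage: KeyValueStorage, slot: FilterSlot, laneIds: number[]): void {
  storage.setItem(slotKey(slot, "selectedLanes"), JSON.stringify(laneIds));
}

function loadSelectedLanes(storage: KeyValueStorage, slot: FilterSlot): number[] {
  const raw = storage.getItem(slotKey(slot, "selectedLanes"));
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((id): id is number => typeof id === "number")
      : [];
  } catch {
    return []; // corrupt entry: treat as no filter
  }
}

// In-memory stand-in for localStorage.
function createMemoryStorage(): KeyValueStorage {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

Because nothing is reset on navigation, a second tab opening another project reads and writes a different key and cannot wipe the first tab's filter.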

KD-0631 — Blank page when async component setup fails

Severity: medium
Symptom: When an HTTP request failed during a page's async <script setup> (observed with 429s in prod during rapid navigation), the page content rendered blank — no error state, no retry. Error toasts ("Too Many Attempts.") did appear, but the content never rendered.
Root cause: Three layers combined. (1) Pages fired unguarded await Promise.all([...]) at the top of async setup — one rejection killed the whole batch. (2) Layouts wrapped <RouterView> in <Suspense>, which has no native error slot — an async child rejection puts Suspense into an unrecoverable blank state. (3) App-level onErrorCaptured only handled EntryNotFoundError and re-threw everything else.
Fix: A shared AsyncErrorBoundary component (using onErrorCaptured) placed at the two Suspense boundaries (ProjectLayout, SharedDomainLayout), rendering "Could not load page" + a "Go back" button instead of blanking. Explicitly passes EntryNotFoundError through to the existing App.vue handler. No per-page changes.
Lesson: Vue's <Suspense> has no error slot — an async setup rejection blanks the subtree unrecoverably unless an error boundary wraps it. Toasts surfacing the error don't help; the failure is a separate code path. One boundary at the Suspense seam covers every page beneath it. (Investigation also spun off KD-0679/0680/0635 on the underlying rate-limit pressure from unconditional refetches and orphaned broadcast subscriptions.)

KD-0512 — Reports detail pane cramped at tablet / mid-desktop widths

Severity: low
Symptom: The Reports page right-hand detail pane became unusable between ~768px and ~1200px — at 960px the report title wrapped character-by-character and the AI stepper labels overlapped into mush; at 768px the pane collapsed to a ~50px sliver.
Root cause: The two-pane layout had only one breakpoint guard (lt-md:flex-col at 768px). A fixed 400px left pane + persistent sidebar + padding left the detail pane only ~300-500px across the entire 768-1200px band — below the AI stepper's ~480px minimum. The epic documented a "narrow" breakpoint at <1100px that the Reports page never honoured.
Fix: Added a shared isNarrow ref (NARROW_BREAKPOINT = 1100) to the breakpoint service and extended the existing master-detail pattern to fire below 1100px — list OR detail, not both, with a back button.
Lesson: A layout built two-pane-first for wide monitors needs a breakpoint between "wide desktop" and "mobile stack" — the half-screen / mid-desktop band gets the worst of both otherwise. When an epic already defines a "narrow" threshold, page-level layouts must honour it via a shared signal rather than inventing per-page breakpoints.

KD-0687 — Implementer agent silently reverts to `always_ask` on every re-run

Severity: high
Symptom: The Implementer agent's GitHub MCP calls (create_branch, push_files, create_pull_request, …) parked the session waiting on a human after any re-run of the provisioner. Production worked only because the policy had been patched out-of-band via manual curl.
Root cause: backend/scripts/provision-hand-to-claude.php built the Implementer's mcp_toolset without a permission_policy field. Anthropic's Managed Agents API defaults an absent permission_policy to always_ask, so the script — the supposed source of truth — disagreed with the live agent state, and any re-run silently overrode the manual fix.
Fix: Extracted the toolset block into a require-returns-array companion (backend/scripts/lib/hand-to-claude-implementer-tools.php) that declares $alwaysAllow = ['type' => 'always_allow'] once and applies it to both the toolset's default_config and every entry in configs. Added a unit test asserting the shape.
Lesson: Provisioning scripts that PATCH external systems must declare every policy field explicitly — relying on API defaults means the script and the live state can diverge silently, and any "fix" applied out-of-band is one re-run away from being clobbered.

KD-0663 — Issue show page does not update from broadcasts

Severity: medium
Symptom: Editing an issue's title in tab B didn't re-render tab A's <h1> until manual reload. The bell-watch toggle had its own GET endpoint, optimistic-rollback try/catch, and a manual race counter — none of which updated when other tabs watched/unwatched.
Root cause: Show.vue had no project-channel broadcast subscription at all. A mid-fix attempt introduced a per-issue channel (Tenant.{t}.Project.{p}.Issue.{id}) + page-scoped useLiveIssueDetail composable + applyResource/setById leaks on the issue store — re-implementing what lanes / sprints / comments already did via the project-wide ProjectDomainUpdateEvent channel. The fs-adapter-store package docs explicitly call out exposing setById as an anti-pattern.
Fix: Reverted the per-issue channel; broadcast full IssueResourceData on the existing project-wide channel via ProjectDomainUpdateEvent. Added watcher_ids to IssueResourceData, made ToggleIssueWatchAction return the issue and fan out via IssueBroadcaster::updated(), and replaced the watch GET/optimistic plumbing with a one-line computed + an issue.watch() adapter method that uses the package's sanctioned storeModule.setById.
Lesson: Before inventing a new realtime channel or store mutator, check whether sister relations (lanes/sprints/comments) already solve it on the project-wide channel — payload-size arguments rarely justify the architectural cost once you measure (typical issue ~2.5 KB compact vs Reverb's 10 KB ceiling). And when an adapter package documents setById as an anti-pattern, exposing it on the store wrapper is a code smell, not a workaround.

KD-0654 — IssueForm submit button not disabled during in-flight save

Severity: medium
Symptom: Rapid double-click on Update/Create/Promote fired the handler twice in parallel before the first round-trip resolved. Edit popped history twice; Create produced a duplicate issue, with the orphan attachments associated with only one of the two; ReportDetail promoted the report twice.
Root cause: IssueForm.vue's submit button had no :disabled binding, and none of the three call sites (Edit.vue, Create.vue, ReportDetail.vue) wrapped their await in an isSubmitting guard or try/finally. The browser fired duplicate submit events freely; the async operations were independent network requests, so both succeeded.
Fix: Two-layer guard. Added optional isSubmitting?: boolean prop to IssueForm bound to the button's :disabled. Each call site got a local isSubmitting ref, an early-return guard at the top of the handler (covers synthetic requestSubmit() paths), and a try/finally around the await so the flag resets on throw.
Lesson: A shared form component is the leverage point for double-submit prevention — every consumer is one optional prop away from being protected. And the guard needs both layers: :disabled blocks the click, the early-return covers synthetic submits, and try/finally guarantees the flag resets even when the network call throws (otherwise a failed submit locks the form forever).
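
The handler-side half of the guard (early return plus try/finally) can be sketched without Vue. makeGuardedSubmit and its state object are hypothetical stand-ins for the per-call-site isSubmitting ref; the :disabled binding is the separate, template-side layer.

```typescript
interface GuardedSubmit {
  state: { isSubmitting: boolean };
  submit: () => Promise<boolean>;
}

// Wraps a save function so overlapping calls are ignored while one is in flight.
function makeGuardedSubmit(save: () => Promise<void>): GuardedSubmit {
  const state = { isSubmitting: false };

  async function submit(): Promise<boolean> {
    if (state.isSubmitting) return false; // covers clicks and synthetic submits alike
    state.isSubmitting = true;
    try {
      await save();
      return true;
    } finally {
      state.isSubmitting = false; // resets even when save() rejects
    }
  }

  return { state, submit };
}
```

The returned boolean only reports whether this call started a save; a rejected save still propagates to the caller after the flag is reset.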

KD-0653 — UpdateIssueAction silently drops `attachmentIds` from PUT requests

Severity: medium
Symptom: PUT /api/projects/{id}/issues/{slug} accepted attachment_ids in the body, validated it, populated the DTO, and returned 200 OK — but UpdateIssueAction never read the field. The API advertised behaviour it didn't implement.
Root cause: SaveIssueRequest and SaveIssueData were shared across Create and Update because both controller actions used the same FormRequest. The Create path needs attachmentIds for the orphan-claim pattern (uploads happen before the issue has an ID); on Update, attachment edits go through dedicated makeAttachmentStore() endpoints, so UpdateIssueAction correctly didn't act on the field — but the shared DTO kept advertising it.
Fix: Per ADR-0020, split SaveIssueData into CreateIssueData (with attachmentIds) and UpdateIssueData (without), and SaveIssueRequest into CreateIssueRequest / UpdateIssueRequest. Update path's validation rule removed entirely. Frontend IssueBase lost attachmentIds; a NewIssueMutable type carries it as a Create-only payload.
Lesson: Sharing a FormRequest/DTO across Create and Update sounds DRY but encodes a lie when the two paths have different field surfaces — the silent-drop is the symptom, the type signature is the bug. Direction-specific DTOs make the contract honest and let the type system reject the misuse.

KD-0583 — Dead unsaved-content warning on Create Issue page

Severity: low
Symptom: The Create Issue page had a "you have unsaved files, leave anyway?" warning that had never fired in this codebase. No user had reported the missing dialog.
Root cause: onBeforeRouteLeave from vue-router requires the router to be installed via app.use(router). This app uses a custom createRouterView() shell (shared/services/router/components.ts) and never calls app.use() with the router, so the guard registered against an absent router and silently did nothing. The companion beforeunload listener only fired on tab close, not the in-app navigation case the warning was meant to cover. The author shipped without verifying the guard fired.
Fix: Deleted the dead block — onBeforeRouteLeave, the beforeunload listener, the onUnmounted cleanup, and the clearOrphanAttachments helper (no other callers). Orphan attachments are pruned server-side by PruneOrphanedAttachmentsAction after 24h, so no hygiene gap.
Lesson: Vue Router composition-API guards (onBeforeRouteLeave, onBeforeRouteUpdate) silently no-op when the router isn't installed as a plugin — apps using a custom router-view shell must verify any router-guard hook actually fires before shipping it, because the failure mode is invisible.

KD-0626 — lint-staged glob never matches, ESLint skipped on commit

Severity: low
Symptom: Pre-commit hook printed "No staged files match any configured task" and ESLint never ran locally — errors only surfaced on CI.
Root cause: The lint-staged config in frontend/package.json used globs anchored at the repo root (frontend/src/**/*), but the hook ran lint-staged with cwd frontend/, where staged paths resolve to src/.... The extra frontend/ prefix meant no staged path could ever match.
Fix: Stripped the frontend/ prefix from both glob keys.
Lesson: Glob patterns must be relative to the cwd of the tool that evaluates them — when a tool is launched from a subdirectory, every config path inside it is anchored there.

KD-0606 — AI generate-story keys collide with snake_case wire format

Severity: high
Symptom: "Generate" button on Reports/Issues AI panel returned 422 "The source description field is required" even though the report had a description.
Root cause: The frontend HTTP middleware runs deepSnakeKeys() on every outbound payload, but AgentGenerateStoryRequest::rules() keys were camelCase (sourceDescription). Wire body shipped source_description; rule never matched. The earlier KD-0511 rename intended snake_case but wrote camelCase. Feature tests posted camelCase directly, bypassing the middleware, so CI stayed green while production was broken.
Fix: Renamed rule keys to snake_case; added arch test rejecting any camelCase top-level rule key; updated tests to post the real wire format.
Lesson: Feature tests that bypass the global request middleware can hide wire-format mismatches indefinitely — when middleware mutates payload shape, tests must post the post-middleware shape, not the pre-middleware shape.
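
To illustrate why the camelCase rule key could never match, here is an assumed reimplementation of a deepSnakeKeys-style transform. Only the function name comes from the entry above; the body and the Json type are a sketch, not the middleware's actual code.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const toSnake = (key: string): string =>
  key.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);

// Recursively rewrites every object key from camelCase to snake_case.
function deepSnakeKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => deepSnakeKeys(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[toSnake(key)] = deepSnakeKeys(child);
    }
    return out;
  }
  return value;
}
```

A body built as { sourceDescription: "..." } leaves the client as { source_description: "..." }, so a backend rule keyed sourceDescription sees a missing field. A test that posts the camelCase body directly exercises a request production never sends.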

KD-0605 — Stale `checkedReportIds` selection promotes the wrong report

Severity: high
Symptom: After dismissing a checked report and clicking a different one, pressing Promote generated an issue from the previously checked report. UI showed report B; API received report A's id.
Root cause: selectedReportId (detail pane) and checkedReportIds (multi-select for promote) were two independent pieces of state. The set was never pruned when a report transitioned out of pending — dismissed reports stayed checked and ReportDetail.promoteReports preferred the non-empty stale set over the visible report.
Fix: Self-heal in the checkedReports computed by filtering on getReportStatus(report) === pending, so non-pending reports drop out of the multi-select reactively.
Lesson: Two pieces of selection state that model the same intent will drift — prefer derived/filtered state over manually synchronised mirrors, or self-heal in the computed by gating on the source-of-truth status.
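
A sketch of the self-healing derivation as plain functions. The Report shape, the non-pending status names, and reportsToPromote are assumptions for illustration; in the real component checkedReports is a Vue computed.

```typescript
// Only "pending" is confirmed by the entry above; the other statuses are assumed.
type ReportStatus = "pending" | "dismissed" | "promoted";

interface Report {
  id: number;
  status: ReportStatus;
}

// Derived selection: a checked id only counts while its report is still pending.
function checkedReports(reports: Report[], checkedIds: number[]): Report[] {
  return reports.filter((report) => report.status === "pending" && checkedIds.includes(report.id));
}

// Prefers the multi-select, falling back to the report open in the detail pane.
function reportsToPromote(reports: Report[], checkedIds: number[], selectedId: number | null): number[] {
  const checked = checkedReports(reports, checkedIds);
  if (checked.length > 0) return checked.map((report) => report.id);
  return selectedId === null ? [] : [selectedId];
}
```

A dismissed report can stay in checkedIds indefinitely without harm, because the derived list drops it and promotion falls back to the visible report.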

KD-0604 — `parseDuration` silently drops decimals

Severity: medium
Symptom: Users entering "2.5h" saw it round-trip to "5h" (300 minutes) instead of 150 minutes. No error — silent corruption.
Root cause: DURATION_PATTERN = /(\d+)\s*(w|d|h|m)/gi was non-anchored and integer-only, used with matchAll. It silently skipped any character that didn't fit the pattern — decimals, commas, junk after a fragment. "2.5h" matched only 5h.
Fix: Split into VALIDATION_PATTERN (anchored, validates whole input) and EXTRACTION_PATTERN (extracts each chunk). Reject inputs that don't fully match instead of partial-summing.
Lesson: Non-anchored matchAll over user input is a silent-corruption pattern — when parsing structured input, validate the whole string against an anchored pattern before extracting parts.
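
A minimal sketch of validate-then-extract. The two pattern names come from the fix, but the exact regexes, the null return for rejected input, and the minutes-per-unit table (8-hour day, 5-day week) are assumptions for illustration.

```typescript
// Assumed conversions: 1d = 8h and 1w = 5d. The real values are not stated in this log.
const MINUTES_PER_UNIT: Record<string, number> = { w: 2400, d: 480, h: 60, m: 1 };

// Anchored: the whole input must be a sequence of integer+unit chunks.
const VALIDATION_PATTERN = /^\s*(?:\d+\s*[wdhm]\s*)+$/i;
// Unanchored: only used after validation, to pull out each chunk.
const EXTRACTION_PATTERN = /(\d+)\s*([wdhm])/gi;

// Returns total minutes, or null when any part of the input does not fit.
function parseDuration(input: string): number | null {
  if (!VALIDATION_PATTERN.test(input)) return null;
  let totalMinutes = 0;
  for (const match of input.matchAll(EXTRACTION_PATTERN)) {
    const amount = Number(match[1]);
    const unit = (match[2] ?? "").toLowerCase();
    totalMinutes += amount * (MINUTES_PER_UNIT[unit] ?? 0);
  }
  return totalMinutes;
}
```

Under this sketch "2.5h" is rejected outright instead of being read as 5h, and "2h 30m" yields 150 minutes.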

KD-0601 — Dragging issue to sprint shows "unauthorized"

Severity: medium
Symptom: Users with "Own" update scope could change an issue's sprint via the edit modal but got 403 when dragging the same issue on the backlog.
Root cause: IssuePolicy::updateBoard() called CheckPermission::check() with no $ownerId, so the "Own" scope check evaluated null !== null → always false. The sibling update() method correctly passed $issue->user_id.
Fix: Pass $user->id as $ownerId in updateBoard(); additionally add per-issue Gate::authorize in UpdateIssueBoardAction for issues that actually moved (sprint/lane/epic changed).
Lesson: Policies that share a permission scope must share a calling convention — when scope semantics depend on a parameter (like $ownerId), every policy method that checks that scope must pass it the same way, or "Own" silently means "deny everyone".

KD-0600 — GitHub App install fails on webhook/redirect race

Severity: high
Symptom: Users completing GitHub App install were told "Installation failed — please close this tab and try again" while the recovery URL silently still worked. Reproduced on production for the emmie tenant.
Root cause: GitHub fires the installation webhook and the browser redirect concurrently with no ordering guarantee. The webhook controller queued ProcessInstallationWebhookJob and returned 200 immediately; the redirect's one-shot DB lookup hit before the job ran. Worse, the error copy told users to close the tab — abandoning the working recovery URL.
Fix: Process installation events inline in the webhook controller (200 only after row committed); add bounded retry ([200, 500, 1000, 1000]ms) in the lookup; rewrite Blade view to auto-reload on installation_missing instead of telling user to close the tab.
Lesson: When two external systems fire concurrent events about the same state, "process inline" + "bounded retry on read" beats "queue async + hope" — and error copy must direct users toward recovery, never away from it.
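
The bounded retry on read can be sketched in TypeScript (the real lookup is on the PHP side). The delay schedule is the one from the fix; the function name, the injected sleep, and the reading of the schedule as one initial attempt plus up to four retries are assumptions.

```typescript
const RETRY_DELAYS_MS: readonly number[] = [200, 500, 1000, 1000];

// Tries once immediately, then once after each delay, stopping at the first hit.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function lookupWithRetry<T>(
  lookup: () => Promise<T | null>,
  sleep: (ms: number) => Promise<void>,
  delays: readonly number[] = RETRY_DELAYS_MS,
): Promise<T | null> {
  let found = await lookup();
  for (const delay of delays) {
    if (found !== null) break;
    await sleep(delay);
    found = await lookup();
  }
  return found; // still null after the last delay means a genuine miss
}
```

The worst case waits 2.7 seconds in total before reporting the installation as missing, which bounds how long the redirect can hang.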

KD-0596 — Reserved subdomain blocklist not enforced on admin CRUD

Severity: high
Symptom: A central operator could create a tenant with a reserved subdomain (e.g. central.kendo.dev — the central app's own host) via admin paths, bypassing the public signup blocklist.
Root cause: Three admin FormRequests (StoreTenantRequest, StoreDomainRequest, UpdateDomainRequest) used a weaker regex than StoreSignupRequest and lacked Rule::notIn(Tenant::RESERVED_SUBDOMAINS). Validation logic was duplicated across requests with no shared source of truth, so the drift was invisible.
Fix: Extracted shared SubdomainRule ValidationRule class; applied to all four FormRequests including signup.
Lesson: Validation logic that exists in more than one place will drift — extract shared rules into reusable ValidationRule classes the moment a second copy is needed.

KD-0591 — Validation errors silently fail on 11 forms

Severity: medium
Symptom: Users submitted forms; backend returned 422 with field-level errors; nothing rendered. Forms sat there with no feedback.
Root cause: 11 templates lacked <FormError name="…" /> next to inputs whose backend rules validated those fields. The response middleware populated the global errorBag correctly — there was just no live FormError instance subscribed to render it.
Fix: Inserted 27 missing <FormError> bindings. Two server-determined fields (laneId, order) intentionally skipped.
Lesson: Hand-authored form-error placement guarantees drift — forms should structurally couple inputs with their error display (a FormField wrapper, or an arch test that cross-references templates against backend rules).

KD-0589 — Duplicate inserts return 500 instead of 422

Severity: medium
Symptom: Three Store endpoints (tenant AI key, project AI key, project GitHub repo) returned HTTP 500 when a user submitted a duplicate — typically a double-clicked Save button.
Root cause: Migrations declared unique indexes but the corresponding FormRequests had no Rule::unique(...) and the Actions had no pre-check. Duplicates reached save(), surfaced SQLSTATE 23000, and Laravel rendered that as 500.
Fix: Added Rule::unique with the migration-matching where() scope to each FormRequest.
Lesson: Every DB-level unique index needs a matching Rule::unique (or Action-level guard) — the DB invariant is correct, but without a validation surface the user gets 500 instead of a polite 422. An arch test that cross-references migration unique indexes against FormRequest rules would catch this class.

KD-0588 — `PasswordConfirmModal` silent failure

Severity: high
Symptom: Wrong password (or any error) in the password-confirm modal closed the modal silently. User believed the destructive action they were confirming had succeeded.
Root cause: handleSubmit ordered emit('close') before await onConfirm. Parent unmounted the modal during the synchronous emit, destroying the inline <FormError name="password" /> before the response middleware could populate errorBag.password. The catch block intentionally swallowed the error, relying on a FormError that no longer existed.
Fix: Reorder so emit('close') runs only after a successful await onConfirm. Distinguish 422 (FormError surfaces inline) from non-422 (dangerToast) in the catch.
Lesson: Never close a modal until the awaited action it gates has resolved — and never rely on a global error bag if the component subscribed to it might already be unmounted.
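
The reordering can be sketched as below. This is a minimal illustration under stated assumptions, not the component's actual code: HttpFailure, isHttpFailure, ConfirmDeps and the returned outcome strings are hypothetical names introduced here, and the injected close callback stands in for emit('close').

```typescript
// Hypothetical shape of a failed HTTP call; the real Axios error type differs.
interface HttpFailure {
  status: number;
}

function isHttpFailure(error: unknown): error is HttpFailure {
  return (
    typeof error === "object" &&
    error !== null &&
    typeof (error as { status?: unknown }).status === "number"
  );
}

interface ConfirmDeps {
  onConfirm: () => Promise<void>; // the destructive action the modal gates
  close: () => void; // stands in for emit('close')
  dangerToast: (message: string) => void;
}

type SubmitOutcome = "closed" | "inline-error" | "toast";

// Close only after the awaited action has succeeded. On failure the modal
// stays mounted, so an inline field error still has a live component to
// render in.
async function handleSubmit(deps: ConfirmDeps): Promise<SubmitOutcome> {
  try {
    await deps.onConfirm();
  } catch (error) {
    if (isHttpFailure(error) && error.status === 422) {
      // Validation failure: the inline error display is expected to show it.
      return "inline-error";
    }
    deps.dangerToast("The action could not be completed.");
    return "toast";
  }
  deps.close();
  return "closed";
}
```

Returning an outcome is only there to make the three branches easy to assert on in a test; the point is that close() is unreachable on any failure path.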

KD-0587 — Cross-project attachment leak via unscoped `attachment_ids`

Severity: medium
Symptom: Reported as a cross-project leak: a user could attach attachments from another project. Investigation showed the leak didn't actually happen at runtime (the Action layer scoped the query), but the missing FormRequest validation was a real defense-in-depth gap.
Root cause: SaveIssueRequest validated attachment_ids.* as ['integer'] only — no Rule::exists('attachments', 'id')->where('project_id', $projectId). Inconsistent with lane_id, sprint_id, epic_id etc. on the same request. The arch test that should have caught this matched only the wrong-pattern ('exists:attachments,id') and missed omission entirely.
Fix: Added scoped Rule::exists. Added 'attachments' to the arch-test whitelist.
Lesson: Defense-in-depth scoping must live at the FormRequest layer, not just the Action — and arch tests that detect misuse must also detect omission, otherwise they're a false signal.

KD-0586 — Validation errors don't reach users on 14 forms (camelCase mismatch)

Severity: medium
Symptom: Server-side 422s arrived but never displayed on 14 forms. Wrong-password failures, missing-team errors, etc. all silently failed.
Root cause: The Axios response-error middleware ran camelCase(key) on every error key before populating errorBag. 25 <FormError name="snake_case"> bindings looked up keys the middleware never populated. Existing camelCase bindings worked; new authors didn't realise the middleware was transforming.
Fix: Renamed all 25 dead bindings to camelCase. Added arch test rejecting any <FormError name> containing _ or -.
Lesson: When a middleware silently transforms data shape, the convention has to be enforced by tooling — naming conventions across template/middleware boundaries are guaranteed to drift without an arch test.
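
The kind of check such an arch test performs can be sketched as follows. This is an illustration only: findNonCamelCaseFormErrorNames is a hypothetical helper, the regex handles only static name="..." attributes (not :name bindings), and how the real arch test locates and reads templates is not shown.

```typescript
// Returns every static <FormError name="..."> value that contains "_" or "-".
// Dynamic bindings (:name="...") are deliberately not matched by this sketch.
function findNonCamelCaseFormErrorNames(template: string): string[] {
  const pattern = /<FormError\b[^>]*?\sname="([^"]*)"/g;
  const offenders: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    const name = match[1];
    if (/[_-]/.test(name)) {
      offenders.push(name);
    }
  }
  return offenders;
}
```

A test built on this would read each template file, call the helper, and fail when the returned list is non-empty.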

KD-0585 — `projects.description` column too short for validation rule

Severity: medium
Symptom: POST /api/projects returned HTTP 500 for any description longer than 255 chars (4 occurrences in 24h on prod).
Root cause: Migration created description as $table->string() (VARCHAR(255)). FormRequest later capped at max:5000. Validation passed, INSERT crashed with SQLSTATE 22001 Data too long. No arch test asserts that string|max:N rules don't exceed the underlying column length.
Fix: Changed column to TEXT.
Lesson: Validation rule length and column length must be cross-checked by tooling — drift between FormRequest max:N and schema length silently turns 422-able input into 500s.
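
A cross-check of this kind reduces to comparing two numbers per field. The sketch below assumes the rule maximum and the column capacity have already been extracted into plain data; FieldLengthInfo and findLengthDrift are hypothetical names, and the extraction from FormRequests and migrations is out of scope here.

```typescript
interface FieldLengthInfo {
  field: string;
  ruleMax: number; // N from the validation rule max:N
  columnMax: number | null; // column capacity in characters; null if not length-limited for this check
}

// Returns the fields whose validation rule admits input the column cannot hold.
function findLengthDrift(fields: FieldLengthInfo[]): string[] {
  return fields
    .filter((f) => f.columnMax !== null && f.ruleMax > f.columnMax)
    .map((f) => f.field);
}
```

With the values from this incident (max:5000 against a 255-character column), description would be reported.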

KD-0581 — Billing seat-count mismatch between Kendo UI and Stripe

Severity: high
Symptom: Pro tenant displayed "Seats: 2 (€4/seat/month)" but Stripe's invoice was €4 — quantity stuck at 1. Customer silently under-billed.
Root cause: Two sources of truth with no reconciliation. BillingController::status computed seat count live from User::query()->count(). Stripe's quantity changed only when SyncSeatQuantityJob fired from three Actions (invite/delete/restore). Any membership change before the sync infrastructure existed left Stripe permanently stale. Both paths also failed silently on edge cases (no tenant context, no Cashier subscription, queue failures).
Fix (proposed): Make Stripe the single source of truth: read seat count from $subscription->quantity for active subscriptions, and fall back to User::query()->count() only without a subscription.
Lesson: External systems holding billable state must be the single source of truth. Recomputing the same metric in two places (DB count + external API) without a reconciliation pass guarantees drift, and silent no-ops in sync code make the drift invisible.

KD-0574 — `TenantAwareQueue` captures stale scoped instances, every queued broadcast dropped

Severity: high
Symptom: Every realtime broadcast dispatched through queue workers silently dropped in production. Users saw no live updates anywhere until manual reload.
Root cause: TenantAwareQueue was constructed once at boot with TenantSwitcher and TenantContext injected as readonly properties. Both bindings were scoped. Laravel's queue worker calls forgetScopedInstances() before every job — removing the cached instance from the container's instances[] map but leaving TenantAwareQueue holding an orphan reference. The JobProcessing listener wrote tenant onto the orphan; broadcastOn() resolved a fresh TenantContext with no tenant set. KD-0556's lazy resolution surfaced the bug; eager constructor injection was the underlying defect. Mocked unit tests passed because Mockery has no notion of container scoping.
Fix: Replaced eager constructor injection with resolve(...) calls inside every closure registered by register(). Replaced unit tests (which gave false-green for ~7 weeks) with feature tests using real container bindings.
Lesson: When a service's dependencies are scoped, that service must NOT cache them in constructor properties — and tests for scoped-binding consumers must use real container bindings, because mocks bypass container scoping and ship false-green.
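
The orphan-reference mechanism can be reproduced in a few lines. The sketch below is a toy model in TypeScript, not Laravel's container: Container, EagerQueueHook and LazyQueueHook are hypothetical names, and only the idea of forgetting a scoped instance between jobs is borrowed from the incident.

```typescript
class TenantContext {
  tenantId: number | null = null;
}

// Toy container holding one scoped instance that can be forgotten between jobs.
class Container {
  private scoped: TenantContext | null = null;

  resolveTenantContext(): TenantContext {
    if (this.scoped === null) {
      this.scoped = new TenantContext();
    }
    return this.scoped;
  }

  forgetScopedInstances(): void {
    this.scoped = null;
  }
}

// Defect shape: the scoped dependency is captured once, at construction.
class EagerQueueHook {
  private readonly context: TenantContext;

  constructor(container: Container) {
    this.context = container.resolveTenantContext();
  }

  onJobProcessing(tenantId: number): void {
    this.context.tenantId = tenantId; // may write to an orphaned instance
  }
}

// Fixed shape: the dependency is resolved at the moment of use.
class LazyQueueHook {
  private readonly container: Container;

  constructor(container: Container) {
    this.container = container;
  }

  onJobProcessing(tenantId: number): void {
    this.container.resolveTenantContext().tenantId = tenantId;
  }
}
```

After forgetScopedInstances(), the eager hook writes the tenant onto an object nothing else will ever resolve, while the lazy hook writes onto the instance later consumers receive. A mock of TenantContext cannot show this difference, because the mock is the same object on both sides.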

KD-0556 — PR-merge webhook does not broadcast issue lane change

Severity: medium
Symptom: Merging a PR moved the linked issue's lane server-side, but no update broadcast reached connected boards in real time. Manual drag-drop broadcast correctly, so the websocket pipeline was healthy.
Root cause: 10 broadcast events derived their channel from TenantContext snapshotted in the constructor, and broadcastOn() returned [] silently when the snapshot was null — no log, no exception. Either tenant context wasn't bound at construction time (background job, console command) or the snapshot drifted across processes. The silent-drop was invisible to monitoring.
Fix: Shared ResolvesTenantBroadcastChannel trait that resolves TenantContext lazily at broadcast time and emits a structured error-level log when the resolved id is null.
Lesson: Code paths that drop work silently are unobservable bugs — every "return empty / skip / no-op" branch on infrastructure code must log something structured, otherwise the failure mode never surfaces.
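
The two properties of the fix (resolve late, log when dropping) can be sketched as below. This is an illustration, not the trait itself: broadcastChannels, LogFn and the tenant.{id} channel format are assumptions made for the example.

```typescript
type LogFn = (
  level: "error",
  message: string,
  context: Record<string, unknown>,
) => void;

// Resolves the tenant id at broadcast time. If there is none, the drop is
// still a drop, but it leaves a structured log entry behind.
function broadcastChannels(
  resolveTenantId: () => number | null,
  eventName: string,
  log: LogFn,
): string[] {
  const tenantId = resolveTenantId();
  if (tenantId === null) {
    log("error", "broadcast dropped: no tenant context", { event: eventName });
    return [];
  }
  // Hypothetical channel naming, for illustration only.
  return [`tenant.${tenantId}`];
}
```

The empty-array return is unchanged from the defective version; what differs is that monitoring now has something to alert on.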

KD-0553 — GitHub App self-serve install fails on cache-prefix mismatch

Severity: high
Symptom: GitHub App tenant install fails with "Invalid or expired OAuth state" 404 on the first try. Complete feature outage.
Root cause: Install state lived in cache. The install request ran under tenant context (IdentifyTenant mutated cache.prefix to tenant_{id}_); the setup-callback ran on the base domain without IdentifyTenant. Laravel's CacheManager caches resolved stores by name — each store reads cache.prefix at construction. When PHP-FPM resolved a fresh cache.store between requests, it read the default prefix and missed the tenant-prefixed key.
Fix: Moved install state to a dedicated github_app_install_states table on the central connection. Prefix-immune by construction.
Lesson: State that crosses a tenant-context boundary cannot live in a tenant-prefixed cache — when reads and writes happen under different prefix configs, use a connection-scoped table instead.

KD-0545 — My Issues badge does not update via realtime broadcast

Severity: medium
Symptom: The Navbar's My Issues badge and page didn't react to assignment changes, lane crossings into Done, or self-assignment via MCP. Users had to refresh.
Root cause: Two-part defect. (1) Backend never fired user-scoped issue broadcasts — only project-channel events, which the Navbar can't subscribe to. (2) Frontend updates/deleted listeners on the user channel were domain-blind — every payload routed unconditionally to notificationStore, even though UserDomainUpdateEvent already carried a domain field.
Fix: Added IssueBroadcaster::myIssuesChanged() for user-scoped fan-out (computing wasOnList / isOnList per affected user). Made Navbar's user-channel listeners domain-aware.
Lesson: Realtime channels must be cut along the same axis as the data they keep in sync — a "My X" view cannot rely on per-project channels, and a multiplexed user channel needs explicit domain dispatch on the frontend or stores can't share it.
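
Explicit domain dispatch on a multiplexed channel can be sketched as below. The event shape loosely follows the domain field mentioned above; createDomainDispatcher, the handler map and the domain strings used in any caller are hypothetical.

```typescript
interface UserDomainUpdate {
  domain: string;
  payload: unknown;
}

type DomainHandler = (payload: unknown) => void;

// Routes each event to the handler registered for its domain. Returns false
// for unknown domains so the caller can log instead of silently ignoring them.
function createDomainDispatcher(
  handlers: Map<string, DomainHandler>,
): (event: UserDomainUpdate) => boolean {
  return (event: UserDomainUpdate): boolean => {
    const handler = handlers.get(event.domain);
    if (handler === undefined) {
      return false;
    }
    handler(event.payload);
    return true;
  };
}
```

Each store registers its own domain key, so a notification payload no longer reaches the issues store and vice versa.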

KD-0537 — Activity-timeline backfill migration deadlocks production release

Severity: high
Symptom: Fly's release_command for prod v180 failed with MySQL deadlock during a tenant-migration backfill. Production stuck on v179; every dev→main merge would keep failing the release.
Root cause: Two interacting problems. (1) Fly's release_command runs in an ephemeral machine while v179 app machines keep serving live traffic — the migration's per-row SELECT ... FOR UPDATE + INSERT fought the live IssueAuditLogger for the next-key lock; InnoDB picked the migration as the deadlock victim. (2) The backfill itself was wrong-shaped — it would have written synthetic "Created today" audit-log entries with now() timestamps, polluting the append-only hash chain with fabricated history. Staging never hit it because traffic was lower at deploy time.
Fix: Deleted the migration. The activity timeline correctly returns [] for legacy issues with no audit history; the frontend already had an empty state for that case.
Lesson: release_command migrations run concurrently with the previous version's live writes — any backfill that contends with hot-path writes on the same index will deadlock. And: if the truthful response to "we have no data" is an empty array, don't fabricate data to make it look populated.

KD-0519 — Lane reorder chevrons silently no-op on Project Settings

Severity: medium
Symptom: Clicking up/down chevron next to a lane appeared to do nothing — the order didn't change visually. (DB was actually updated; the frontend just didn't refresh.)
Root cause: The earlier KD-0464 change was split into two commits. The frontend commit assumed broadcasts would keep the lane store in sync and removed laneStore.retrieveAll() from updateLaneOrder(). The backend commit explicitly excluded bulk/cascade lane mutations from broadcasting, and lane reorder happens inside UpdateProjectAction's loop, which falls in exactly that "bulk/cascade" bucket. Net: the write happened, but with no broadcast and no refetch the store kept stale order values.
Fix: Restored the one-line laneStore.retrieveAll() after project.update().
Lesson: When two commits together replace a refetch with a broadcast, both halves must cover the same code paths — broadcaster scope decisions on the backend must be cross-checked against every refetch the frontend removed.

KD-0518 — Sprint title update wrongly requires `status`

Severity: low
Symptom: Reporter claimed sprint edit modal returned 422 "status field required". Investigation showed real UI usage round-trips status correctly via mutable.value; only hand-crafted partial-payload clients (curl, MCP, external API) hit the 422.
Root cause: No defect in the real UI flow. The contract is "send the full sprint shape on update" — the adapter-store does that; clients that craft partial payloads correctly fail validation.
Fix: Kept status as required. Removed the regression tests added during the investigation, because they encoded behaviour that contradicted the chosen contract.
Lesson: Reproduce the bug from the actual UI flow before changing the contract — a report describing partial-payload behaviour might be from a hand-crafted client and reflect the contract working as designed.

KD-0514 — "Added you to project" notification fires for existing members

Severity: medium
Symptom: Users already on a project got an "added you to project X" notification when a new team containing them was linked.
Root cause: UpdateProjectAction built recipients by array_unique(array_merge($newlyAddedDirectMemberIds, $newTeamMemberIds)). The team-side list was every member of every newly-attached team with no diff against existing project membership. array_unique only deduplicated between the two lists — it didn't subtract users who already had access.
Fix: Subtract $currentDirectMemberIds ∪ members-of-currently-attached-teams from $allNewMemberIds before notifying.
Lesson: Notifications about "newly added" must diff against the prior state — set-union dedup is not the same as diff. When access can be granted via multiple paths (direct + team), every path must be considered when computing "what changed".
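
The difference between deduplicating and diffing is small in code. The sketch below is a TypeScript illustration of the corrected computation, not the PHP in UpdateProjectAction; recipientsToNotify and idsWithPriorAccess are hypothetical names.

```typescript
// Union of both "new" lists, deduplicated, minus everyone who already had
// access before the update (directly or through an already-attached team).
function recipientsToNotify(
  newlyAddedDirectMemberIds: number[],
  newTeamMemberIds: number[],
  idsWithPriorAccess: number[],
): number[] {
  const prior = new Set<number>(idsWithPriorAccess);
  const seen = new Set<number>();
  const result: number[] = [];
  for (const id of newlyAddedDirectMemberIds.concat(newTeamMemberIds)) {
    if (prior.has(id) || seen.has(id)) {
      continue;
    }
    seen.add(id);
    result.push(id);
  }
  return result;
}
```

Dropping the prior.has(id) check gives back the original behaviour: user 3 in the call recipientsToNotify([1, 2], [2, 3, 4], [3, 9]) would be notified despite already being on the project.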

KD-0511 — AI validation error leaks into form fields

Severity: medium
Symptom: Clicking Generate on the AI story prompt with a short report description showed a 422 error attached to the IssueForm's description textarea — a field the user never edited.
Root cause: Wire-level field-name collision. The AI endpoint and the IssueForm shared two field names (description, title). The global error bag was keyed only by Laravel field name with no per-form scoping. A 422 keyed description from the AI endpoint rendered under any <FormError name="description">.
Fix: Renamed AI endpoint payload keys to sourceDescription/sourceTitle so no <FormError> watches them.
Lesson: A globally-scoped error bag means wire field names are a global namespace — two forms sharing a field name will leak errors across each other. Either scope error bags per form, or use distinct wire-field names for distinct forms.
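
The first alternative, a per-form scope, can be sketched as below. This is not the fix that shipped (the fix renamed the wire keys); ScopedErrorBag and the scope names are hypothetical and only illustrate why a scope key removes the collision.

```typescript
// Error bag keyed by (scope, field) instead of field alone.
class ScopedErrorBag {
  private readonly scopes = new Map<string, Map<string, string[]>>();

  set(scope: string, field: string, messages: string[]): void {
    let fields = this.scopes.get(scope);
    if (fields === undefined) {
      fields = new Map<string, string[]>();
      this.scopes.set(scope, fields);
    }
    fields.set(field, messages);
  }

  get(scope: string, field: string): string[] {
    const fields = this.scopes.get(scope);
    if (fields === undefined) {
      return [];
    }
    const messages = fields.get(field);
    return messages === undefined ? [] : messages;
  }
}
```

With this shape, a 422 on description recorded under an AI-prompt scope is invisible to an error display reading description under the issue-form scope.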

KD-0510 — Newly created report not auto-selected in detail pane

Severity: low
Symptom: Submitting "+ Report" appended the new report to the list but the right-hand detail pane stayed on the placeholder. User had to find and click the new entry.
Root cause: handleCreate called await newReport.create() but discarded the return value. The adapter resolved to the persisted Report with its server-assigned id; nothing assigned that id to selectedReportId.
Fix: Capture the returned Report and assign its id to selectedReportId.
Lesson: Async create flows must thread the persisted entity's id back through the UI — otherwise the verify-and-edit loop is broken into two disconnected steps.

Severity: low
Symptom: The X close button in modals was invisible to keyboard tools (Vimium) and screen readers.
Root cause: The X was a bare <svg> with a @click handler. SVG is not in the default tab order, has no implicit ARIA role, and is invisible to browser-level focus management.
Fix: Wrapped the icon in <button type="button" aria-label="Close"> with focus-visible ring.
Lesson: Click handlers belong on semantic elements (<button>, <a>) — never on bare icons. Accessibility regressions of this class are best caught by a structural arch rule, not visual review.

KD-0500 — Epic name overflow on board cards

Severity: low
Symptom: Long epic titles overflowed the colored badge horizontally past the card's right edge into adjacent space.
Root cause: Three compounding causes. (1) Project doesn't import @unocss/reset, so elements default to content-box — max-width: 100% constrained only content, padding+border were added on top. (2) min-width: auto on inline-block elements with nowrap resolves to full unwrapped text width, beating max-width. (3) text-nowrap prevented wrapping but didn't add overflow:hidden or ellipsis.
Fix: Combined box-border + min-w-0 + max-w-full + truncate on SimpleBadge. Added min-w-0 on the parent flex container.
Lesson: Without a global box-sizing: border-box reset, every component that uses padding/border with percentage max-width is overflow-prone — and min-width: auto on flex/inline-block items is the silent killer that makes max-width constraints useless.

KD-0496 — Manual reports show 'Unknown' as author

Severity: low
Symptom: Manual reports created through the UI showed "Unknown" as author. API-created reports were fine.
Root cause: ReportForm.vue didn't send author_name; CreateReportAction stored null; frontend templates fell back to literal 'Unknown'. The single write site ($report->author_name = $data->authorName) was the bug — every read path correctly read the column, but the column was never populated for manual reports.
Fix: Write-time fallback in the Action: $data->authorName ?? "{first_name} {last_name}" when a creator is present.
Lesson: When many read paths share a single column, fix at the write site — anything else means hunting through every consumer (HTTP Resource, MCP tools, frontend templates) to patch the same fallback.

KD-0484 — `PaginationBar` overflows narrow containers

Severity: low
Symptom: Pagination bar visibly overflowed the Reports Overview's 400px left column when there were 8+ pages.
Root cause: <nav> had flex with no flex-wrap and used 11 fixed-width w-10 buttons (~440px intrinsic). Flex items don't shrink below their content's intrinsic width without explicit min-w-0. Parent had flex-wrap so siblings could wrap relative to each other, but neither child could wrap internally.
Fix: Added flex-wrap as last-resort fallback plus a CSS container query that hides the redundant «/» shortcut buttons below 440px (the edge pages remain reachable via the existing first/last page-number buttons).
Lesson: Container queries are the right tool for "compact this UI when its container is narrow" — viewport media queries can't see how big the actual parent column is.

KD-0445 — Ticket-updated toast cluttered, auto-hides

Severity: low
Symptom: When user A updated an issue, user B got an infoToast reading "Alert: <title> updated by <actor> at <ISO timestamp>" that auto-hid after 5s.
Root cause: Three coupled gaps. (1) Show.vue (issue detail page) didn't subscribe to project-channel issue-update broadcasts like Board/Backlog/Overview did. (2) Backend papered over (1) with a global PrivateAnnouncement user-channel toast. (3) The toast variant had no persistence escape-hatch — auto-hid before the user could act.
Fix: Removed the entire PrivateAnnouncement → 'alerts' → infoToast plumbing. Realtime "your view is stale" UX should live on the page that goes stale, not as a global cross-page toast.
Lesson: When a global notification papers over a missing local realtime subscription, the right fix is to wire the local subscription — not to refine the global notification.

KD-0443 — Board layout glitches on phone in landscape

Severity: low
Symptom: On phone in landscape, the sidebar took ~30% of the width and avatars overflowed issue cards.
Root cause: Two issues. (1) Breakpoint service used window.innerWidth < 768 for mobile detection — phone in landscape often has 800-900px width, crossing the threshold and rendering the desktop sidebar. Viewport width is not a reliable proxy for device type. (2) DragElement.vue had no overflow-hidden constraint, so children escaped at narrow column widths.
Fix: Added isTouchDevice via CSS media query (pointer: coarse) and (hover: none) (correctly identifies phones/tablets without external input devices). Forced collapsed sidebar on touch devices. Added overflow-hidden + truncation to card.
Lesson: For "is this a phone" questions, ask CSS about input modality (pointer: coarse, hover: none) — never use viewport width as a proxy. Phones in landscape break that proxy.
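
The input-modality check can be sketched as below. The media query string is the one named in the fix; the function signature is an assumption made so the example runs outside a browser, and MediaQueryLike and MatchMedia are hypothetical types.

```typescript
// Minimal slice of the browser's MediaQueryList that this check needs.
interface MediaQueryLike {
  matches: boolean;
}

type MatchMedia = (query: string) => MediaQueryLike;

const TOUCH_QUERY = "(pointer: coarse) and (hover: none)";

// True when the primary input is coarse (a finger) and cannot hover,
// regardless of how wide the viewport currently is.
function isTouchDevice(matchMedia: MatchMedia): boolean {
  return matchMedia(TOUCH_QUERY).matches;
}
```

In a browser the caller would pass a wrapper around window.matchMedia; injecting it keeps the function testable and makes the dependency on the query string explicit.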

Severity: low
Symptom: Toasts fired while a modal was open rendered beneath the modal's backdrop and were invisible to the user.
Root cause: Modals use native <dialog>.showModal(), which adds the dialog to the browser's top layer — a separate rendering stack that paints above every regular z-index context. The toast container was a fixed <div> at z-index: 1050 in the regular stacking context. Top layer always wins.
Fix: Upstream in @script-development/fs-toast@0.2.0 — added popover="manual" to the container <div> and showPopover() on mount. Re-enters the top layer on every new toast (last-in-wins ordering).
Lesson: The browser's top-layer is not a higher z-index — it's a parallel rendering stack. Anything that must paint above a native <dialog> must also live in the top layer (via Popover API), not just have a high z-index.

Recurring themes

Silent failures are the real bug. KD-0556, KD-0588, KD-0511, KD-0586, KD-0591, KD-0581, KD-0604 — every "no error, no toast, nothing rendered" symptom traces to a code path that swallows or drops without logging. Every "return empty / skip / no-op" branch on infrastructure code needs a structured log entry, or the bug is unobservable.
Form-error binding edge cases. KD-0586 (camelCase mismatch), KD-0591 (missing bindings), KD-0511 (cross-form key collision), KD-0588 (modal unmounted before error rendered), KD-0606 (snake/camel rule mismatch). The <FormError> + global error bag pattern keeps producing the same shape: any drift between the wire shape, the middleware transform, the rule key, or the template binding silently breaks the user feedback loop. Arch tests catch the structural class; component-level patterns (FormField wrappers, scoped error bags) prevent it.
Tenant-scoping leaks via the wrong infrastructure. KD-0553 (cache prefix crossing tenant boundary), KD-0574 (scoped binding captured at boot), KD-0556/KD-0537 (broadcast/migration assumes tenant context). State that crosses tenant boundaries must live somewhere prefix-immune (central connection table, lazy resolution at consumption time) — caching it under a tenant prefix or capturing scoped instances at boot is a guaranteed silent drop.
Validation drift between layers. KD-0585 (max:5000 vs VARCHAR(255)), KD-0589 (unique index vs no Rule::unique), KD-0596 (signup blocklist vs admin paths), KD-0587 (scoped exists missing on attachments). Whenever a constraint exists at one layer (DB, migration, central rule) but isn't mirrored at the layer the user hits first (FormRequest), the error surfaces as a 500 or a silent leak. Arch tests that cross-reference layers (column length vs rule max, migration unique vs FormRequest unique, project-owned tables vs scoped exists) close the entire class.
Single source of truth, or guaranteed drift. KD-0581 (UI count vs Stripe quantity), KD-0596 (4 copies of subdomain rule), KD-0586 (manual <FormError> placement vs middleware naming), KD-0510 (server response not threaded to UI state). Anywhere the same value is computed/stored/displayed in two places without a reconciliation pass, drift is a question of when, not if.
Broadcast/refetch coverage gaps. KD-0519 (frontend dropped refetch assuming broadcast covered it), KD-0545 (project channel doesn't cover My-X views), KD-0556 (silent broadcast drop), KD-0445 (page didn't subscribe at all). Realtime channels must be cut along the same axis as the views consuming them. A user-list view cannot rely on per-project channels; a per-page view cannot rely on a global toast as compensation for a missing subscription.
Misleading error UX directs users away from recovery. KD-0600 ("close this tab" while the URL silently still worked), KD-0588 (modal closed before error rendered), KD-0606 ("source description required" with no input by that name), KD-0998 (fatal error boundary eats a form full of typed input on a 422), KD-0931 (generic "try again" when the actual remedy is one settings page away). Error copy must reference the user's actual recovery path — never tell users to abandon a working state, never reference field names the UI doesn't show, and never let a recoverable validation failure destroy the thing the user was working on.
Check-then-act and last-write-wins races. KD-1025 (guard read in memory, outside the transaction, before slow side-effecting work — two promotes both win), KD-1001 (payload built from state that only advances on response, so two rapid clicks each carry the other's stale value), KD-1015 (full-state PUT from a per-session snapshot, so two sessions clobber each other), KD-0911 (async setup Suspense doesn't cancel, so a stale component renders against the new route), KD-0882 (un-awaited dynamic import races test-environment teardown). The shape is always the same: a read and the write that depends on it are separated by a window in which something else can commit. The fixes divide cleanly by layer — atomicity at the DB (lockForUpdate() + re-assert under the lock), serialization at the client (promise queue), and partial-update contracts at the wire so a request never asserts fields it didn't change. Note the escalation pattern: fixing the client-side race in KD-1001 could not close the cross-session case, because the real defect was the contract, not the caller. Testing this class needs deliberate suspension — hold the first request open with a never-resolving promise, or mock the locked re-fetch to return the winner's value. A test that awaits each call in turn cannot express a race at all.
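
Client-side serialization with a promise queue, as mentioned above, can be sketched as follows. SerialQueue is a hypothetical name and this is a minimal version under the assumption that tasks are independent closures; it addresses only the single-client case and does nothing for the cross-session race.

```typescript
// Runs async tasks strictly one after another, in the order they were enqueued.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after everything queued before it has settled.
    const run = this.tail.then(() => task());
    // Keep the chain alive even if this task rejects; the caller still
    // receives the rejection through the returned promise.
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

Because the second task's closure runs only after the first has settled, it can build its payload from state that already reflects the first response. A test for this holds the first task open on an unresolved promise and asserts that the second has not started.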
Caching something whose identity never changes. KD-0835 (SPA shell cached while naming hashed chunks the next deploy deletes), KD-0953 (avatar URL keyed only on user id, so a reactive src never changes and the browser never asks), KD-0920/KD-0919/KD-0928 (invalidation signal stamped but stripped by CORS, then stamped on too few routes to arrive). Two rules fall out. First, a cache key must change when the content changes — content-addressable or version-tokened URLs, never a stable URL for mutable bytes; cache headers cannot help when the client never issues a request. Second, the shell that names content-addressable assets must never itself be cached, and the recovery handler for a stale bundle cannot live inside that bundle.
Tests that pin the bug instead of catching it. KD-0985 (spec asserted the leaked variable name, converting a typo into a specification), KD-0886 (spec asserted the over-gated button was absent), KD-1050 (spec forced descriptionOverflows = true, skipping the measurement that could never produce it), KD-0997 (single blanket mockReturnValue on the permission check made "has one permission but not the other" inexpressible — the exact gap that let two broken fixes ship), KD-0878 (spec pinned the pre-migration payload shape while prod threw). A test written from the implementation rather than the requirement records current behaviour and defends it. The tells are consistent: asserting a literal copied out of the source, forcing the state whose derivation is the thing under test, and mocking a decision function wholesale when the bug lives in the disagreement between two of its answers.
