Postmortems
Production-affecting bugs and edge cases that taught us something. Newest first.
The fix lives in code; what's preserved here is the root cause and the generalizable lesson — the part that disappears if you only read commits.
KD-0924 — Subdomain availability check ignores the domains table
- Severity: medium
- Symptom: `CheckSubdomainAvailabilityAction` reported a subdomain as available when it existed in `domains.domain` but not in `tenants.database`. The subsequent signup insert then 500'd on the `domains_domain_unique` constraint. Reproduced in prod.
- Root cause: The action only queried `tenants.database`, but `Domain` is the canonical owner of the subdomain (the unique index lives on `domains.domain`). When `tenants.database` diverged from `domains.domain` (possible after a rolled-back or diverged provisioning run, or an operator edit), the check passed falsely while the insert collided.
- Fix: Injected `Domain` alongside `Tenant`; `execute()` short-circuits to `available: false` on a `tenants.database` hit, then also checks `domains.domain`.
- Lesson: An availability check must consult every table that owns the uniqueness invariant it's predicting. Checking one of two tables that can diverge guarantees a false "available" the moment they drift. The DB unique index is the real contract; the pre-check has to query the same column(s) the index covers.
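A minimal sketch of the corrected check, assuming in-memory sets stand in for the two columns (`tenants.database` and `domains.domain`). The function and field names are hypothetical, not the action's real signature; the point is that the pre-check asks every store that can hold the value, including the one that owns the unique index.

```typescript
// Hypothetical stand-ins for the two columns that can each hold a subdomain.
interface SubdomainStores {
  tenantDatabases: ReadonlySet<string>; // models tenants.database
  domains: ReadonlySet<string>;         // models domains.domain (owns the unique index)
}

// Reports unavailable if ANY owning store already has the value.
function checkSubdomainAvailability(
  subdomain: string,
  stores: SubdomainStores,
): { available: boolean } {
  if (stores.tenantDatabases.has(subdomain)) return { available: false }; // short-circuit on a tenant hit
  if (stores.domains.has(subdomain)) return { available: false };         // then consult the index owner too
  return { available: true };
}
```

With drifted data (present in `domains`, absent from `tenants`), the old single-table check would have said "available"; this one does not.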
KD-0920 — x-fs-cache-hashes header not CORS-exposed, killing cross-origin cache invalidation
- Severity: medium
- Symptom: On any cross-origin setup (every local dev install: `script.localhost:3000` → `:8000`), the browser never handed the `x-fs-cache-hashes` response header to JS, so the cached-store wrapper saw no invalidation signal and lanes/labels/sprints stayed stale until a full refresh. The failure was invisible: the wrapper silently degrades to `null`.
- Root cause: `config/cors.php` set `'exposed_headers' => []`. Browsers expose only the seven CORS-safelisted response headers to cross-origin JS; a custom header is readable only if the server lists it in `Access-Control-Expose-Headers`, which Laravel's `HandleCors` emits only when `exposed_headers` is non-empty. Backend stamping and the SPA wrapper were each correct in isolation; the bug was the missing exposure entry between them.
- Fix: Added `'x-fs-cache-hashes'` to `exposed_headers` (a config constant, not an env knob, because the header is a fixed, non-sensitive protocol value).
- Lesson: Stamping a custom response header does nothing for cross-origin JS unless the server also CORS-exposes it. The two are separate steps, and a header present on the wire is still invisible to `headers.get()` without the expose list. Same-origin prod hid it; the first witness was every dev install. When a protocol depends on a custom header, the CORS expose entry is load-bearing, not optional.
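The browser-side rule can be modelled in a few lines. This is a simplified sketch of the Fetch standard's behaviour, not browser code: it assumes a response to a cross-origin CORS request and skips header-list parsing details. `isReadableByScript` is a hypothetical helper name.

```typescript
// CORS-safelisted response-header names (Fetch standard): always readable cross-origin.
const SAFELISTED_RESPONSE_HEADERS = new Set([
  "cache-control",
  "content-language",
  "content-length",
  "content-type",
  "expires",
  "last-modified",
  "pragma",
]);

// Models whether response.headers.get(name) can see `name` after a cross-origin fetch.
// exposeHeaders: the values the server sent in Access-Control-Expose-Headers.
function isReadableByScript(
  name: string,
  exposeHeaders: readonly string[],
  credentialsIncluded = false,
): boolean {
  const lower = name.toLowerCase(); // header names are case-insensitive
  if (SAFELISTED_RESPONSE_HEADERS.has(lower)) return true;
  const exposed = exposeHeaders.map((h) => h.trim().toLowerCase());
  // "*" acts as a wildcard only for requests sent without credentials.
  if (!credentialsIncluded && exposed.includes("*")) return true;
  return exposed.includes(lower);
}
```

With the pre-fix config (an empty expose list), `isReadableByScript("x-fs-cache-hashes", [])` is `false` even though the header is on the wire, which is exactly the incident.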
KD-0919 — Cache-hash header stamped on too few routes to ever reach the SPA
- Severity: medium
- Symptom: The cached-store protocol's steady-state invalidation never fired during normal navigation. Client A mutated a sprint/epic/lane/label and the backend bumped the project's `*_hash`, but an open tab on client B kept serving the stale list because the refetch signal never arrived.
- Root cause: Registration coverage, not logic. `StampCacheHashesMiddleware` was mounted on only five narrow routes (project show plus the four cached-resource groups). The requests an SPA actually fires while navigating (board, backlog, issue show, comments) were in none of those groups, so their responses carried no header. The signal was circular: the only responses announcing "sprints changed" were the sprint requests the wrapper had already decided to suppress.
- Fix: Hoisted the middleware to the `Route::prefix('projects')` group (one registration covers every project-scoped route, current and future) and removed the five redundant inline mounts. The middleware already self-guards so that `index`/`store` responses stay header-free.
- Lesson: A change-notification header is only useful on the responses the client actually requests in steady state. Stamping it solely on the resource's own endpoints is circular, because those are exactly the requests the cache suppresses. Mount the signal across the whole navigation surface (group-level), and lean on the middleware's self-guards rather than narrow per-route registration that drifts as routes are added.
KD-0918 — Memoized cached stores go deaf to broadcasts after the first page unmount
- Severity: high
- Symptom: Sprints/epics/lanes/labels created or changed by another client stopped appearing live after the user's first in-app navigation — they surfaced only after a full refresh. Per-page live data (board, comments, time entries) kept updating fine; only the four project-scoped cached stores went deaf.
- Root cause: `subscribeWithAutoCleanup` unconditionally called `onScopeDispose(stop)`, which is correct for per-page subscriptions. But the four cached stores are memoized per project and subscribe exactly once, at store-creation time, which runs inside the `setup()` of whichever component first calls the `make…Store` factory. When that component unmounted, the scope disposed, `stop()` fired, and the listener was removed; the memoized store stayed cached but never re-subscribed (`fs-adapter-store` subscribes once, at construction). Introduced by KD-0680's `onScopeDispose` auto-cleanup, which optimised per-page teardown without accounting for the memoized-singleton lifetime.
- Fix: Persistent subscribe plus evict-on-leave: a scope-free `subscribeProjectChannelPersistent` and a per-project `onLeaveProject` registry. The four stores subscribe persistently and register an eviction callback that drops their memoized instance on `leaveAllProjectChannels`/`resetEcho`, so a revisit rebuilds and re-subscribes. Per-page subscriptions keep scope-bound teardown.
- Lesson: A subscription's lifetime must match the lifetime of the thing it feeds. Scope-bound (`onScopeDispose`) cleanup is correct for per-component data but wrong for a memoized singleton whose listener should live as long as the cache entry. When you add an auto-cleanup optimisation, audit every consumer whose lifetime is not the mounting component's, or the optimisation silently kills long-lived subscriptions on the first unmount.
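A framework-free sketch of the persistent-subscribe plus evict-on-leave shape. `FakeChannel`, `makeSprintStore`, `onLeaveProject` and `leaveProject` here are hypothetical simplifications (the real code uses Echo channels and Vue scopes); what matters is that the unsubscribe is tied to cache eviction rather than to a component's scope.

```typescript
type Listener = (event: string) => void;

// Hypothetical stand-in for a project broadcast channel.
class FakeChannel {
  private listeners = new Set<Listener>();
  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }
  emit(event: string): void {
    this.listeners.forEach((fn) => fn(event));
  }
  get size(): number {
    return this.listeners.size;
  }
}

// Per-project eviction callbacks, run when the user leaves the project.
const leaveCallbacks = new Map<string, Array<() => void>>();
function onLeaveProject(projectId: string, cb: () => void): void {
  const list = leaveCallbacks.get(projectId) ?? [];
  list.push(cb);
  leaveCallbacks.set(projectId, list);
}
function leaveProject(projectId: string): void {
  for (const cb of leaveCallbacks.get(projectId) ?? []) cb();
  leaveCallbacks.delete(projectId);
}

interface SprintStore { events: string[] }
const memo = new Map<string, SprintStore>();

function makeSprintStore(projectId: string, channel: FakeChannel): SprintStore {
  const cached = memo.get(projectId);
  if (cached) return cached;
  const store: SprintStore = { events: [] };
  // Persistent: not registered with any component scope, so unmounts can't stop it.
  const stop = channel.subscribe((event) => {
    store.events.push(event);
  });
  // Eviction is the only teardown: the listener lives exactly as long as the cache entry.
  onLeaveProject(projectId, () => {
    stop();
    memo.delete(projectId);
  });
  memo.set(projectId, store);
  return store;
}
```

A revisit after `leaveProject` misses the memo, so it builds a fresh store with a fresh subscription.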
KD-0889 — FilterBar search input has no inline clear (✕) button
- Severity: low
- Symptom: The issues-tab filter-bar search input had no inline ✕ to clear the term — users had to select-and-delete or use the separate global "Clear all." The reports page's search already had one, so the two bars felt inconsistent.
- Root cause: `FilterBar.vue`'s search `<input>` was bound to the model with no per-input clear control. The inline-clear pattern existed only in the sibling `SearchFilter.vue` and was never carried into `FilterBar`.
- Fix: Added a `v-if="searchTerm"` ✕ button mirroring `SearchFilter`'s clear control; clicking it empties the model, which both hides the button and clears the filter (the search term is the filter). The change lands on all 9 pages that mount the shared bar.
- Lesson: When two sibling components present the same affordance, a pattern added to one but not the other reads as a regression. Shared UI affordances should be lifted into the shared component or mirrored deliberately, not implemented per page.
KD-0882 — ProfileSidebar spec leaks a post-teardown dynamic import, flaking CI
- Severity: low
- Symptom: The `Test tenant-core` frontend CI job intermittently exited with code 1 even though all assertions passed, blocking PRs until a manual rerun. The failure was an unhandled `EnvironmentTeardownError`.
- Root cause: The spec called the `defineAsyncComponent` factory (`() => import('ProfilePictureForm.vue')`, which transitively pulls in the browser-only `browser-image-compression` module) without `await`ing the returned promise. The dynamic import raced Vitest's environment teardown; when teardown won, the late-resolving import was recorded as an unhandled rejection.
- Fix: `await` the factory call so the import resolves within the test's lifetime.
- Lesson: An un-awaited promise in a test, especially a dynamic `import()`, races the runner's environment teardown and surfaces as an intermittent, assertion-passing CI failure. Any async work a test triggers must complete before the test ends, or it leaks into teardown as a flake.
KD-0878 — Epic Board badge stops updating on remote drags (stale positions payload shape)
- Severity: medium
- Symptom: The Epic Board's per-epic open/closed badge stopped updating live when other users moved issues; it stayed stale until a full reload. The three issue tabs handled the same broadcast correctly.
- Root cause: Since KD-0789 the backend `IssuePositionEvent` broadcasts a single `{position}` payload on the `positions` event. The Epic Board's handler still destructured the pre-KD-0789 array shape `{positions}` and looped over it, so `positions` was `undefined` and `for (const position of positions)` threw before any state was applied. The issue tabs had been migrated to the single-`{position}` shape (centralised in `useIssueLiveSync`); this call site was missed because the Epic Board hand-rolled the same five-event protocol inline instead of consuming the composable. The spec hid the bug by pinning the old array shape.
- Fix: Adopted `useIssueLiveSync` on the Epic Board (with its generic widened over the item type), replacing the five inline Echo registrations with one composable call so the broadcast contract lives in exactly one place.
- Lesson: A wire-shape change ripples into every consumer that re-implements the protocol by hand; the call site that hand-rolls what a shared composable already does is the one that gets missed when the shape changes. Centralising the contract in one composable is both the fix and the prevention. And a test that pins the old shape keeps a broken consumer green while prod throws.
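A framework-free sketch of what "the contract lives in one place" means for one of the five events. The `Channel` interface, the `Position` fields and `registerPositionSync` are hypothetical (the real code is the `useIssueLiveSync` composable over Echo); consumers supply only a callback, so a payload-shape change is edited once, and a stale shape fails with a descriptive error instead of an opaque `TypeError`.

```typescript
interface Position { issueId: number; laneId: number; rank: string } // hypothetical fields

interface Channel {
  listen(event: string, handler: (payload: unknown) => void): void;
}

// The single owner of the wire shape: { position } since KD-0789.
// The check is deliberately shallow; it only distinguishes the current shape from others.
function parsePositionPayload(payload: unknown): Position {
  if (typeof payload === "object" && payload !== null && "position" in payload) {
    return (payload as { position: Position }).position;
  }
  throw new Error(`positions event: unexpected payload shape ${JSON.stringify(payload)}`);
}

// Consumers register callbacks; they never touch event names or payload shapes.
function registerPositionSync(channel: Channel, onPosition: (p: Position) => void): void {
  channel.listen("positions", (payload) => onPosition(parsePositionPayload(payload)));
}
```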
KD-0872 — Bearer-token auth failures return a 302 redirect instead of 401 JSON
- Severity: medium
- Symptom: API/MCP token clients (CLI, VS Code extension, feedback button) carrying a revoked, expired, or unknown Bearer token got a `302` redirect to the login page instead of a machine-readable `401`. The client had no actionable signal to re-authenticate, so failures were silent. This was the delivery path for the 2026-05-29 feedback-loss incident (the balloon followed the redirect into a fake 201).
- Root cause: Passport's `TokenGuard` catches the `OAuthServerException` internally and returns `null`, so `auth:sanctum` throws `AuthenticationException`. Laravel's default renderer checks `$request->expectsJson()`; token clients don't send `Accept: application/json`, so the check fails and the request is redirected. That is correct for browser navigation but wrong for machine clients on `/api/*` and `/mcp/*`. (The issue's premise that `OAuthServerException` reaches the handler was wrong; the guard swallows it.)
- Fix: Added a `render()` for `AuthenticationException` that returns a 401 JSON body (`{code: 'TOKEN_INVALID', ...}`) for `api/*`/`mcp/*` paths that don't already expect JSON, and passes through (returns `null`) otherwise, so SPA and JSON-client behaviour is unchanged. Mirrors KD-0739's `expectsJson()` scoping for `AccessDeniedHttpException`.
- Lesson: The framework default of "redirect non-JSON auth failures to login" is correct only for browsers. Machine clients on API/MCP paths need a structured 401, and they don't send `Accept: application/json`. Path-scope the override (`api/*`/`mcp/*` plus `!expectsJson()`) so browser flows keep redirecting. Diagnose where the exception is actually thrown, not where the issue claims it is (the guard caught the OAuth exception two layers up).
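The scoping rule is small enough to state as a pure function. This is a TypeScript model of the decision, not the Laravel handler: `authFailureOverride` is a hypothetical name, `path` is assumed to have no leading slash (the form Laravel's `$request->is('api/*')` patterns match against), the `message` text is an assumption, and `null` means "defer to the framework default".

```typescript
interface JsonResponse { status: number; body: { code: string; message: string } }

// Paths whose clients are machines (token auth), modelled on api/* and mcp/*.
const MACHINE_PREFIXES = ["api/", "mcp/"];

// Returns an override response, or null to let the framework default run
// (login redirect for browsers, 401 JSON for clients that already expect JSON).
function authFailureOverride(path: string, expectsJson: boolean): JsonResponse | null {
  if (expectsJson) return null; // the default renderer already answers with JSON
  const isMachinePath = MACHINE_PREFIXES.some((prefix) => path.startsWith(prefix));
  if (!isMachinePath) return null; // browser navigation keeps the login redirect
  return {
    status: 401,
    body: { code: "TOKEN_INVALID", message: "Unauthenticated." }, // message text is hypothetical
  };
}
```

Both guards return `null` rather than building a response, which keeps the override strictly narrower than the default it patches.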
KD-0870 — project_tokens.active not synced when the backing PAT is revoked
- Severity: medium
- Symptom: Bulk-revoking a user's personal access tokens (e.g. on 2FA enablement) flipped `oauth_access_tokens.revoked = 1` but left the matching `project_tokens.active = true`. The Project Tokens UI reads `active`, so it kept showing dead tokens as live; operators couldn't tell a working token from a revoked one without querying the DB. Confirmed in prod after the 2026-05-29 sweep.
- Root cause: `RevokeUserTokensAction` had no dependency on `ProjectToken`; it knew only `oauth_access_tokens`. The other revocation path, `DeleteProjectTokenAction`, kept both tables in sync. Only this path desynced, because the two paths were written independently and only the delete path was built for cross-table consistency.
- Fix: Injected `ProjectToken`; after the revocation loop, set `active = false` on any `project_tokens` rows whose `token_id` is in the revoked set, inside the same transaction.
- Lesson: When two code paths both invalidate the same entity, both must maintain every derived or mirrored column the entity owns. One path keeping a denormalised flag in sync while its sibling forgets it guarantees the UI shows a state that contradicts the source of truth. New invalidation paths must be checked against the full set of side-effects the canonical path performs.
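One way to keep sibling invalidation paths from drifting is to route them through a single helper that owns every mirrored column. A sketch with in-memory rows standing in for the two tables; `invalidateTokens` and `revokeUserTokens` are hypothetical names, and transactions are not modelled. (Per the entry, the actual fix patched the bulk action directly rather than extracting a helper.)

```typescript
interface AccessTokenRow { id: string; userId: number; revoked: boolean }
interface ProjectTokenRow { tokenId: string; active: boolean }

interface TokenTables {
  accessTokens: AccessTokenRow[];   // models oauth_access_tokens
  projectTokens: ProjectTokenRow[]; // models project_tokens (mirrors liveness in `active`)
}

// The one place that knows every column a revocation must touch.
function invalidateTokens(db: TokenTables, tokenIds: ReadonlySet<string>): void {
  for (const t of db.accessTokens) if (tokenIds.has(t.id)) t.revoked = true;
  for (const p of db.projectTokens) if (tokenIds.has(p.tokenId)) p.active = false;
}

// Bulk path (e.g. on 2FA enablement) delegates instead of flipping `revoked` itself.
function revokeUserTokens(db: TokenTables, userId: number): void {
  const ids = new Set(db.accessTokens.filter((t) => t.userId === userId).map((t) => t.id));
  invalidateTokens(db, ids);
}
```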
KD-0858 — Board card can't be moved: deterministic rank collision dead-ends MoveIssueAction
- Severity: high
- Symptom: Certain board cards refused to move; every drag snapped back, no matter how many retries. The backend returned 409 (`RankCollisionException`) and the FE reverted. Surfaced in prod (Nightwatch #168). Two coupled defects: the 409 toast showed an empty body (`{"message":""}`), and the user-facing copy was hardcoded per HTTP status in the FE rather than coming from the backend exception.
- Root cause: `Rank::between` is fully deterministic (base-26 midpoint, no randomness). When a project's rank space is degraded (zero-width gap, stale neighbour ids) so that the midpoint lands on an existing `(project_id, rank)` UNIQUE value, the 3-attempt retry loop recomputes the same value every time and collides identically: the loop only resolves transient concurrent collisions, never deterministic ones. KD-0808's respread recovery was wired in `execute()` for `RankTooLongException` only, so overflow self-healed but a stuck gap dead-ended. The empty body: `CustomException` subclasses declare their copy via a `protected $message` default, but PHP's `Exception::__construct` overwrites that default with `''` as soon as it is called with any argument, including `previous:` alone. The leak-safe `new X(previous: $cause)` throw (mandated to avoid leaking the MySQL duplicate-key string) was therefore silently incompatible with the property-default-message pattern.
- Fix: (1) Widened the reactive-recovery catch in `MoveIssueAction` and `BulkMoveAction` to `RankTooLongException | RankCollisionException` → respread plus one bounded retry. (2) Gave `CustomException` a constructor that falls back to `$this->message` when no explicit message is passed. (3) Added a `dontReportWhen` filter that drops handled `<500` `CustomException`s from the monitor. (4) The FE renders the backend exception's `data.message` instead of a hardcoded status→string map.
- Lesson: A retry loop only helps when the inputs change between attempts. Retrying a deterministic computation reproduces the same collision forever; deterministic exhaustion needs a state-changing recovery (respread), not a retry. Separately, PHP's `Exception::__construct` clobbers a subclass's `protected $message` default to `''` whenever it is called with any named or positional argument, so a `previous:`-only throw silently ships an empty message; override the constructor to restore the declared default. And the backend that owns what failed should own the words the user sees; duplicating copy in the SPA per status code drifts.
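The retry-versus-recovery distinction is easy to see with integer ranks standing in for the base-26 strings. A hypothetical sketch, not `Rank::between` or the real Actions: `rankBetween` is deterministic, so a plain retry against a zero-width gap collides every time, while `respread` changes the inputs before the single bounded retry.

```typescript
class RankCollisionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RankCollisionError";
    Object.setPrototypeOf(this, RankCollisionError.prototype); // keeps instanceof working on ES5 targets
  }
}

// Deterministic midpoint: the same neighbours always produce the same rank.
function rankBetween(lo: number, hi: number): number {
  return Math.floor((lo + hi) / 2);
}

// Insert a new rank after position `index` of a sorted rank list (mutates `ranks`).
function insertAfter(ranks: number[], index: number): number {
  const lo = ranks[index];
  const hi = index + 1 < ranks.length ? ranks[index + 1] : lo + 2000;
  const rank = rankBetween(lo, hi);
  if (ranks.includes(rank)) throw new RankCollisionError(`rank ${rank} already taken`);
  ranks.splice(index + 1, 0, rank);
  return rank;
}

// State-changing recovery: rewrite every rank with even spacing, keeping order.
function respread(ranks: number[], step = 1000): void {
  for (let i = 0; i < ranks.length; i++) ranks[i] = (i + 1) * step;
}

function moveWithRecovery(ranks: number[], index: number): number {
  try {
    return insertAfter(ranks, index);
  } catch (e) {
    if (!(e instanceof RankCollisionError)) throw e;
    respread(ranks);                  // the inputs change...
    return insertAfter(ranks, index); // ...so one bounded retry can succeed
  }
}
```

With neighbours `[10, 11]` the midpoint is always `10`, so any number of plain retries fails; after the respread to `[1000, 2000]` the retry lands on `1500`.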
KD-0852 — My Issues badge shows a stale lane after an agent (MCP) lane change
- Severity: medium
- Symptom: When an agent moved an issue to a different lane via the MCP path (`UpdateIssueTool` or `start-work-on-issue`), the My Issues page kept showing the old lane in its Status badge until a manual refresh. The broadcast fired and the row otherwise updated; only `lane_title`/`lane_color` were stale. The same move through the web UI updated correctly.
- Root cause: `UpdateIssueAction` assigns the new `lane_id`, saves, then broadcasts via resources that read `lane_title`/`lane_color` off the in-memory `lane` relation, hydrating it with `loadMissing`. `UpdateIssueTool` resolves the issue with `lane` already eager-loaded. Changing the `lane_id` FK does not refresh an already-loaded `belongsTo` relation, and `loadMissing` is a no-op when the (now stale) relation is present, so the payload carried the new `lane_id` but the old `lane_title`/`lane_color`. The web path route-model-binds without preloading `lane`, so `loadMissing` fetched it fresh; hence the bug was agent-specific.
- Fix: Added a single `$issue->refresh()` after the audit-log write and before the two broadcast calls, dropping stale relation caches so the resources re-read the new lane. Rejected alternatives: per-relation `unsetRelation()` (mock churn, and it only covers the listed relations) and removing the eager load in the tool (leaves the root cause in place for other callers).
- Lesson: Mutating a foreign key does not refresh an already-loaded `belongsTo` relation, and `loadMissing` won't re-fetch what's already (stalely) present, so a serializer that reads through the relation emits stale nested data whenever a caller preloaded it. A broadcasting Action that mutates FKs must `refresh()` (or unset the affected relations) before serializing. The bug is caller-dependent: it only appears on the path that preloads, which is why the web UI looked fine while the agent path broke.
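A toy model of the mechanism, not Eloquent: `ToyIssue` caches a loaded relation the way a `belongsTo` does, `loadMissing` fills the cache only when it is empty, and `refresh` drops it first. All names are hypothetical; the test runs mirror the web path (nothing preloaded) and the agent path (lane preloaded).

```typescript
interface Lane { id: number; title: string }

const lanes = new Map<number, Lane>([
  [1, { id: 1, title: "Todo" }],
  [2, { id: 2, title: "Doing" }],
]);

class ToyIssue {
  laneId: number;
  private laneCache: Lane | undefined; // the "already loaded" relation

  constructor(laneId: number) {
    this.laneId = laneId;
  }
  loadMissing(): this {
    if (this.laneCache === undefined) this.laneCache = lanes.get(this.laneId); // no-op if present
    return this;
  }
  refresh(): this {
    this.laneCache = undefined; // drop the stale relation cache
    return this.loadMissing();
  }
  serialize(): { lane_id: number; lane_title: string | undefined } {
    return { lane_id: this.laneId, lane_title: this.laneCache?.title };
  }
}

// Mutate the FK, then serialize the way a broadcast would.
function moveAndSerialize(issue: ToyIssue, laneId: number, refreshFirst: boolean) {
  issue.laneId = laneId;
  if (refreshFirst) issue.refresh();
  return issue.loadMissing().serialize();
}
```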
KD-0848 — Filter bar hijacks Cmd/Ctrl+F, blocking the browser's native find
- Severity: low
- Symptom: On any page rendering the shared `FilterBar` (9 call sites), pressing Cmd/Ctrl+F opened the Kendo filter popover instead of the browser's native find-in-page. The handler called `preventDefault()`, so users lost the universal browser find shortcut everywhere the bar appeared.
- Root cause: `FilterBar.vue`'s `isFilterShortcut` matched `(metaKey || ctrlKey) && key === 'f'`, and the keydown handler called `preventDefault()` before opening the popover. Cmd/Ctrl+F is the browser's universal find accelerator.
- Fix: Rebound the shortcut to a bare `/` (the conventional web search key), so Cmd/Ctrl+F falls through untouched. The existing form-control guard already suppresses the shortcut while the user is typing.
- Lesson: Don't bind app shortcuts to the browser's reserved accelerators (Cmd/Ctrl+F/N/T/W…). Calling `preventDefault()` on them strips a universal capability on every page the component mounts. The bare `/` is the conventional in-app find/search key; reserved-modifier combos belong to the browser.
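A sketch of the rebound predicate, assuming a flattened event shape so it runs without a DOM (`KeyLike` and its `target*` fields are hypothetical; a real guard would inspect `event.target`). The handler should call `preventDefault()` only when this returns true.

```typescript
// Minimal shape of a keydown event for this sketch.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  altKey: boolean;
  targetTag: string;       // e.g. "INPUT"; hypothetical flattening of event.target.tagName
  targetEditable: boolean; // models event.target.isContentEditable
}

const FORM_TAGS = new Set(["INPUT", "TEXTAREA", "SELECT"]);

// True only for a bare "/" pressed outside form controls. Modifier combos
// (including Cmd/Ctrl+F) return false, so they reach the browser untouched.
function isFilterShortcut(e: KeyLike): boolean {
  if (e.metaKey || e.ctrlKey || e.altKey) return false;
  if (FORM_TAGS.has(e.targetTag) || e.targetEditable) return false;
  return e.key === "/";
}
```

Shift is deliberately not rejected: on some keyboard layouts `/` itself requires Shift, and checking `key` rather than the physical key already covers that.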
KD-0845 — Comment-editor links have no underline and navigate on click
- Severity: low
- Symptom: In the `RichTextArea` comment/description editor, links rendered with no underline, and clicking one navigated away instead of letting the user select or edit it. Read-only rendered prose was correct.
- Root cause: Two config/styling gaps. TipTap v3's `StarterKit` bundles `extension-link`, whose Link mark defaults to `openOnClick: true`, and `RichTextArea` registered StarterKit with no config. Separately, no `.tiptap a` CSS rule existed (editor content styles live in `markdown.css`), while read-only prose was underlined via `.description-prose a`.
- Fix: `StarterKit.configure({link: {openOnClick: false}})` so clicks in the editor edit rather than navigate; added a `.tiptap a` rule mirroring `.description-prose a`. The styling is CSS-only rather than a Link `HTMLAttributes` class, so the rendered `<a>` markup is unchanged and the existing real-render assertion stayed green.
- Lesson: A bundled extension's defaults are inherited silently when you register the kit with no config: TipTap StarterKit's Link defaults to `openOnClick: true`. And the editor surface (`.tiptap`) and the read-only surface (`.description-prose`) are separate style scopes; a prose rule does not cover the editor. Check both the behaviour defaults and the per-surface CSS coverage.
KD-0841 — Logging time for a past date takes too many interactions
- Severity: low
- Symptom: The "Log Time" modal's "Started At" field opened empty (a bare native `<input type="datetime-local">`, default `startedAt: null`), so logging against a recent past day forced the user to hand-type every segment or click backward through the native calendar, a flow the reporter described as "absurdly long."
- Root cause: The field had no shortcut affordance and opened empty because new entries default to `startedAt: null`. The friction was purely the date-selection affordance; nothing about calculation or storage was wrong.
- Fix: Prefill "Started At" with `now()` when the create modal opens (logging against today now takes zero interactions; a past date is reached by editing a populated field). The independent "auto-calculate start time" checkbox (which back-dates the start to `now() − duration`) was briefly removed as redundant, then restored once that was recognised as a misreading of its distinct purpose, and renamed for clarity. Day-preset buttons from an earlier iteration were dropped.
- Lesson: An empty input is the worst default for an "edit a near value" task. Prefilling the common case (now) removes the build-from-empty friction more simply than adding preset buttons. Watch for two controls with overlapping-but-distinct purposes (prefill vs back-date-on-duration): removing one as "redundant" can quietly delete a different behaviour.
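Prefilling a `datetime-local` input has one trap worth showing: the control expects local wall-clock time as `YYYY-MM-DDTHH:mm`, so `toISOString()` (UTC, with seconds and a trailing `Z`) is the wrong serializer. A sketch under that assumption; `toDatetimeLocalValue` and `backDatedStart` are hypothetical helpers, the latter modelling the `now() − duration` checkbox.

```typescript
const pad = (n: number): string => String(n).padStart(2, "0");

// Format a Date as local wall-clock time in the "YYYY-MM-DDTHH:mm" form
// that <input type="datetime-local"> accepts.
function toDatetimeLocalValue(d: Date): string {
  return (
    `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
    `T${pad(d.getHours())}:${pad(d.getMinutes())}`
  );
}

// Back-date the start so that start + duration ends at `now`.
function backDatedStart(now: Date, durationMinutes: number): Date {
  return new Date(now.getTime() - durationMinutes * 60_000);
}
```

On modal open, `startedAt` would be seeded with `toDatetimeLocalValue(new Date())`; the two helpers stay independent, matching the entry's point that prefill and back-dating are distinct behaviours.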
KD-0839 — PDF attachment preview renders blank (iframe sandbox blocks JS)
- Severity: medium
- Symptom: Clicking a PDF attachment opened the preview modal, but the PDF never rendered: the iframe loaded the blob URL yet stayed blank. Images previewed fine (they use `<img>`, not an iframe).
- Root cause: The PDF `<iframe>` had `sandbox="allow-same-origin"`. A `sandbox` attribute without `allow-scripts` blocks all JavaScript inside the iframe, and both Chrome's native PDF viewer and Firefox's PDF.js need JS to initialise, so the blob loaded but rendered nothing.
- Fix: Removed the `sandbox` attribute. Blob URLs from `URL.createObjectURL()` are ephemeral and tab-local, and the content is our own server's, so there is no cross-origin surface to sandbox.
- Lesson: `sandbox` without `allow-scripts` silently disables the JS that built-in PDF viewers depend on; a sandbox tight enough to block scripts blocks the very feature you're embedding. Don't sandbox an iframe whose source is a same-origin, tab-local blob you produced; there's nothing to isolate.
KD-0838 — AsyncErrorBoundary shows a fatal "Could not load page" for transient/user-action failures
- Severity: medium
- Symptom: The shared `AsyncErrorBoundary` rendered a full-page fatal "Could not load page." for any captured error except `EntryNotFoundError`. Two everyday events tripped it: a failed comment submission, and clicking an issue then hitting Back before it loaded (the fatal screen then persisted on the `/board` route, which had loaded fine).
- Root cause: Two defects. (1) Sticky boundary state: `onErrorCaptured` set `hasError = true` and never cleared it, and `ProjectLayout` keeps one boundary instance alive across in-project tab switches (the `RouterView` child swaps, the boundary doesn't), so an error raised while loading one route latched and poisoned the next. (2) `submitComment` awaited `create()` with no `try`/`catch`, so a rejected POST propagated through `onErrorCaptured` into the boundary. There is no request-cancellation infrastructure in the FE, so the "aborted load" was the reporter's mental model, not a literal `CanceledError`.
- Fix: Initially, `hasError` was reset on navigation via a `resetKey` prop (the boundary can't read the route, because the app skips `app.use(router)`), plus a local `try`/`catch` on `submitComment`. Per PR review, the per-handler guard was replaced with systemic discrimination: `onErrorCaptured` now inspects Vue's `info` argument. Errors from `'native event handler'` / `'component event handler'` propagate untouched (the toast middleware surfaces them and the user stays on the page), while async-setup, render, lifecycle, and watcher errors still latch the fatal screen. `resetKey` was made required so a future consumer can't forget it.
- Lesson: An error boundary must distinguish a page-load failure (the fatal screen is right) from a user-action failure (stay on the page, toast it). `onErrorCaptured` fires for every descendant error, including event handlers, so a boundary that treats all errors as fatal turns any failed submit into a full-page crash. Discriminate by Vue's `info` source. Latched boundary state must also clear on navigation, or one route's error poisons its siblings. Prefer a required prop over an optional one for a guard that prevents a known bug, so a future consumer can't silently forget it.
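The boundary's decision logic, separated from Vue so it can be read and tested on its own. A hypothetical sketch: `BoundaryState`, `capture` and `reset` are illustrative names, not the component's API. One caveat the entry doesn't cover: Vue's API docs note that in production builds the `info` argument is a shortened code rather than the full string, so the sketch accepts both forms. The production form assumed here (an `error-reference/#runtime-N` URL, with 5 and 6 for the two event-handler sources) should be verified against Vue's Production Error Code Reference for the version in use.

```typescript
// Vue's dev-mode info strings for errors thrown from event handlers.
const USER_ACTION_INFO = new Set(["native event handler", "component event handler"]);
// ASSUMED production equivalents (shortened codes); verify for your Vue version.
const USER_ACTION_CODES = new Set(["5", "6"]);

function isUserActionError(info: string): boolean {
  if (USER_ACTION_INFO.has(info)) return true;
  const code = /#runtime-(\d+)$/.exec(info)?.[1];
  return code !== undefined && USER_ACTION_CODES.has(code);
}

class BoundaryState {
  hasError = false;
  private key: string;

  constructor(resetKey: string) {
    this.key = resetKey; // required: there is no "forgot to pass it" state
  }

  // Returns true if the boundary latched the fatal screen.
  capture(info: string): boolean {
    if (isUserActionError(info)) return false; // let it propagate to the toast layer
    this.hasError = true;
    return true;
  }

  // Called whenever resetKey changes, e.g. on navigation.
  reset(resetKey: string): void {
    if (resetKey !== this.key) {
      this.key = resetKey;
      this.hasError = false;
    }
  }
}
```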
KD-0837 — Bar Color picker dims unselected swatches, distorting their hue
- Severity: low
- Symptom: In the epic "Bar Color" picker, selecting a colour appeared to change the colour of the other swatches (olive→yellow, orange→brown), as if the picker applied the wrong colour. Both report screenshots were of the picker itself, differing only in which swatch was at full opacity.
- Root cause: Unselected swatches were dimmed with `op-50` (whole-element opacity) over a saturated fill. CSS `opacity` composites the fill against whatever is behind it, which here is the page background and differs per theme, so a saturated colour at 50% over a dark page mixes toward black and shifts perceived hue, not just brightness. The stored value was always correct; the defect was purely the picker's render.
- Fix: Replaced the bespoke opacity-dimmed swatch grid with the `SingleSelect` colour dropdown already used for lanes/labels, which renders colour names (text) and never a dimmed swatch, sidestepping the root cause entirely.
- Lesson: Whole-element `opacity` on a coloured fill is theme-dependent by construction: it blends with the page background, so a "dimmed" selection indicator shifts the perceived hue differently in light and dark mode. Indicate selection without touching the fill's alpha (a border, a check, or text-based selection), or the colour the user sees is a lie.
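The hue shift falls straight out of source-over compositing onto an opaque backdrop, applied per channel: result = α · fill + (1 − α) · backdrop. A small sketch (the `blend` helper is illustrative) using CSS `orange` (#FFA500) at α = 0.5, with pure black and pure white as extreme stand-ins for the dark and light page backgrounds:

```typescript
type RGB = [number, number, number];

// Source-over compositing of a fill with uniform opacity `alpha` onto an opaque backdrop.
function blend(fill: RGB, backdrop: RGB, alpha: number): RGB {
  return fill.map((c, i) => Math.round(alpha * c + (1 - alpha) * backdrop[i])) as RGB;
}

const orange: RGB = [255, 165, 0];
const onDark = blend(orange, [0, 0, 0], 0.5);        // [128, 83, 0]: reads as brown
const onLight = blend(orange, [255, 255, 255], 0.5); // [255, 210, 128]: pale peach
```

The same stored colour renders as two different hues depending only on the theme, which matches the "orange → brown" report.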
KD-0836 — Time-log summary cards bucket by logging date, not work date
- Severity: medium
- Symptom: On the Time Entries page the Today/Yesterday/Avg-per-day summary cards didn't reconcile with the filtered table — the cards counted hours on days that had no visible rows. Same dataset, different date axis.
- Root cause: The summary helpers (`filterByPeriod`, `getUniqueDaysCount`) bucketed each entry on `createdAt` (when the entry was recorded), while the table presented each entry under `startedAt ?? createdAt` (when the work happened). When time is logged after the fact the two axes diverge, so the per-day cards counted hours on days the table never showed. The reporter's "summary uses a different dataset" hypothesis was wrong: it was the same filtered dataset bucketed on the wrong field.
- Fix: Bucket the summary helpers on `startedAt ?? createdAt`, matching the table's date column. Frontend-only; the backend range filter still uses `created_at` (a broader product question, parked).
- Lesson: Two views over the same dataset must bucket on the same field, or they'll disagree without either being "wrong". A summary reconciling with its table depends on a shared date axis (work date vs logging date), not just a shared filter. When the numbers don't add up, suspect the axis before the dataset.
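Sharing the axis is easiest when both views call the same accessor. A sketch with date-only strings standing in for timestamps; the real helpers are `filterByPeriod` and `getUniqueDaysCount`, while `workDate`, `hoursByDay` and `uniqueDaysCount` here are hypothetical.

```typescript
interface TimeEntry {
  createdAt: string;        // when the entry was recorded (YYYY-MM-DD for simplicity)
  startedAt: string | null; // when the work happened, if known
  hours: number;
}

// The single date axis both the table and the summary cards should use.
const workDate = (e: TimeEntry): string => e.startedAt ?? e.createdAt;

function hoursByDay(entries: readonly TimeEntry[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of entries) {
    const day = workDate(e);
    totals.set(day, (totals.get(day) ?? 0) + e.hours);
  }
  return totals;
}

function uniqueDaysCount(entries: readonly TimeEntry[]): number {
  return new Set(entries.map(workDate)).size;
}
```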
KD-0817 — Issue deletion fails on FK constraint for Hand-to-Claude tables
- Severity: high
- Symptom: Deleting an issue threw a 1451 FK-constraint violation whenever any Hand-to-Claude row referenced it (`claude_issue_eligibilities`, its criteria children, or `claude_sessions`). The same gap hit `issue_label`, and `DeleteProjectAction` was additionally missing the `issue_watchers` cleanup and the `reports.promoted_issue_id` nullify that the single and bulk paths already did.
- Root cause: KD-0658 shipped three tenant-DB tables with `restrictOnDelete()` FKs to `issues` but didn't update `DeleteIssueAction`, `BulkDeleteIssuesAction`, or `DeleteProjectAction`. The arch gate that would catch this (`CascadeRelationsTest`) only walks `HasMany`/`HasOne`/`MorphMany` relations declared on the model, and `Issue` had no `claudeSessions()`/`eligibility()` relation, so the gate had nothing to enforce against. KD-0803 added `issue_label` 16 days later with the same omission; KD-0709 seeding the tables made it reproduce in dev.
- Fix: Added `Issue::claudeSessions()` (`HasMany`) and `Issue::eligibility()` (`HasOne`) and listed both in `cascadeRelations()`, so the existing arch gate now catches future regressions. The three delete Actions guard on in-flight sessions (409), archive terminal-uncleaned sessions via a queued job capturing scalar IDs, delegate eligibility cleanup, and detach labels. `DeleteProjectAction` gained the missing `issue_watchers` and report-nullify steps.
- Lesson: An arch gate that enforces "every relation is cleaned on delete" is blind to relations the model never declares; a new table with a `restrictOnDelete` FK is invisible to the audit until someone adds the corresponding relation. The structural fix is to declare the relation and list it in `cascadeRelations()` so the gate has teeth. The gate still doesn't cover `BelongsToMany` pivots (labels), so pivot omissions remain a known blind spot. (Same hand-maintained-cascade-list drift class as KD-0738.)
KD-0757 — Mention menu opens at page far-left on the first @-keystroke
- Severity: medium
- Symptom: Two positioning defects in the @-mention menu. (1) Typing `@` flashed the menu at the far left (~0,0) for one frame, then it snapped to the caret on the next character. (2) Once open, the menu didn't follow the caret when the page or editor scrolled.
- Root cause: (1) `mountMentionList` appended the element before the async `updatePosition` applied coordinates. For that one frame it had `position: static` and flowed to its container's top-left; on later keystrokes it already carried `position: absolute`, hence first-keystroke-only. (2) floating-ui's `computePosition` is one-shot; position was computed only on open/update, so the body-appended menu drifted from the caret inside its scroll container.
- Fix: (1) Mount hidden and positioned: set `position: absolute` + `visibility: hidden` before `appendChild`, and reveal once the first computed coords are applied (`visibility: hidden`, unlike `display: none`, keeps it measurable). (2) Position via floating-ui's `autoUpdate` (which runs immediately and on every scroll/resize) and return its cleanup, run on close/Escape so no listeners leak.
- Lesson: An element positioned by an async callback flashes at its static-flow origin for the frames before coordinates land; mount it hidden-but-measurable and reveal only after the first compute. And one-shot `computePosition` doesn't track scroll; use `autoUpdate` (with a cleanup wired to close) when a floating element must stay glued to a moving anchor.
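The mount-hidden half of the fix as a small sketch. To stay runnable without a DOM it takes minimal hypothetical interfaces (`ElementLike`, `ContainerLike`) and an injected `computeCoords`, which in the real code would wrap floating-ui's `computePosition`. The `autoUpdate` half (re-running that computation on scroll/resize and calling its cleanup on close) is left out.

```typescript
interface StyleLike { position: string; visibility: string; left: string; top: string }
interface ElementLike { style: StyleLike }
interface ContainerLike { appendChild(el: ElementLike): void }

// Mount a floating element so it is never visible at its static-flow origin.
async function mountHiddenThenReveal(
  el: ElementLike,
  container: ContainerLike,
  computeCoords: () => Promise<{ x: number; y: number }>,
): Promise<void> {
  el.style.position = "absolute"; // out of normal flow before it ever enters the DOM
  el.style.visibility = "hidden"; // hidden but still laid out, so it can be measured
  container.appendChild(el);
  const { x, y } = await computeCoords();
  el.style.left = `${x}px`;
  el.style.top = `${y}px`;
  el.style.visibility = "visible"; // reveal only once real coordinates are applied
}
```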
KD-0752 — Markdown (.md) attachments have no in-app preview
- Severity: low
- Symptom: Clicking a `.md` attachment thumbnail did nothing; it rendered as a generic file icon with only a download button. Upload and MCP fetch already worked; only the web preview failed.
- Root cause: Markdown was absent from the frontend previewability path. `isPreviewableMimeType` returned true only for images and PDFs, so a thumbnail click never emitted `preview` for a `.md` file, and the preview modal had no markdown branch (it would have fallen through to a raw `<iframe>`). The stored MIME type for `.md` is unreliable (`text/plain` from finfo), so detection has to key off the `.md`/`.markdown` filename extension, not the MIME type.
- Fix: Added `isMarkdownFilename` (extension-based), made the thumbnail previewable for markdown, and added a preview-modal branch that fetches the bytes through the authenticated download endpoint, reads the blob as text, and renders it via the app's existing `renderMarkdown` → `DescriptionProse` stack. Frontend-only.
- Lesson: A previewability check keyed on MIME alone misses file types whose stored MIME is generic (`.md` → `text/plain`); detect by extension when the MIME is unreliable. And when an app already owns a renderer for a content type, wire the preview path into it rather than falling through to a raw iframe.
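A sketch of the detection split: MIME where it is trustworthy, extension where it isn't. `isMarkdownFilename` shares its name with the real helper, but this body is an assumption, as is the combined `isPreviewable` (the entry's `isPreviewableMimeType` is MIME-only).

```typescript
// Extension-based detection: the stored MIME for .md is often just text/plain.
function isMarkdownFilename(name: string): boolean {
  return /\.(md|markdown)$/i.test(name.trim());
}

// Hypothetical combined check: MIME for the types it identifies reliably,
// filename extension for markdown.
function isPreviewable(mime: string, filename: string): boolean {
  return mime.startsWith("image/") || mime === "application/pdf" || isMarkdownFilename(filename);
}
```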
KD-0644 — Empty toast container stays rendered on every page after toasts dismiss
- Severity: low
- Symptom: After the `fs-toast` 0.2.0 migration, the `<div popover="manual">` toast container never disappeared once all toasts were dismissed; it lingered as an empty `fixed` box on every route (`pointer-events-none`, so it didn't block clicks, but it was always present).
- Root cause: `fs-toast` hides the closed container by calling `el.hidePopover()` and relying on the UA rule `[popover]:not(:popover-open){display:none}`. But `App.vue` applied a bare `flex` utility (→ `display: flex`) directly to that element. An author-origin `display: flex` beats a normal (non-`!important`) UA-origin `display: none` in the cascade regardless of specificity, so the container never collapsed when closed. The `flex`/`fixed`/`z-1050` attrs predated the migration and were harmless until the popover-based hide began depending on `display`.
- Fix: Gated the display to the open state by replacing the unconditional `flex` with `class="popover-open:flex"`, so `display: flex` applies only while `:popover-open` matches and the UA rule hides the empty container.
- Lesson: An author-origin `display` declaration beats a normal user-agent `display: none` rule regardless of specificity, so any styling that hard-sets `display` on a `[popover]` element defeats the Popover API's own hide. Scope display utilities to `:popover-open` when the hide is delegated to the UA rule. A previously inert utility can become load-bearing the moment a dependency starts relying on the property it sets.
KD-0624 — CascadeRelationsTest skips Tenant and misses trait-provided relations
- Severity: medium
- Symptom: An audit-coverage defect, not a runtime crash. CascadeRelationsTest (the ADR-0002 gate ensuring every tenant model enumerates its cascade relations) was green precisely because two blind spots let it skip the cases it should catch. Un-blinding it surfaced four previously invisible relations (Tenant::githubInstallations plus three Passport relations on User).
- Root cause: Two intentional skips. (1) Tenant was hardcoded into a $centralModels exclusion list, justified as "never deleted via application logic"; that was false, since DeleteTenantAction cascades real relations. (2) The "all relations listed" test filtered out any relation contributed by a trait. But PHP flattens trait methods onto the using class (reflection reports getDeclaringClass() === Tenant/User), so those relations are reachable and were being thrown away for every model, not just Tenant.
- Fix: Removed the $centralModels exclusion and the trait-method filter, and replaced them with an explicit $nonCascadeRelations allowlist (each entry justified inline), so every discovered relation must either appear in cascadeRelations() or be acknowledged in the allowlist. Verified by transiently dropping an entry and confirming the test then fails.
- Lesson: A test exclusion justified by a comment ("never deleted", "trait-provided, skip") is a place bugs hide: the green suite was an artifact of the audit skipping its hardest cases. An audit should never silently drop a category; require every discovered item to be explicitly handled or explicitly acknowledged, so the skip list itself is reviewable. (Same exclusion-comment-rot theme as KD-0786.)
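The "handled or acknowledged" rule can be sketched in TypeScript as a pure function over relation names. The real gate is a PHP test; the function names and the Map-based allowlist here are illustrative assumptions, and the stale-entry check is an optional extra that the postmortem does not describe.

```typescript
// Hypothetical sketch of an "every item is handled or acknowledged" audit.
// discovered: every relation reflection finds, trait-provided ones included (no category is skipped).
// cascade: relations the model declares in its cascade list.
// acknowledged: relation -> inline justification for deliberately not cascading it.
function unaccountedRelations(
  discovered: readonly string[],
  cascade: readonly string[],
  acknowledged: ReadonlyMap<string, string>,
): string[] {
  const cascaded = new Set(cascade);
  return discovered.filter((rel) => !cascaded.has(rel) && !acknowledged.has(rel));
}

// Optional inverse check: allowlist entries for relations that no longer exist are stale.
function staleAcknowledgements(
  discovered: readonly string[],
  acknowledged: ReadonlyMap<string, string>,
): string[] {
  const present = new Set(discovered);
  return Array.from(acknowledged.keys()).filter((rel) => !present.has(rel));
}
```

A test would assert both lists are empty, so a new relation fails until someone either cascades it or writes down why not.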
KD-0479 — Tooltip layout regressions and silent failures
- Severity: medium
- Symptom: Four regressions from the "bundle tooltips into components" pivot, caught in PR review. Tooltip.vue's wrapper <div> participated in layout, breaking call sites relying on flex/absolute positioning (the ml-auto watch button, the report-card copy button). IconButton/ReportListItem sniffed $attrs['aria-label'], producing empty tooltips and inaccessible buttons when callers used title= or omitted the label. DragElement re-introduced a trailing-space assignee label for single-name users.
- Root cause: Tooltip.vue wrapped its slot in an inline-block <div> that the parent's flex/grid algorithm laid out as a real item, so layout-affecting attrs (ml-auto, absolute) landed on the inner <button> rather than on the layout participant. The $attrs['aria-label'] sniff was brittle: any caller using title= or omitting aria-label silently got an empty tooltip and a button with no accessible name.
- Fix: Switched the Tooltip.vue wrapper to display: contents (anchoring floating-ui to firstElementChild) so it no longer participates in layout; gave IconButton an explicit required label: string prop (sweeping ~25 call sites off the $attrs sniff); changed the DragElement label to [firstName, lastName].filter(Boolean).join(' ').
- Lesson: A wrapper element silently changes its children's layout context. A tooltip/HOC wrapper that isn't display: contents becomes a real flex/grid item and misplaces positioning utilities meant for the wrapped element. And sniffing a value off $attrs is a brittle implicit contract: make it an explicit, required prop so a missing label is a type error at the call site, not an empty tooltip and an inaccessible button at runtime.
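A small TypeScript sketch of the DragElement label change. The Assignee shape is assumed, and naiveLabel is a hypothetical reconstruction of the trailing-space bug, not the component's original code.

```typescript
// Sketch of the DragElement label fix; the Assignee shape is an assumption.
interface Assignee {
  firstName: string;
  lastName?: string | null;
}

// A plausible pre-fix form (hypothetical): the separator is emitted even when lastName is missing.
function naiveLabel(a: Assignee): string {
  return `${a.firstName} ${a.lastName ?? ""}`;
}

// The fix from the postmortem: drop empty parts, then join with a single space.
function assigneeLabel(a: Assignee): string {
  return [a.firstName, a.lastName].filter(Boolean).join(" ");
}
```

filter(Boolean) removes null, undefined, and empty strings alike, so single-name users get no stray separator.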
KD-0807 — Multi-select drag in Backlog persists only one issue's move
- Severity: medium
- Symptom: Multi-selecting N issues in the Backlog and dragging one across sprint sections moved all N cards visually, but only ONE issue's change persisted. The other N−1 silently reverted on the next sync/refresh. The BulkActionBar "Move to" dropdown hit the same bug.
- Root cause: A regression from KD-0789's fractional-rank drag rewrite. The legacy bulk-update shape posted N updates in one request; the new single-issue moveIssueForProject(issueId, payload) posts one move at a time, and the drag store's diff helper (findLaneChangedItem) returned on the first lane-changed item. So exactly one move request fired regardless of how many cards the user moved. The single-issue endpoint cannot carry N issues atomically.
- Fix: A dedicated BulkMoveAction + POST /api/projects/{project}/issues/bulk-move (mirroring the BulkAssignEpicAction precedent), with a sprint-only {issue_ids, target_sprint_id, position} payload: each issue keeps its lane and epic, and only sprint_id and rank change. N ranks are spread logarithmically via Rank::spread. The FE drag store gained a bulkUpdate() path that collects every lane-changed item.
- Lesson: When you replace a bulk endpoint with a single-item one, audit every multi-select code path that fed the old shape; a diff helper that "returns the first changed item" silently drops the rest. Bulk operations also need a dedicated atomic endpoint: you can't synthesize N-item atomicity by looping a single-item call. Chaining Rank::between sequentially across N cards also inflates rank length by ~1 character per 4 cards, so spread balanced midpoints instead of chaining.
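The difference between returning the first changed item and collecting all of them can be sketched as follows. Only findLaneChangedItem is named in the postmortem; the DragItem shape, the sectionId field, and the plural helper are hypothetical.

```typescript
// Hypothetical sketch: the item shape and helper names are assumptions, not the drag store's real API.
interface DragItem {
  id: number;
  sectionId: number | null; // sprint section the card sits in (assumed: null for the backlog)
}

function sectionsById(items: readonly DragItem[]): Map<number, number | null> {
  const byId = new Map<number, number | null>();
  for (const item of items) byId.set(item.id, item.sectionId);
  return byId;
}

function hasMoved(prev: Map<number, number | null>, item: DragItem): boolean {
  return prev.has(item.id) && prev.get(item.id) !== item.sectionId;
}

// Pre-fix shape: returns on the first moved card, so only one request is ever built.
function findLaneChangedItem(before: readonly DragItem[], after: readonly DragItem[]): DragItem | undefined {
  const prev = sectionsById(before);
  return after.find((item) => hasMoved(prev, item));
}

// Post-fix shape: collect every moved card so a single bulk request can carry all of them.
function findLaneChangedItems(before: readonly DragItem[], after: readonly DragItem[]): DragItem[] {
  const prev = sectionsById(before);
  return after.filter((item) => hasMoved(prev, item));
}
```

The ids returned by the second helper are what a bulk-move request would send as issue_ids.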
KD-0798 — VS Code extension shows no issues after API shape change
- Severity: high
- Symptom: Opening a project in the VS Code extension showed no issues and fired an "API error" notification. Assignee avatars rendered [object Object] as the image src.
- Root cause: KD-0774 changed IssueResourceData to return branch_links as full nested objects ({id, branch_name, branch_url, status}) instead of a flat status array. The extension was only partially updated: it kept the field name but typed each link as {status}, so it still fired N secondary GET .../branch-links calls to fetch data already present in the initial response. Separately, profile_picture was typed string | null, but the API now returns {avif, webp} | null, so the raw object was passed through as an image URL.
- Fix: Widened the extension's branch_links type to the full shape and derived branch names/URLs directly from the initial response (dropping the N secondary calls). Fixed the profile_picture type and extracted avif ?? webp ?? null.
- Lesson: A backend resource shape change ripples into every client that wasn't updated in lockstep. The extension is a separate consumer with no shared type contract, so a partial update left it making redundant calls AND mis-rendering. When a response field changes from scalar to object, every client's type and every place it's interpolated (especially as a URL src) must be revisited.
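A sketch of the widened client-side types, using the field names from the postmortem. The interface names, the string type for status, and the helper functions are illustrative assumptions, not the extension's code.

```typescript
// Field names follow the postmortem; interface names and the status type are assumptions.
interface BranchLink {
  id: number;
  branch_name: string;
  branch_url: string;
  status: string;
}

interface ProfilePicture {
  avif?: string | null;
  webp?: string | null;
}

interface IssueAssignee {
  profile_picture: ProfilePicture | null; // previously mistyped as string | null
}

// Prefer AVIF, fall back to WebP, else no image; the object itself never reaches an <img src>.
function avatarSrc(assignee: IssueAssignee): string | null {
  const pic = assignee.profile_picture;
  return pic?.avif ?? pic?.webp ?? null;
}

// Branch names come straight from the initial response; no per-issue follow-up request is needed.
function branchNames(links: readonly BranchLink[]): string[] {
  return links.map((link) => link.branch_name);
}
```

Typing profile_picture as an object makes any direct `src` interpolation a compile error rather than a rendered [object Object].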
KD-0788 — Central-binding arch test over-detects: only 5 of 14 flagged Actions were true gaps
- Severity: medium
- Symptom: The KD-0783 broader arch test listed 14 Actions in centralBindingKnownGaps(), framed in the issue as "13 quick-wins, just add the binding." Latent and low-volume: the binding gaps would surface as audit-transaction crashes only when the affected central paths ran in prod.
- Root cause: The arch test is a structural detector ("Action injects a central model AND has ->transaction(") but the canonical bug is behavioural ("the outer transaction opens on the wrong connection"). Of the 14: 5 were true gaps ($this->db->transaction wrapping central writes); 5 were already correct (transaction opened via $model->getConnection()->transaction(), which the test couldn't distinguish); and 4 were tenant-primary Actions where binding $this->db to central would invert the bug, opening a central transaction while tenant writes committed unsynchronised.
- Fix: Bound the 5 true gaps in AppServiceProvider. Refined the arch test to require both ConnectionInterface injection AND ->transaction( (which naturally drops the 5 model-getConnection Actions). Added inline @central-binding-exempt: markers to the 4 tenant-primary Actions and taught the test to honour them. Emptied centralBindingKnownGaps() to [].
- Lesson: A structural arch heuristic and the behavioural bug it targets coincide for the obvious cases and diverge for the rest, so a "known gaps" list taken at face value will misclassify. Reading each flagged site beats trusting the count: blindly "adding the binding" to all 14 would have broken 4 working Actions. When a heuristic over-detects, the fix is to tighten the heuristic AND provide an auditable escape-hatch marker, not to suppress with a grandfather list.
KD-0787 — RollbackProvisioningAction unbound + inline DROP DATABASE on the bound connection
- Severity: medium
- Symptom: Same latent shape as KD-0783. Dormant in prod (DOMAIN_PROVISIONING_ENABLED=false). The moment the rollback path ran with provisioning enabled, the central audit write would throw "AuditLogWriter must be called within a database transaction", because the outer $this->db->transaction(...) opened on the default connection, not central.
- Root cause: Two coupled defects. (1) The Action injected a generic ConnectionInterface and wasn't contextually bound to central, so the audit-log model's hardcoded central connection saw transactionLevel() === 0. (2) An inline $this->db->statement('DROP DATABASE...') would route through whatever connection got injected; once bound to central, it would run as central's MySQL user, which may lack DROP privileges. (1) couldn't land without (2): binding to central while the inline DDL remained would make rollback DDL fail in any locked-down environment.
- Fix: Injected the already tenant-bound DropTenantDatabaseAction for the DDL, added RollbackProvisioningAction to the central ConnectionInterface binding, and promoted its arch-test entry from centralBindingKnownGaps() to centralActionsRequiringBinding() (turning the gate into a regression test).
- Lesson: Binding an Action's connection and extracting its cross-connection DDL are coupled changes: you can't safely flip the binding while inline statements still inherit it. DDL that needs different privileges (DROP DATABASE on tenant) must be delegated to a connection-explicit sibling Action before the parent's connection is rebound.
KD-0786 — ProvisionDomainAction unbound from central despite being central-only
- Severity: medium
- Symptom: Same crash shape as KD-0783, dormant behind DOMAIN_PROVISIONING_ENABLED=false. The moment the flag flipped on, it would fire a LogicException on every provisioning state transition that emits an audit row.
- Root cause: The Action's outer transaction resolved ConnectionInterface to the default connection because it wasn't contextually bound to central. It had been excluded from the binding with a stale comment claiming it "also does tenant-DB work", but a post-KD-0580 state-machine refactor had made the Action central-only (the Domain model is central, audit is central, the advance() branches call external providers rather than the DB, and there is no TenantSwitcher). The exclusion rationale had outlived its truth.
- Fix: Added ProvisionDomainAction to the central ConnectionInterface binding and moved its arch-test entry from gaps to required. No change to the Action itself; only the container wiring was missing.
- Lesson: Exclusion comments rot. A "we can't bind this because it does X" note must be re-validated against the current source before it's trusted: a later refactor can remove X and leave the stale exclusion silently masking a bug-in-waiting. The arch test's sentinel ("an audit-writing Action must be in exactly one of bound-list or gaps-list") is what forced the decision instead of letting it sit undecided.
KD-0785 — CreateTenantAction crashes on first prod central-admin invite
- Severity: high
- Symptom: Same latent KD-0783 shape. The central-admin "invite a tenant" flow (POST /api/central/tenants) worked in dev/test (where DB_CONNECTION falls through to central), but the first invitation in prod would throw the audit-transaction assertion because the outer transaction opened on the default mysql connection while writes hit central models.
- Root cause: The Action wasn't in the central ConnectionInterface binding because its private createAdminUser opened an inner transaction on the same $this->db after a TenantSwitcher::switchTo(), and those inner writes target the tenant DB (admin User row + role pivot). Binding the whole Action to central would route the inner tenant transaction to central too. SignupAction had the same shape pre-KD-0783.
- Fix: Mirrored KD-0783's extraction exactly: pulled the tenant-DB work into a new sibling CreateInvitedTenantAdminUserAction (injecting the default tenant ConnectionInterface and owning the switchTo/reset lifecycle), then bound CreateTenantAction to central and promoted its arch-test entry to required.
- Lesson: An Action that opens transactions on two different connections cannot be contextually bound to either; the only clean fix is to split the second-connection work into its own Action with its own binding. The "extract the tenant-DB half into a sibling Action" pattern is now the canonical resolution for this entire class (KD-0783 → KD-0785 → KD-0787).
KD-0783 — Public signup 500s in prod on audit-log transaction assertion
- Severity: high
- Symptom: Every public signup at central.kendo.dev/signup returned HTTP 500 with RuntimeException: AuditLogWriter must be called within a database transaction for hash chain integrity. Production-only: the exact path passed every CI test and worked locally. A partial central row (the Tenant insert) could commit before the audit assertion fired.
- Root cause: SignupAction and CreateDomainAction injected ConnectionInterface with no connection name, so the container resolved the default connection. Prod's .fly/config/prod.toml sets DB_CONNECTION=mysql, so $this->db->transaction(...) opened on mysql. But the audit-log models hardcode $connection = 'central', and assertWithinTransaction checks central's transaction level; it found 0 (the open transaction was on mysql) and threw. Tests didn't catch it because DB_CONNECTION is unset in tests, so both connections resolved to the same instance. The same defect was latent in every central audit-writing Action.
- Fix: Contextually bound ConnectionInterface to central in AppServiceProvider for all 10 central audit-writing Actions. Extracted SignupAction's tenant-DB createAdminUser into CreateTenantAdminUserAction (default connection) and its inline DROP DATABASE into DropTenantDatabaseAction (tenant connection). Added an arch test forcing every newly discovered central audit-writing Action into either the bound list or a known-gaps list.
- Lesson: Injecting an unqualified ConnectionInterface silently binds to whatever DB_CONNECTION resolves to. That is fine until a model hardcodes a different connection, at which point transaction-scoped invariants check the wrong connection. Tests that leave DB_CONNECTION unset collapse distinct connections into one instance and hide the entire class; an arch test that asserts the binding decision is the only reliable gate. (This also surfaced two side findings: DB_CONNECTION=mysql references a connection that doesn't exist in config and only resolves via Laravel's framework-default merge, and DB_PASSWORD was visible in plaintext via fly ssh console env.)
KD-0761 — Avatar initials low-contrast, undersized, off-center
- Severity: low
- Symptom: Fallback avatar initials were hard to read. In dark mode, white text on bright tint backgrounds (e.g. #4ade80) hit contrast ratios as low as 1.3:1, far below WCAG AA's 4.5:1. The initials also rendered too small and sat slightly high.
- Root cause: ProfilePicture.vue hardcoded c-white for initials regardless of theme, and the dark-mode tint palette uses high-luminance backgrounds where white text fails contrast. Font sizes were ~35-40% of the container, the weight was too light, and no optical baseline compensation was applied for uppercase glyphs.
- Fix: Added a theme-aware --avatar-initials CSS variable (near-black in dark mode, white in light), bumped the font sizes, raised the weight to 700, and added items-center plus a 0.5px translateY optical nudge.
- Lesson: Hardcoding c-white for text-on-color assumes dark backgrounds; a bright tint palette inverts that assumption. Contrast-sensitive colors must be theme-aware tokens, not literals. (The fix also ran into a recurring trap: stale test assertions pinning the pre-fix font sizes blocked the first verification.)
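To check text-on-tint pairs mechanically, the WCAG 2.x relative-luminance contrast formula can be sketched in TypeScript. The near-black #111111 is a placeholder: the actual --avatar-initials values are not given here, and the shipped fix used theme tokens rather than per-color selection.

```typescript
// WCAG 2.x contrast: sRGB channel linearization, relative luminance, then (L1 + 0.05) / (L2 + 0.05).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const [r, g, b] = [m[1], m[2], m[3]].map((h) => channelToLinear(parseInt(h, 16)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(fg: string, bg: string): number {
  // Lighter color goes on top of the fraction regardless of argument order.
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// Pick whichever candidate text color contrasts better with the tint (the dark default is a placeholder).
function bestInitialsColor(bg: string, light = "#ffffff", dark = "#111111"): string {
  return contrastRatio(light, bg) >= contrastRatio(dark, bg) ? light : dark;
}
```

A test over the tint palette could assert contrastRatio(initials, tint) >= 4.5 for every pair, which would flag the white-on-#4ade80 case.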
KD-0760 — Delete confirmation dialog shows "Submit" instead of "Delete"
- Severity: low
- Symptom: Destructive confirmation modals (delete attachment, delete issue, delete tenant, etc.) showed a generic "Submit" confirm button instead of a destructive verb — ambiguous, and visually identical to a benign save, raising accidental-confirmation risk on irreversible actions.
- Root cause: confirmModal defaults confirmButtonText to 'Submit'. Nine destructive call sites omitted the third positional argument and inherited the generic label. (~23 other call sites already passed an explicit verb, so the misbehaviour was purely the default kicking in on incomplete calls.)
- Fix: Passed an explicit 'Delete' (or equivalent verb) at all 9 destructive call sites. The helper's 'Submit' default was left in place but is now unreachable from any destructive flow.
- Lesson: A permissive default on a shared destructive helper is a latent footgun: every incomplete call site silently inherits the wrong label. For destructive actions the safer design is no default (force the caller to name the verb); at minimum, every call site must be audited when a default is too generic to be safe.
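One way to implement the "no default" design is to give destructive confirmations a type where the verb is mandatory. This TypeScript sketch uses an options object and hypothetical function names; the real confirmModal takes positional arguments.

```typescript
// Hypothetical sketch: an options object lets the destructive variant make the verb a required field.
interface ConfirmOptions {
  title: string;
  message: string;
  confirmButtonText?: string; // optional: a generic default applies
}

interface DestructiveConfirmOptions extends ConfirmOptions {
  confirmButtonText: string; // required: omitting it is a compile error, not a silent "Submit"
}

function confirmLabel(opts: ConfirmOptions): string {
  return opts.confirmButtonText ?? "Submit";
}

function destructiveConfirmLabel(opts: DestructiveConfirmOptions): string {
  // An empty string still type-checks, so reject it at runtime too.
  if (opts.confirmButtonText.trim() === "") {
    throw new Error("Destructive confirmations must name the action, e.g. 'Delete'");
  }
  return opts.confirmButtonText;
}
```

The generic path keeps its convenient default, while any destructive call site that forgets the verb fails type-checking.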
KD-0759 — File-upload drop zones too short to hit reliably
- Severity: low
- Symptom: Drop zones (attachment uploaders, profile picture modal) rendered short — standard ~100px, compact ~40px, profile ~80px. Files released slightly above/below the dashed border missed the drop and landed on the page.
- Root cause: None of the three dropzone surfaces set a min-h-*, so the affordance collapsed to its icon + text height. The @drop handler fires on the same <div> that draws the border, so the visible affordance IS the hit area: a short visual yields a short hit target. The compact variant fell below the WCAG 2.5.5 44px floor.
- Fix: Added min-h-32 (128px) to the standard and profile dropzones and min-h-14 (56px) to the compact one, plus flex centering. No padding/border/copy changes.
- Lesson: When the visual affordance and the event target are the same element, the visual size directly determines the hit area; sizing it for "looks fine" isn't the same as sizing it for "easy to hit." Set an explicit minimum height against a real interaction target (WCAG 2.5.5's 44px floor as the baseline).
KD-0756 — Invite form not reset after a successful invite
- Severity: low
- Symptom: After inviting a user, re-opening the invite modal showed the previous person's data still populated in every field. Users had to refresh to get a clean form, risking re-submitting the same details under a different email.
- Root cause: newInvite is a module-scoped ref passed by reference to the modal each time it opens; the form mutates that object in place via v-model. The onSubmit success path posted, refreshed, toasted, and closed the modal, but never reset newInvite.value, so stale data persisted across openings.
- Fix: One line: reassign newInvite.value to empty defaults after closeModal() in onSubmit. The toast captures the invitee's name before the reset; a failed invite throws before the reset, keeping the form populated for retry.
- Lesson: A reused, mutated-in-place form model needs an explicit reset on the success path; close-and-reopen does not clear it because the same object reference is handed back. Reset after success, but only after success, so failures keep the user's input for retry.
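A sketch of the reset-after-success ordering, with a plain { value } object standing in for a Vue ref and the network call injected. The Invite fields and the submitInvite name are hypothetical.

```typescript
// Hypothetical form shape; the real invite fields are not listed in the postmortem.
interface Invite {
  firstName: string;
  lastName: string;
  email: string;
}

const emptyInvite = (): Invite => ({ firstName: "", lastName: "", email: "" });

async function submitInvite(
  form: { value: Invite }, // stand-in for a Vue ref
  post: (invite: Invite) => Promise<void>,
  closeModal: () => void,
): Promise<string> {
  await post(form.value); // a failure throws here, so the reset below never runs and input survives
  const invitedName = form.value.firstName; // capture before the reset, e.g. for the toast
  closeModal();
  form.value = emptyInvite(); // a fresh object, so the next open starts clean
  return invitedName;
}
```

Because the reset is the last statement, any rejection from post leaves the user's input in place for a retry.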
KD-0754 — Tab walks through every WYSIWYG toolbar button
- Severity: low
- Symptom: Tabbing from a form's title field into the RichTextArea description forced keyboard users through ~8 formatting toolbar buttons first (H1/H2/H3/Bold/Italic/UL/OL/Raw). Reproduced everywhere RichTextArea is consumed (comments, issue templates, AI story prompt, epic form).
- Root cause: FormatButton.vue was a plain <button> with the default tabindex="0", and the toolbar <div> precedes the editor content in DOM order, so every button became a tab stop before the editor. No role="toolbar" or roving tabindex consolidated them into one stop.
- Fix: Applied the WAI-ARIA toolbar pattern (roving tabindex): role="toolbar" + aria-label, exactly one button at tabindex="0" and the rest at -1, and arrow keys for internal navigation. Format actions remain reachable via Tiptap shortcuts; the raw-mode toggle stays keyboard-reachable (which a flat tabindex="-1" strategy would have lost).
- Lesson: A group of related controls should be a single tab stop with internal arrow-key navigation (the WAI-ARIA toolbar pattern), not N sequential tab stops. A shared component is the right fix point: wiring roving tabindex once protects every consumer.
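The roving-tabindex bookkeeping can be modeled without the DOM. This sketch assumes a horizontal toolbar with wrapping arrow keys plus Home/End, as the WAI-ARIA toolbar pattern suggests; a real keydown handler would also move focus to the newly active button.

```typescript
// Minimal roving-tabindex model for a horizontal toolbar; DOM wiring is omitted.
type ToolbarKey = "ArrowRight" | "ArrowLeft" | "Home" | "End";

function nextIndex(current: number, key: ToolbarKey, count: number): number {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count; // wrap from last to first
    case "ArrowLeft":
      return (current - 1 + count) % count; // wrap from first to last
    case "Home":
      return 0;
    case "End":
      return count - 1;
  }
}

// Exactly one button sits in the tab order (tabindex 0); all others get -1.
function tabIndexes(activeIndex: number, count: number): number[] {
  return Array.from({ length: count }, (_, i) => (i === activeIndex ? 0 : -1));
}
```

Tab then enters the toolbar once, lands on the active button, and the next Tab press moves on to the editor.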
KD-0753 — LinkBranchTool rethrows raw exceptions as JSON-RPC -32603
- Severity: medium
- Symptom: Any failure in the MCP LinkBranchTool rethrew the raw Throwable, which the MCP framework mapped to a generic -32603 internal error. Callers couldn't distinguish a duplicate branch link from a cross-project mismatch, a deadlock, or a broadcast failure. Under parallel agent fan-out the error was consistent and unactionable.
- Root cause: The top-level catch captured the exception for the audit log but then rethrew it raw (throw $throwable), violating the documented Exception-Leak Discipline ("MCP tools must never rethrow raw Throwables from their top-level catch"). Concurrent calls could throw deadlock/unique-constraint exceptions from InnoDB gap locks, all flattened to -32603.
- Fix: Replaced the rethrow with three ordered catch blocks: BranchAlreadyLinkedException and CrossProjectException return specific structured messages; any remaining Throwable is logged with scoped context and returned as a generic structured error instead of being rethrown.
- Lesson: At a protocol boundary (MCP/JSON-RPC), a raw rethrow collapses every distinct failure into one opaque code, and the caller (often another agent) loses all ability to act. Top-level catches at such boundaries must map known exceptions to structured errors and log-then-wrap the unknown, never rethrow raw.
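The catch ordering translates to TypeScript roughly as below. The actual fix is PHP; the exception class names mirror the postmortem, while the result shape, codes, and messages are hypothetical.

```typescript
// Exception names mirror the postmortem; the result shape, codes, and messages are hypothetical.
class BranchAlreadyLinkedException extends Error {}
class CrossProjectException extends Error {}

type ToolResult =
  | { ok: true; data: unknown }
  | { ok: false; code: "branch_already_linked" | "cross_project" | "internal"; message: string };

async function runLinkBranch(
  work: () => Promise<unknown>,
  log: (message: string, context: Record<string, unknown>) => void,
): Promise<ToolResult> {
  try {
    return { ok: true, data: await work() };
  } catch (err) {
    // Known failures first, each mapped to a distinct result the calling agent can act on.
    if (err instanceof BranchAlreadyLinkedException) {
      return { ok: false, code: "branch_already_linked", message: err.message };
    }
    if (err instanceof CrossProjectException) {
      return { ok: false, code: "cross_project", message: err.message };
    }
    // Anything else: log with context, then wrap. The raw error is never rethrown.
    log("link_branch failed", { error: err instanceof Error ? err.message : String(err) });
    return { ok: false, code: "internal", message: "Linking the branch failed unexpectedly." };
  }
}
```

Order matters: the specific classes must be checked before the catch-all, or every failure would collapse into "internal" again.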
KD-0738 — Project deletion 500s on unhandled RESTRICT foreign keys
- Severity: high
- Symptom: DELETE /api/projects/{project} returned 500 for any project that had been used (triggered a Claude session, been watched, been linked to a tenant AI key, etc.) with SQLSTATE[23000]: Integrity constraint violation.
- Root cause: DeleteProjectAction walks a hand-maintained list of descendant tables to delete before the parents. That list was last extended at a 2026-02-18 cascade-to-restrict audit. Six tables with restrictOnDelete() FKs into the project subtree have landed since (claude_sessions, issue_watchers, attachment_extracted_contexts, claude_issue_eligibilities, claude_issue_eligibility_criteria, tenant_ai_key_project) and none were cleaned up, so MySQL blocked the parent delete.
- Fix: Added six raw db->table(...)->whereIn(...)->delete() calls inside the existing transaction, in FK dependency order (criteria before eligibility, contexts before attachments, pivot before project).
- Lesson: A hand-maintained cascade-delete list is guaranteed to drift: every new table with a RESTRICT FK into the subtree must be added manually, and nothing fails until a populated project is deleted in prod. This bug class has no automated check today; an arch test that diffs schema FKs into project/issues against the Actions that delete them would close it. (Related drift: BulkDeleteIssuesAction already deleted issue_watchers but DeleteProjectAction didn't; the two tear-down paths had diverged.)
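Both the dependency ordering and the suggested drift check can be sketched over a list of FK edges. The table names come from the postmortem, but the specific child-to-parent edges are assumptions inferred from the stated order, and a real arch test would read them from the schema rather than hardcode them.

```typescript
// Sketch: derive a child-before-parent delete order from FK edges, plus a drift check.
interface FkEdge {
  child: string; // table holding the RESTRICT foreign key
  parent: string; // table it points into
}

function deleteOrder(tables: readonly string[], edges: readonly FkEdge[]): string[] {
  const childrenOf = new Map<string, string[]>();
  for (const e of edges) {
    const list = childrenOf.get(e.parent) ?? [];
    list.push(e.child);
    childrenOf.set(e.parent, list);
  }
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (table: string): void => {
    if (state.get(table) === "done") return;
    if (state.get(table) === "visiting") throw new Error(`FK cycle through ${table}`);
    state.set(table, "visiting");
    for (const child of childrenOf.get(table) ?? []) visit(child); // children are emitted first
    state.set(table, "done");
    order.push(table);
  };
  for (const t of tables) visit(t);
  return order;
}

// Drift check: tables with an FK into the subtree that the hand-kept delete list doesn't mention.
function missingFromDeleteList(edges: readonly FkEdge[], deleteList: readonly string[]): string[] {
  const listed = new Set(deleteList);
  return Array.from(new Set(edges.map((e) => e.child))).filter((t) => !listed.has(t));
}
```

Run against live schema metadata, the second function is the kind of diff the lesson proposes: it fails as soon as a new RESTRICT child lands without a matching delete.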
KD-0734 — Complete Sprint modal prompts for incomplete issues when none exist
- Severity: medium
- Symptom: Two defects. (1) The Complete Sprint modal always showed "What should we do with incomplete issues?" even when every issue was already Done. (2) After completing a sprint, the board kept showing the completed sprint until a manual reload.
- Root cause: (1) hasNoIssues checked issuesCount === 0 (total issues) instead of the incomplete-issues count, and the backend never exposed an incomplete count, so there was nothing else to check. (2) CompleteSprintAction was the only mutating sprint Action that never called SprintBroadcaster->updated(), so the reactive store was never notified. A follow-up surfaced a third issue: makeSprintStoreForProject wasn't memoized, so the modal's retrieveAll() refreshed a different store instance than the one Backlog used, and since the broadcast ships with ->toOthers(), the originator's UI had no refresh path at all.
- Fix: Added a lazily computed incomplete_issues_count to SprintResourceData and switched the modal to check it. Injected SprintBroadcaster into CompleteSprintAction. Memoized the sprint store by projectId and derived hasNoIssues from the freshly retrieved store value rather than the stale prop snapshot.
- Lesson: "No issues to move" is a count of incomplete issues, not total issues; modeling a domain question with the nearest available field produces noise. And a mutating Action that skips the broadcast its siblings all fire is a silent realtime gap. The deeper trap: ->toOthers() excludes the originator, so the person who triggered the action depends entirely on the HTTP response refreshing the same store instance, and an unmemoized store factory quietly breaks that for the one user who most expects to see the result.
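A memoized factory keyed by projectId makes every caller share one store instance. SprintStore and its fields are stand-ins; the real store's API is not described in the postmortem.

```typescript
// Stand-in store shape; the real sprint store API is not shown in the postmortem.
interface SprintStore {
  projectId: number;
  sprints: { id: number; incompleteIssuesCount: number }[];
}

const storesByProject = new Map<number, SprintStore>();

// Memoized factory: the modal and the Backlog view get the same object for the same project.
function makeSprintStoreForProject(projectId: number): SprintStore {
  let store = storesByProject.get(projectId);
  if (!store) {
    store = { projectId, sprints: [] };
    storesByProject.set(projectId, store);
  }
  return store;
}
```

With a shared instance, the originator's HTTP response updates the same store the board renders from, which matters because ->toOthers() never delivers the broadcast back to them.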
KD-0733 — Markdown tables render as unstyled plain text
- Severity: low
- Symptom: GFM tables in issue descriptions showed header and cell text with no borders, no row separation, no padding — indistinguishable from two lines of plain text.
- Root cause: marked correctly emitted <table>/<thead>/<tr>/<th> and DOMPurify preserved them, but markdown.css defined .description-prose styles for every other prose element and had no rules for table elements. The browser's default table rendering has no borders.
- Fix: Added .description-prose table styles (border-collapse, borders via var(--border), header background, alternating row background, padding) matching the file's existing visual language.
- Lesson: A prose stylesheet is only complete for the elements it explicitly targets. When a markdown renderer can emit an element type (tables) that the prose CSS never styled, it falls back to unstyled UA defaults. Cross-check the renderer's full output tag set against the prose stylesheet's coverage.
KD-0725 — Modal dialogs overflow viewport on narrow screens
- Severity: medium
- Symptom: Shared modals at the large size (1360px design width) overflowed the viewport across the entire 1024–1359px range, the band covering half-screen windows on a 1920px monitor up through small desktops. The right edge was clipped, the close button hidden, and a horizontal scrollbar appeared.
- Root cause: BaseFormModal/BaseShowModal switched to fixed pixel widths (lg:w-100/220/340) at the lg breakpoint with no viewport guard. Below lg, the w-90vw fallback was already viewport-capped, so the bug only triggered in the lg+ band where the fixed width exceeded the viewport.
- Fix: Added max-w-95vw to every entry in both size maps, so the resolved width became min(design-width, 95vw).
- Lesson: A fixed pixel width above a breakpoint assumes the viewport is always wider than the design width, which is false for half-screen and mid-desktop widths. Any fixed-width element needs a viewport-relative max-width cap. An arch test scanning for lg:w-<n> on a <dialog> child without a matching max-w-<n>vw would prevent this regression class.
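The arch test proposed in the lesson could start as a check over class strings. Scanning raw class lists instead of parsing <dialog> children is a simplifying assumption, and the function name is hypothetical.

```typescript
// Hypothetical arch-test helper: flag class lists that pin lg:w-<n> without a vw-based max-width cap.
const FIXED_LG_WIDTH = /(?:^|\s)lg:w-\d+(?=\s|$)/;
const VW_MAX_WIDTH = /(?:^|\s)(?:lg:)?max-w-\d+vw(?=\s|$)/;

function lacksViewportCap(classList: string): boolean {
  return FIXED_LG_WIDTH.test(classList) && !VW_MAX_WIDTH.test(classList);
}
```

A real gate would collect the class strings from the modal size maps (or the compiled templates) and fail on any entry for which this returns true.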
KD-0700 — Hand-to-Claude grader verdict read from a key Anthropic never sends
- Severity: high
- Symptom: Every graded Hand-to-Claude session was recorded as Failed regardless of the actual grader verdict; the UI said "Claude could not finish the issue" even when the grader explicitly judged the rubric satisfied. Confirmed in prod on a session whose Anthropic events API showed result: "satisfied" but kendo stored status = Failed.
- Root cause: handleOutcome read rawPayload['outcome']['result'] from the webhook, but Anthropic's outcome_evaluation_ended webhook is a notification only: it carries no outcome key at any depth. The real verdict lives on the session's outcomeEvaluations[] list, already returned by the getSession retrieve call, but aggregateEvents walked the event stream for tokens/iterations only and never surfaced it. A secondary defect: triggerCleanup archived the session unconditionally on the first status_idled webhook (fired between the implementer's end_turn and grader start, while Anthropic had flipped back to running), 400ing on every run.
- Fix: Added outcomeResult to SessionResultData, populated from the latest outcomeEvaluations entry in getSession, and read the verdict from there instead of the webhook payload. Gated triggerCleanup on the session reporting idle/terminated rather than running.
- Lesson: Reading state from a webhook payload that the provider documents as a notification-only event guarantees a wrong answer; the authoritative state must come from the retrieve call. The test suite hid the bug because the test helper fabricated the outcome key the provider never sends: a fixture builder that synthesizes a shape no real API produces will keep a bug green forever.
KD-0699 — PR-evidence parser rejects every verbatim MCP response
- Severity: high
- Symptom: Every Hand-to-Claude implementer session that successfully opened a PR was terminated as Failed/missing_pr_evidence before the grader ran, burning the full token spend with no verdict and permanently marking the issue Failed. The implementer had pasted the MCP tool response verbatim, as instructed.
- Root cause: extractPullRequestUrlFromMessage read $decoded['html_url'], but the GitHub MCP create_pull_request tool returns {id, url} with no html_url. Both the kendo parser AND the implementer system prompt's "good output looks like this" example encoded the GitHub REST API shape rather than the MCP tool's actual shape, so the prompt-mandated verbatim quoting was structurally guaranteed to fail the parse. The test suite passed because the test helper generated the same wrong shape the prompt documented.
- Fix: Made the parser accept url ?? html_url (MCP shape first, REST shape as a forward-compatible fallback), both still validated through the github.com PR-URL regex. Updated the prompt example and test helper to the real MCP shape. Downstream, verifyPullRequestOpen still confirms the PR exists, so the looser key set didn't weaken the fabrication defence.
- Lesson: When a prompt instructs the model to quote a tool's output verbatim, the parser must match what the tool actually emits, not what an API doc says, and the prompt example, the parser, and the test fixture must all agree on the real shape. Three places encoded the same imagined shape; prod was the first witness. Audit fixture builders for synthesized-vs-real payloads. (Side note: the diagnosed session cost ~$31 on Opus for well-trodden work, which flagged a model-tier question.)
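A TypeScript sketch of the tolerant parse. Only the url ?? html_url order and the regex validation step come from the postmortem; the function name and the exact regex are assumptions.

```typescript
// The pattern is an assumed approximation of a github.com pull-request URL.
const GITHUB_PR_URL = /^https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/pull\/\d+$/;

function extractPullRequestUrl(message: string): string | null {
  let decoded: unknown;
  try {
    decoded = JSON.parse(message);
  } catch {
    return null; // not JSON at all
  }
  if (typeof decoded !== "object" || decoded === null) return null;
  const record = decoded as Record<string, unknown>;
  // MCP create_pull_request shape ({id, url}) first, REST shape (html_url) as a fallback.
  const candidate = record["url"] ?? record["html_url"];
  // Either key must still look like a real github.com PR URL.
  return typeof candidate === "string" && GITHUB_PR_URL.test(candidate) ? candidate : null;
}
```

A fixture for this parser should be copied from a real tool response, not written from API documentation.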
KD-0693 — Anthropic session cleanup 400s archiving the primary thread
- Severity: high
- Symptom: Every terminal Hand-to-Claude session hit 400 invalid_request_error: "The primary thread cannot be archived; archive the session instead." Because the exception was thrown before cleaned_up_at was stamped, the webhook job retried forever and the hourly prune re-hit the same failure every tick (9 occurrences in two hours post-deploy).
- Root cause: ArchiveAnthropicSessionResourcesAction walked every thread via streamThreadsForSession and called archiveThread on each, including the primary thread, which Anthropic rejects. The streamer yielded the primary thread (parentThreadID === null) despite its contract implying only archivable threads. Compounding it, the Action never called sessions->archive($sessionId) at all, so sessions were never archived on the Anthropic side even before the 400 surfaced.
- Fix: Filtered the primary thread (parentThreadID === null) out of streamThreadsForSession, and added an archiveSession primitive called once after the child-thread and vault archives, before stamping cleaned_up_at.
- Lesson: When a cleanup step throws before its idempotency marker is set, it retries forever and turns a single failure into a recurring incident; cleanup loops must either tolerate the rejecting case or stamp progress before the fragile call. And a streamer whose contract says "things you can archive" must actually filter to that set, or every caller inherits the exception.
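The corrected cleanup order, sketched in TypeScript. Thread and CleanupApi are hypothetical stand-ins rather than the Anthropic SDK, and vault archiving is omitted for brevity.

```typescript
// Hypothetical stand-ins; these are not the Anthropic SDK's types or methods.
interface Thread {
  id: string;
  parentThreadID: string | null; // null marks the primary thread
}

interface CleanupApi {
  archiveThread(threadId: string): Promise<void>;
  archiveSession(sessionId: string): Promise<void>;
}

// Honour the "archivable threads only" contract: the primary thread is never returned.
function archivableThreads(threads: readonly Thread[]): Thread[] {
  return threads.filter((t) => t.parentThreadID !== null);
}

async function archiveSessionResources(
  api: CleanupApi,
  sessionId: string,
  threads: readonly Thread[],
  stampCleanedUp: () => void,
): Promise<void> {
  for (const t of archivableThreads(threads)) await api.archiveThread(t.id);
  await api.archiveSession(sessionId); // per the 400 message, archive the session, not its primary thread
  stampCleanedUp(); // the idempotency marker is written once every archive call has succeeded
}
```

Filtering inside the helper, rather than at each call site, keeps every caller within the provider's rules.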
KD-0691 — session.status_idled webhook events silently dropped
- Severity: high
- Symptom: When a Hand-to-Claude session completed naturally, Anthropic emitted `session.status_idled`. The dedup row was written (so `processed_at` looked healthy), but no status update, no completion comment, no audit row, no broadcast, and no cleanup ran. From kendo's POV the session was permanently in flight; Anthropic-side resources sat until the 30-day TTL.
- Root cause: All three `match ($data->eventType)` blocks in `HandleSessionWebhookAction` enumerated only `outcome_evaluation_ended` and `status_terminated`; `session.status_idled` fell through to `default => null`. The dedup row was written before the inner match, which is exactly what made the failure silent: the webhook-events table looked processed while the session stayed Pending.
- Fix: Added `session.status_idled` to all three match blocks and a `handleIdled` method branching on `stopReason` (`end_turn` → Completed, anything else → Failed), mirroring `handleTerminated`. Added a defensive early return, with a warning, if the aggregated result is null.
- Lesson: A `match` with a silent `default => null` arm is a trap for event-type handling: a new (or unhandled) event type produces no error, just missing work. And writing a dedup/processed marker before the work means the marker lies when the work is skipped. Record "processed" only after the handler actually runs, or unhandled events masquerade as healthy.
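The same trap can be closed at compile time in TypeScript with an exhaustive `switch` over a union of event types. This sketch assumes the event-type strings and outcome labels shown (they are illustrative, not the webhook's real payload) and records the dedup marker only after the handler returns.

```typescript
// Assumed event-type strings; the real webhook payload carries more fields.
type SessionEventType =
  | "session.outcome_evaluation_ended"
  | "session.status_terminated"
  | "session.status_idled";

interface WebhookEvent {
  id: string;
  type: SessionEventType;
  stopReason?: string;
}

// Adding a member to SessionEventType without a matching case becomes a
// compile error; an unexpected runtime value throws instead of returning null.
function assertNever(value: never): never {
  throw new Error(`Unhandled event type: ${String(value)}`);
}

function handleEvent(event: WebhookEvent): string {
  switch (event.type) {
    case "session.outcome_evaluation_ended":
      return "evaluated";
    case "session.status_terminated":
      return "terminated";
    case "session.status_idled":
      return event.stopReason === "end_turn" ? "completed" : "failed";
    default:
      return assertNever(event.type);
  }
}

// The dedup marker is recorded only after the handler has actually run.
function processWebhook(event: WebhookEvent, processed: Set<string>): string | null {
  if (processed.has(event.id)) return null; // duplicate delivery
  const outcome = handleEvent(event);
  processed.add(event.id);
  return outcome;
}
```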
KD-0634 — Filter state leaks across projects, blanking Backlog/Board
- Severity: medium
- Symptom: Project-scoped issue filters (selected lanes/epics/creators/sprints) persisted across navigation between projects and across reloads. Because lane/epic/sprint IDs are per-project auto-increment keys, a filter from Project A matched nothing in Project B — the middle pane rendered empty with a stale filter chip showing. Affected Backlog, Board, and Overview.
- Root cause: `filters.ts` declared module-level singleton refs persisted under global localStorage keys. Four held project-scoped IDs; the matchers did strict ID equality with no project-membership check, so a previous project's IDs filtered out every issue in the current one.
- Fix: Per-slot storage keys: each project gets its own slot (`issue-filters.{projectId}.selectedLanes`), hydrated on `setFilterProject(projectId)`. Cross-project MyIssues routes through a fixed `myissues` slot. Rejected the simpler "single global key + reset on project change" because localStorage is shared across tabs, so a Ctrl+Click into Project B would silently wipe Project A's filter in another tab.
- Lesson: State persisted under a global key but holding scope-specific identifiers will leak across scopes, and the simpler "reset on change" fix breaks under multi-tab use because localStorage is origin-shared. Per-scope storage slots sidestep both the leak and the cross-tab race. (Also: `selectedSprints` was dormant, declared and cleared but wired into no page; clearing it for hygiene future-proofs whoever wires it up.)
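A sketch of per-scope storage slots using the key layout from the fix. `MemoryStorage` is a stand-in so the code runs outside a browser (in the app, `window.localStorage` satisfies the same interface); the helper names are hypothetical.

```typescript
// Minimal subset of the Web Storage API.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

class MemoryStorage implements KeyValueStorage {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// A slot is either a project id or the fixed cross-project "myissues" slot.
type FilterSlot = number | "myissues";

// Key layout: issue-filters.{slot}.{filterName}
function filterKey(slot: FilterSlot, filterName: string): string {
  return `issue-filters.${slot}.${filterName}`;
}

function saveSelection(storage: KeyValueStorage, slot: FilterSlot, filterName: string, ids: number[]): void {
  storage.setItem(filterKey(slot, filterName), JSON.stringify(ids));
}

// A project that has never stored a filter starts empty instead of inheriting another project's IDs.
function loadSelection(storage: KeyValueStorage, slot: FilterSlot, filterName: string): number[] {
  const raw = storage.getItem(filterKey(slot, filterName));
  return raw === null ? [] : (JSON.parse(raw) as number[]);
}
```

Because each tab reads and writes only its own project's keys, there is nothing to reset on navigation and no cross-tab wipe.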
KD-0631 — Blank page when async component setup fails
- Severity: medium
- Symptom: When an HTTP request failed during a page's async `<script setup>` (observed with 429s in prod during rapid navigation), the page content rendered blank, with no error state and no retry. Error toasts ("Too Many Attempts.") did appear, but the content never rendered.
- Root cause: Three layers combined. (1) Pages fired an unguarded `await Promise.all([...])` at the top of async setup, so one rejection killed the whole batch. (2) Layouts wrapped `<RouterView>` in `<Suspense>`, which has no native error slot; an async child rejection puts Suspense into an unrecoverable blank state. (3) The app-level `onErrorCaptured` handled only `EntryNotFoundError` and re-threw everything else.
- Fix: A shared `AsyncErrorBoundary` component (using `onErrorCaptured`) placed at the two Suspense boundaries (`ProjectLayout`, `SharedDomainLayout`), rendering "Could not load page" plus a "Go back" button instead of blanking. It explicitly passes `EntryNotFoundError` through to the existing `App.vue` handler. No per-page changes.
- Lesson: Vue's `<Suspense>` has no error slot: an async setup rejection blanks the subtree unrecoverably unless an error boundary wraps it. Toasts surfacing the error don't help, because they run on a separate code path. One boundary at the Suspense seam covers every page beneath it. (The investigation also spun off KD-0679/0680/0635 on the underlying rate-limit pressure from unconditional refetches and orphaned broadcast subscriptions.)
KD-0512 — Reports detail pane cramped at tablet / mid-desktop widths
- Severity: low
- Symptom: The Reports page right-hand detail pane became unusable between ~768px and ~1200px — at 960px the report title wrapped character-by-character and the AI stepper labels overlapped into mush; at 768px the pane collapsed to a ~50px sliver.
- Root cause: The two-pane layout had only one breakpoint guard (`lt-md:flex-col` at 768px). A fixed 400px left pane plus a persistent sidebar and padding left the detail pane only ~300-500px across the entire 768-1200px band, below the AI stepper's ~480px minimum. The epic documented a "narrow" breakpoint at <1100px that the Reports page never honoured.
- Fix: Added a shared `isNarrow` ref (`NARROW_BREAKPOINT = 1100`) to the breakpoint service and extended the existing master-detail pattern to fire below 1100px: list OR detail, not both, with a back button.
- Lesson: A layout built two-pane-first for wide monitors needs a breakpoint between "wide desktop" and "mobile stack"; otherwise the half-screen / mid-desktop band gets the worst of both. When an epic already defines a "narrow" threshold, page-level layouts must honour it via a shared signal rather than inventing per-page breakpoints.
KD-0687 — Implementer agent silently reverts to always_ask on every re-run
- Severity: high
- Symptom: The Implementer agent's GitHub MCP calls (`create_branch`, `push_files`, `create_pull_request`, …) parked the session waiting on a human after any re-run of the provisioner. Production was working only because the policy had been patched out-of-band via a manual curl.
- Root cause: `backend/scripts/provision-hand-to-claude.php` built the Implementer's `mcp_toolset` without a `permission_policy` field. Anthropic's Managed Agents API defaults an absent `permission_policy` to `always_ask`, so the script (the supposed source of truth) disagreed with the live agent state, and any re-run silently overrode the manual fix.
- Fix: Extracted the toolset block into a `require`-returns-array companion (`backend/scripts/lib/hand-to-claude-implementer-tools.php`) that declares `$alwaysAllow = ['type' => 'always_allow']` once and applies it to both the toolset's `default_config` and every entry in `configs`. Added a unit test asserting the shape.
- Lesson: Provisioning scripts that PATCH external systems must declare every policy field explicitly. Relying on API defaults means the script and the live state can diverge silently, and any fix applied out-of-band is one re-run away from being clobbered.
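The same "declare once, apply everywhere" shape, sketched in TypeScript rather than the PHP companion file. The field names `mcp_toolset`, `default_config`, `configs`, and `permission_policy` come from the postmortem; the surrounding payload structure (the `type` and `name` keys) is an assumption, not the documented Managed Agents schema.

```typescript
interface PermissionPolicy {
  type: "always_allow" | "always_ask";
}

interface ToolConfig {
  name: string;
  permission_policy: PermissionPolicy;
}

// Assumed toolset shape for illustration only.
interface McpToolset {
  type: "mcp_toolset";
  default_config: { permission_policy: PermissionPolicy };
  configs: ToolConfig[];
}

function buildImplementerToolset(toolNames: readonly string[]): McpToolset {
  // Declared once and applied at every level, so nothing falls back to the API default.
  const alwaysAllow: PermissionPolicy = { type: "always_allow" };
  return {
    type: "mcp_toolset",
    default_config: { permission_policy: alwaysAllow },
    configs: toolNames.map((name) => ({ name, permission_policy: alwaysAllow })),
  };
}

// Shape check in the spirit of the unit test: every level declares the policy explicitly.
function everyPolicyExplicit(toolset: McpToolset): boolean {
  return (
    toolset.default_config.permission_policy.type === "always_allow" &&
    toolset.configs.every((c) => c.permission_policy?.type === "always_allow")
  );
}
```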
KD-0663 — Issue show page does not update from broadcasts
- Severity: medium
- Symptom: Editing an issue's title in tab B didn't re-render tab A's `<h1>` until a manual reload. The bell-watch toggle had its own GET endpoint, an optimistic-rollback try/catch, and a manual race counter, none of which updated when other tabs watched/unwatched.
- Root cause: `Show.vue` had no project-channel broadcast subscription at all. A mid-fix attempt introduced a per-issue channel (`Tenant.{t}.Project.{p}.Issue.{id}`) plus a page-scoped `useLiveIssueDetail` composable and `applyResource`/`setById` leaks on the issue store, re-implementing what `lanes`/`sprints`/`comments` already did via the project-wide `ProjectDomainUpdateEvent` channel. The fs-adapter-store package docs explicitly call out exposing `setById` as an anti-pattern.
- Fix: Reverted the per-issue channel; broadcast full `IssueResourceData` on the existing project-wide channel via `ProjectDomainUpdateEvent`. Added `watcher_ids` to `IssueResourceData`, made `ToggleIssueWatchAction` return the issue and fan out via `IssueBroadcaster::updated()`, and replaced the watch GET/optimistic plumbing with a one-line `computed` plus an `issue.watch()` adapter method that uses the package's sanctioned `storeModule.setById`.
- Lesson: Before inventing a new realtime channel or store mutator, check whether sister relations (lanes/sprints/comments) already solve it on the project-wide channel; payload-size arguments rarely justify the architectural cost once you measure (a typical issue is ~2.5 KB compact vs Reverb's 10 KB ceiling). And when an adapter package documents `setById` as an anti-pattern, exposing it on the store wrapper is a code smell, not a workaround.
KD-0654 — IssueForm submit button not disabled during in-flight save
- Severity: medium
- Symptom: Rapid double-click on Update/Create/Promote fired the handler twice in parallel before the first round-trip resolved. Edit popped history twice; Create produced a duplicate issue with orphan attachments associated to only one; ReportDetail promoted the report twice.
- Root cause: `IssueForm.vue`'s submit button had no `:disabled` binding, and none of the three call sites (`Edit.vue`, `Create.vue`, `ReportDetail.vue`) wrapped their `await` in an `isSubmitting` guard or `try/finally`. The browser fired duplicate submit events freely; the async operations were independent network requests, so both succeeded.
- Fix: Two-layer guard. Added an optional `isSubmitting?: boolean` prop to `IssueForm`, bound to the button's `:disabled`. Each call site got a local `isSubmitting` ref, an early-return guard at the top of the handler (covers synthetic `requestSubmit()` paths), and a `try/finally` around the await so the flag resets on throw.
- Lesson: A shared form component is the leverage point for double-submit prevention: every consumer is one optional prop away from being protected. And the guard needs both layers: `:disabled` blocks the click, the early return covers synthetic submits, and `try/finally` guarantees the flag resets even when the network call throws (otherwise a failed submit locks the form forever).
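A framework-free sketch of the call-site half of the guard. `Flag` stands in for a Vue `ref<boolean>` and `guardedSubmit` is a hypothetical helper name; the `:disabled` binding on `IssueForm` is the other half and is not shown.

```typescript
// Stand-in for a Vue ref; in the app this would be ref(false), also passed to IssueForm's isSubmitting prop.
interface Flag {
  value: boolean;
}

function guardedSubmit(isSubmitting: Flag, save: () => Promise<void>): () => Promise<void> {
  return async () => {
    // Early return covers synthetic submits (e.g. requestSubmit()) that bypass :disabled.
    if (isSubmitting.value) return;
    isSubmitting.value = true;
    try {
      await save();
    } finally {
      // Reset even when save() rejects, so a failed submit doesn't lock the form.
      isSubmitting.value = false;
    }
  };
}
```

The flag is set synchronously before the first `await`, so a second call in the same tick already sees it.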
KD-0653 — UpdateIssueAction silently drops attachmentIds from PUT requests
- Severity: medium
- Symptom: `PUT /api/projects/{id}/issues/{slug}` accepted `attachment_ids` in the body, validated it, populated the DTO, and returned 200 OK, but `UpdateIssueAction` never read the field. The API advertised behaviour it didn't implement.
- Root cause: `SaveIssueRequest` and `SaveIssueData` were shared across Create and Update because both controller actions used the same FormRequest. The Create path needs `attachmentIds` for the orphan-claim pattern (uploads happen before the issue has an ID); on Update, attachment edits go through dedicated `makeAttachmentStore()` endpoints, so `UpdateIssueAction` correctly didn't act on the field, but the shared DTO kept advertising it.
- Fix: Per ADR-0020, split `SaveIssueData` into `CreateIssueData` (with `attachmentIds`) and `UpdateIssueData` (without), and `SaveIssueRequest` into `CreateIssueRequest`/`UpdateIssueRequest`. The Update path's validation rule was removed entirely. Frontend `IssueBase` lost `attachmentIds`; a `NewIssueMutable` type carries it as a Create-only payload.
- Lesson: Sharing a FormRequest/DTO across Create and Update sounds DRY but encodes a lie when the two paths have different field surfaces: the silent drop is the symptom, the type signature is the bug. Direction-specific DTOs make the contract honest and let the type system reject the misuse.
KD-0583 — Dead unsaved-content warning on Create Issue page
- Severity: low
- Symptom: The Create Issue page had a "you have unsaved files, leave anyway?" warning that had never fired in this codebase. No user had reported the missing dialog.
- Root cause: `onBeforeRouteLeave` from `vue-router` requires the router to be installed via `app.use(router)`. This app uses a custom `createRouterView()` shell (`shared/services/router/components.ts`) and never calls `app.use()` with the router, so the guard registered against an absent router and silently did nothing. The companion `beforeunload` listener fired only on tab close, not on the in-app navigation the warning was meant to cover. The author shipped without verifying the guard fired.
- Fix: Deleted the dead block: `onBeforeRouteLeave`, the `beforeunload` listener, the `onUnmounted` cleanup, and the `clearOrphanAttachments` helper (no other callers). Orphan attachments are pruned server-side by `PruneOrphanedAttachmentsAction` after 24h, so there is no hygiene gap.
- Lesson: Vue Router composition-API guards (`onBeforeRouteLeave`, `onBeforeRouteUpdate`) silently no-op when the router isn't installed as a plugin. Apps using a custom router-view shell must verify that any router-guard hook actually fires before shipping it, because the failure mode is invisible.
KD-0626 — lint-staged glob never matches, ESLint skipped on commit
- Severity: low
- Symptom: Pre-commit hook printed "No staged files match any configured task" and ESLint never ran locally — errors only surfaced on CI.
- Root cause: The `lint-staged` config in `frontend/package.json` used globs anchored at the repo root (`frontend/src/**/*`), but the hook ran lint-staged with cwd `frontend/`, where staged paths resolve to `src/...`. Off-by-one prefix.
- Fix: Stripped the `frontend/` prefix from both glob keys.
- Lesson: Glob patterns must be relative to the cwd of the tool that evaluates them; when a tool is launched from a subdirectory, every config path inside it is anchored there.
KD-0606 — AI generate-story keys collide with snake_case wire format
- Severity: high
- Symptom: "Generate" button on Reports/Issues AI panel returned 422 "The source description field is required" even though the report had a description.
- Root cause: The frontend HTTP middleware runs `deepSnakeKeys()` on every outbound payload, but the `AgentGenerateStoryRequest::rules()` keys were camelCase (`sourceDescription`). The wire body shipped `source_description`, so the rule never matched. The earlier KD-0511 rename intended snake_case but wrote camelCase. Feature tests posted camelCase directly, bypassing the middleware, so CI stayed green while production was broken.
- Fix: Renamed the rule keys to snake_case; added an arch test rejecting any camelCase top-level rule key; updated tests to post the real wire format.
- Lesson: Feature tests that bypass the global request middleware can hide wire-format mismatches indefinitely — when middleware mutates payload shape, tests must post the post-middleware shape, not the pre-middleware shape.
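To see why the camelCase rule never matched, here is a hypothetical re-implementation of a `deepSnakeKeys`-style transform (the real middleware may treat acronyms, digits, or non-plain objects differently). Tests should assert against its output, not its input.

```typescript
// Assumption: simple per-capital conversion, plain objects and arrays only.
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);
}

function deepSnakeKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(deepSnakeKeys);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [toSnakeCase(k), deepSnakeKeys(v)]),
    );
  }
  return value;
}

// What the backend actually receives; feature tests should post this shape.
const wireBody = deepSnakeKeys({ sourceDescription: "Login fails", sourceTitle: "Bug" }) as Record<string, unknown>;

// A rule keyed in camelCase can never match the snake_case wire body.
const camelRuleMatches = "sourceDescription" in wireBody; // false
const snakeRuleMatches = "source_description" in wireBody; // true
```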
KD-0605 — Stale checkedReportIds selection promotes the wrong report
- Severity: high
- Symptom: After dismissing a checked report and clicking a different one, pressing Promote generated an issue from the previously checked report. UI showed report B; API received report A's id.
- Root cause: `selectedReportId` (detail pane) and `checkedReportIds` (multi-select for promote) were two independent pieces of state. The set was never pruned when a report transitioned out of pending: dismissed reports stayed checked, and `ReportDetail.promoteReports` preferred the non-empty stale set over the visible report.
- Fix: Self-heal in the `checkedReports` computed by filtering on `getReportStatus(report) === pending`, so non-pending reports drop out of the multi-select reactively.
- Lesson: Two pieces of selection state that model the same intent will drift. Prefer derived/filtered state over manually synchronised mirrors, or self-heal in the computed by gating on the source-of-truth status.
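A sketch of the derived selection, with `checkedReports` recomputed from the current reports each time, as a Vue `computed` would be. The `Report` shape, the non-pending status names, and `reportsToPromote` are assumptions for illustration.

```typescript
type ReportStatus = "pending" | "dismissed" | "promoted"; // non-pending names are hypothetical

interface Report {
  id: number;
  status: ReportStatus;
}

// Stand-in for getReportStatus(report), which reads the source-of-truth status.
function getReportStatus(report: Report): ReportStatus {
  return report.status;
}

// Derived, not mirrored: a checked id only counts while its report is still pending.
function checkedReports(reports: readonly Report[], checkedReportIds: ReadonlySet<number>): Report[] {
  return reports.filter((r) => checkedReportIds.has(r.id) && getReportStatus(r) === "pending");
}

// Promote the checked set only if it still holds something; otherwise the visible report.
function reportsToPromote(reports: readonly Report[], checkedIds: ReadonlySet<number>, selectedReportId: number): number[] {
  const checked = checkedReports(reports, checkedIds);
  return checked.length > 0 ? checked.map((r) => r.id) : [selectedReportId];
}
```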
KD-0604 — parseDuration silently drops decimals
- Severity: medium
- Symptom: Users entering `"2.5h"` saw it round-trip to `"5h"` (300 minutes) instead of 150 minutes. No error, just silent corruption.
- Root cause: `DURATION_PATTERN = /(\d+)\s*(w|d|h|m)/gi` was non-anchored and integer-only, and was used with `matchAll`. It silently skipped any character that didn't fit the pattern: decimals, commas, junk after a fragment. `"2.5h"` matched only `5h`.
- Fix: Split it into `VALIDATION_PATTERN` (anchored, validates the whole input) and `EXTRACTION_PATTERN` (extracts each chunk). Inputs that don't fully match are rejected instead of partially summed.
- Lesson: Non-anchored `matchAll` over user input is a silent-corruption pattern. When parsing structured input, validate the whole string against an anchored pattern before extracting parts.
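A sketch of the validate-then-extract split. It assumes decimals should be accepted (the symptom implies `"2.5h"` should mean 150 minutes) and assumes an 8-hour day and 5-day week for `d` and `w`; the real unit table and patterns are not shown in the postmortem.

```typescript
// Minutes per unit. The d and w values are assumptions (working-time units).
const MINUTES_PER_UNIT: Record<string, number> = {
  w: 5 * 8 * 60,
  d: 8 * 60,
  h: 60,
  m: 1,
};

// One "<number><unit>" chunk, decimals allowed.
const CHUNK = String.raw`(\d+(?:\.\d+)?)\s*([wdhm])`;
// Anchored: the whole input must be one or more chunks, nothing else.
const VALIDATION_PATTERN = new RegExp(String.raw`^\s*(?:${CHUNK}\s*)+$`, "i");
// Unanchored and global: used only after validation has passed.
const EXTRACTION_PATTERN = new RegExp(CHUNK, "gi");

// Returns total minutes, or null when any part of the input is not understood.
function parseDuration(input: string): number | null {
  if (!VALIDATION_PATTERN.test(input)) return null;
  let total = 0;
  for (const match of input.matchAll(EXTRACTION_PATTERN)) {
    const amount = Number(match[1]);
    const unit = (match[2] ?? "").toLowerCase();
    total += amount * (MINUTES_PER_UNIT[unit] ?? 0);
  }
  return Math.round(total);
}
```

With this split, `"2,5h"` and `"5h junk"` are rejected outright instead of being silently summed from whatever fragments happened to match.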
KD-0601 — Dragging issue to sprint shows "unauthorized"
- Severity: medium
- Symptom: Users with "Own" update scope could change an issue's sprint via the edit modal but got 403 when dragging the same issue on the backlog.
- Root cause: `IssuePolicy::updateBoard()` called `CheckPermission::check()` with no `$ownerId`, so the "Own" scope check evaluated `null !== null` → always `false`. The sibling `update()` method correctly passed `$issue->user_id`.
- Fix: Pass `$user->id` as `$ownerId` in `updateBoard()`; additionally, add a per-issue `Gate::authorize` in `UpdateIssueBoardAction` for issues that actually moved (sprint/lane/epic changed).
- Lesson: Policies that share a permission scope must share a calling convention. When scope semantics depend on a parameter (like `$ownerId`), every policy method that checks that scope must pass it the same way, or "Own" silently means "deny everyone".
KD-0600 — GitHub App install fails on webhook/redirect race
- Severity: high
- Symptom: Users completing GitHub App install were told "Installation failed — please close this tab and try again" while the recovery URL silently still worked. Reproduced on production for the emmie tenant.
- Root cause: GitHub fires the `installation` webhook and the browser redirect concurrently with no ordering guarantee. The webhook controller queued `ProcessInstallationWebhookJob` and returned 200 immediately; the redirect's one-shot DB lookup ran before the job did. Worse, the error copy told users to close the tab, abandoning the working recovery URL.
- Fix: Process installation events inline in the webhook controller (200 only after the row is committed); add a bounded retry (`[200, 500, 1000, 1000]` ms) in the lookup; rewrite the Blade view to auto-reload on `installation_missing` instead of telling the user to close the tab.
- Lesson: When two external systems fire concurrent events about the same state, "process inline" + "bounded retry on read" beats "queue async + hope", and error copy must direct users toward recovery, never away from it.
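The bounded read-side retry, sketched in TypeScript (the real lookup is server-side PHP). The delay schedule is the one from the fix; `findWithBoundedRetry` and its injectable `wait` parameter are hypothetical, the latter so the schedule can be tested without real sleeps.

```typescript
// Delay schedule from the fix: up to four retries after the first attempt (2.7s worst case).
const RETRY_DELAYS_MS = [200, 500, 1000, 1000] as const;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-reads until the row appears or the schedule runs out.
async function findWithBoundedRetry<T>(
  lookup: () => Promise<T | null>,
  delays: readonly number[] = RETRY_DELAYS_MS,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T | null> {
  let found = await lookup();
  for (const delay of delays) {
    if (found !== null) break;
    await wait(delay);
    found = await lookup();
  }
  // null maps to "installation_missing": the caller auto-reloads rather than telling the user to give up.
  return found;
}
```

The bound matters as much as the retry: the request never hangs indefinitely, and the recovery path (auto-reload) takes over once the schedule is exhausted.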
KD-0596 — Reserved subdomain blocklist not enforced on admin CRUD
- Severity: high
- Symptom: A central operator could create a tenant with a reserved subdomain (e.g. `central.kendo.dev`, the central app's own host) via admin paths, bypassing the public signup blocklist.
- Root cause: Three admin FormRequests (`StoreTenantRequest`, `StoreDomainRequest`, `UpdateDomainRequest`) used a weaker regex than `StoreSignupRequest` and lacked `Rule::notIn(Tenant::RESERVED_SUBDOMAINS)`. Validation logic was duplicated across requests with no shared source of truth, so the drift was invisible.
- Fix: Extracted a shared `SubdomainRule` `ValidationRule` class and applied it to all four FormRequests, including signup.
- Lesson: Validation logic that exists in more than one place will drift; extract shared rules into reusable `ValidationRule` classes the moment a second copy is needed.
KD-0591 — Validation errors silently fail on 11 forms
- Severity: medium
- Symptom: Users submitted forms; backend returned 422 with field-level errors; nothing rendered. Forms sat there with no feedback.
- Root cause: 11 templates lacked `<FormError name="…" />` next to inputs whose backend rules validated those fields. The response middleware populated the global `errorBag` correctly; there was just no live `FormError` instance subscribed to render it.
- Fix: Inserted 27 missing `<FormError>` bindings. Two server-determined fields (`laneId`, `order`) were intentionally skipped.
- Lesson: Hand-authored form-error placement guarantees drift. Forms should structurally couple inputs with their error display (a `FormField` wrapper, or an arch test that cross-references templates against backend rules).
KD-0589 — Duplicate inserts return 500 instead of 422
- Severity: medium
- Symptom: Three Store endpoints (tenant AI key, project AI key, project GitHub repo) returned HTTP 500 when a user submitted a duplicate — typically a double-clicked Save button.
- Root cause: Migrations declared unique indexes, but the corresponding FormRequests had no `Rule::unique(...)` and the Actions had no pre-check. Duplicates reached `save()`, surfaced `SQLSTATE 23000`, and Laravel rendered that as a 500.
- Fix: Added `Rule::unique` with the migration-matching `where()` scope to each FormRequest.
- Lesson: Every DB-level unique index needs a matching `Rule::unique` (or Action-level guard). The DB invariant is correct, but without a validation surface the user gets a 500 instead of a polite 422. An arch test that cross-references migration unique indexes against FormRequest rules would catch this class of bug.
KD-0588 — PasswordConfirmModal silent failure
- Severity: high
- Symptom: Wrong password (or any error) in the password-confirm modal closed the modal silently. User believed the destructive action they were confirming had succeeded.
- Root cause: `handleSubmit` ordered `emit('close')` before `await onConfirm`. The parent unmounted the modal during the synchronous emit, destroying the inline `<FormError name="password" />` before the response middleware could populate `errorBag.password`. The `catch` block intentionally swallowed the error, relying on a `FormError` that no longer existed.
- Fix: Reordered so `emit('close')` runs only after a successful `await onConfirm`. The catch now distinguishes 422 (`FormError` surfaces inline) from non-422 (`dangerToast`).
- Lesson: Never close a modal until the awaited action it gates has resolved, and never rely on a global error bag if the component subscribed to it might already be unmounted.
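A sketch of the corrected ordering with the modal's dependencies passed in as plain callbacks. `ModalHooks`, `statusOf`, and the toast copy are hypothetical; the point is that `emitClose` runs only on the success path, so the inline `FormError` is still mounted when a 422 arrives.

```typescript
interface ModalHooks {
  onConfirm: (password: string) => Promise<void>;
  emitClose: () => void;
  dangerToast: (message: string) => void;
}

// Assumed error shape: a rejected request exposes a numeric `status`.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

async function handleSubmit(password: string, hooks: ModalHooks): Promise<void> {
  try {
    await hooks.onConfirm(password);
    // Close only after the gated action succeeded.
    hooks.emitClose();
  } catch (error) {
    // 422: keep the modal open so the inline FormError can render the field error.
    if (statusOf(error) === 422) return;
    // Anything else has no field to attach to, so surface it explicitly.
    hooks.dangerToast("Something went wrong. Please try again.");
  }
}
```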
KD-0587 — Cross-project attachment leak via unscoped attachment_ids
- Severity: medium
- Symptom: Reported as a cross-project leak: a user could attach attachments from another project. Investigation showed the leak didn't actually happen at runtime (the Action layer scoped the query), but the missing FormRequest validation was a real defense-in-depth gap.
- Root cause: `SaveIssueRequest` validated `attachment_ids.*` as `['integer']` only, with no `Rule::exists('attachments', 'id')->where('project_id', $projectId)`. This was inconsistent with `lane_id`, `sprint_id`, `epic_id`, etc. on the same request. The arch test that should have caught this matched only the wrong pattern (`'exists:attachments,id'`) and missed omission entirely.
- Fix: Added a scoped `Rule::exists`. Added `'attachments'` to the arch-test whitelist.
- Lesson: Defense-in-depth scoping must live at the FormRequest layer, not just the Action, and arch tests that detect misuse must also detect omission; otherwise they're a false signal.
KD-0586 — Validation errors don't reach users on 14 forms (camelCase mismatch)
- Severity: medium
- Symptom: Server-side 422s arrived but never displayed on 14 forms. Wrong-password failures, missing-team errors, etc. all silently failed.
- Root cause: The Axios response-error middleware ran `camelCase(key)` on every error key before populating `errorBag`. 25 `<FormError name="snake_case">` bindings looked up keys the middleware never populated. Existing camelCase bindings worked; new authors didn't realise the middleware was transforming keys.
- Fix: Renamed all 25 dead bindings to camelCase. Added an arch test rejecting any `<FormError name>` containing `_` or `-`.
- Lesson: When middleware silently transforms data shape, the convention has to be enforced by tooling; naming conventions across template/middleware boundaries are guaranteed to drift without an arch test.
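A minimal version of such an arch check, assuming templates are scanned as raw strings. `invalidFormErrorNames` is hypothetical, and the regex only handles static `name="..."` attributes (a bound `:name` is skipped); a real test would more robustly parse the SFC template.

```typescript
// Static name attribute on a FormError tag; requiring whitespace before `name` skips `:name` bindings.
const FORM_ERROR_NAME = /<FormError\b[^>]*\sname="([^"]*)"/g;

// Returns binding names the camelCasing middleware can never produce (anything with _ or -).
function invalidFormErrorNames(template: string): string[] {
  const offenders: string[] = [];
  for (const match of template.matchAll(FORM_ERROR_NAME)) {
    const name = match[1] ?? "";
    if (/[_-]/.test(name)) offenders.push(name);
  }
  return offenders;
}
```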
KD-0585 — projects.description column too short for validation rule
- Severity: medium
- Symptom: `POST /api/projects` returned HTTP 500 for any description longer than 255 chars (4 occurrences in 24h on prod).
- Root cause: The migration created `description` as `$table->string()` (`VARCHAR(255)`). The FormRequest later capped it at `max:5000`. Validation passed, and the INSERT crashed with `SQLSTATE 22001 Data too long`. No arch test asserts that `string|max:N` rules don't exceed the underlying column length.
- Fix: Changed the column to `TEXT`.
- Lesson: Validation rule length and column length must be cross-checked by tooling; drift between a FormRequest `max:N` and the schema length silently turns 422-able input into 500s.
KD-0581 — Billing seat-count mismatch between Kendo UI and Stripe
- Severity: high
- Symptom: Pro tenant displayed "Seats: 2 (€4/seat/month)" but Stripe's invoice was €4 — quantity stuck at 1. Customer silently under-billed.
- Root cause: Two sources of truth with no reconciliation. `BillingController::status` computed the seat count live from `User::query()->count()`. Stripe's quantity changed only when `SyncSeatQuantityJob` fired from three Actions (invite/delete/restore). Any membership change made before the sync infrastructure existed left Stripe permanently stale. Both paths also failed silently on edge cases (no tenant context, no Cashier subscription, queue failures).
- Fix (proposed): Make Stripe the single source of truth: read the seat count from `$subscription->quantity` for active subscriptions, and fall back to `User::query()->count()` only when there is no subscription.
- Lesson: External systems holding billable state must be the single source of truth. Recomputing the same metric in two places (DB count + external API) without a reconciliation pass guarantees drift, and silent no-ops in sync code make the drift invisible.
KD-0574 — TenantAwareQueue captures stale scoped instances, every queued broadcast dropped
- Severity: high
- Symptom: Every realtime broadcast dispatched through queue workers silently dropped in production. Users saw no live updates anywhere until manual reload.
- Root cause: `TenantAwareQueue` was constructed once at boot with `TenantSwitcher` and `TenantContext` injected as readonly properties. Both bindings were `scoped`. Laravel's queue worker calls `forgetScopedInstances()` before every job, removing the cached instance from the container's `instances[]` map but leaving `TenantAwareQueue` holding an orphan reference. The `JobProcessing` listener wrote the tenant onto the orphan; `broadcastOn()` resolved a fresh `TenantContext` with no tenant set. KD-0556's lazy resolution surfaced the bug; eager constructor injection was the underlying defect. Mocked unit tests passed because Mockery has no notion of container scoping.
- Fix: Replaced eager constructor injection with `resolve(...)` calls inside every closure registered by `register()`. Replaced the unit tests (which had given a false green for ~7 weeks) with feature tests using real container bindings.
- Lesson: When a service's dependencies are `scoped`, that service must NOT cache them in constructor properties, and tests for scoped-binding consumers must use real container bindings, because mocks bypass container scoping and ship false greens.
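A toy model of the failure, not Laravel's container: `scoped` bindings live until `forgetScopedInstances()`, so an object captured at construction time becomes an orphan that later `make()` calls never return. `TenantContext` mirrors the postmortem's name; the container itself is a simplified assumption.

```typescript
// Toy container: one instance per scoped key until forgetScopedInstances().
class Container {
  private readonly factories = new Map<string, () => unknown>();
  private readonly instances = new Map<string, unknown>();

  scoped<T>(key: string, factory: () => T): void {
    this.factories.set(key, factory);
  }

  make<T>(key: string): T {
    if (!this.instances.has(key)) {
      const factory = this.factories.get(key);
      if (!factory) throw new Error(`No binding for ${key}`);
      this.instances.set(key, factory());
    }
    return this.instances.get(key) as T;
  }

  forgetScopedInstances(): void {
    this.instances.clear();
  }
}

class TenantContext {
  tenantId: string | null = null;
}

const app = new Container();
app.scoped("tenant", () => new TenantContext());

// Anti-pattern: the dependency is captured once, at construction (boot) time.
const eagerContext = app.make<TenantContext>("tenant");

// The worker starts a new job and forgets scoped instances.
app.forgetScopedInstances();

// The listener writes the tenant onto the cached, now orphaned, object...
eagerContext.tenantId = "acme";
// ...while code that resolves at use time gets a fresh, empty instance.
const lazyTenant = app.make<TenantContext>("tenant").tenantId; // null

// Fix: resolve at the point of use on both sides, so writer and reader share the live instance.
app.make<TenantContext>("tenant").tenantId = "acme";
const fixedTenant = app.make<TenantContext>("tenant").tenantId; // "acme"
```

A mock injected in place of `TenantContext` never goes through `make()` at all, which is why the mocked unit tests could not see this.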
KD-0556 — PR-merge webhook does not broadcast issue lane change
- Severity: medium
- Symptom: Merging a PR moved the linked issue's lane server-side, but no `updates` broadcast reached connected boards in real time. Manual drag-and-drop broadcast correctly, so the websocket pipeline was healthy.
- Root cause: 10 broadcast events derived their channel from a `TenantContext` snapshotted in the constructor, and `broadcastOn()` silently returned `[]` when the snapshot was null: no log, no exception. Either tenant context wasn't bound at construction time (background job, console command) or the snapshot drifted across processes. The silent drop was invisible to monitoring.
- Fix: A shared `ResolvesTenantBroadcastChannel` trait that resolves `TenantContext` lazily at broadcast time and emits a structured `error`-level log when the resolved id is null.
- Lesson: Code paths that drop work silently are unobservable bugs. Every "return empty / skip / no-op" branch in infrastructure code must log something structured; otherwise the failure mode never surfaces.
KD-0553 — GitHub App self-serve install fails on cache-prefix mismatch
- Severity: high
- Symptom: GitHub App tenant install fails with "Invalid or expired OAuth state" 404 on the first try. Complete feature outage.
- Root cause: Install state lived in the cache. The install request ran under tenant context (`IdentifyTenant` mutated `cache.prefix` to `tenant_{id}_`); the `setup-callback` ran on the base domain without `IdentifyTenant`. Laravel's `CacheManager` caches resolved stores by name, and each store reads `cache.prefix` at construction. When PHP-FPM resolved a fresh `cache.store` between requests, it read the default prefix and missed the tenant-prefixed key.
- Fix: Moved install state to a dedicated `github_app_install_states` table on the central connection, which is prefix-immune by construction.
- Lesson: State that crosses a tenant-context boundary cannot live in a tenant-prefixed cache. When reads and writes happen under different prefix configs, use a connection-scoped table instead.
KD-0545 — My Issues badge does not update via realtime broadcast
- Severity: medium
- Symptom: The Navbar's My Issues badge and page didn't react to assignment changes, lane crossings into Done, or self-assignment via MCP. Users had to refresh.
- Root cause: Two-part defect. (1) The backend never fired user-scoped issue broadcasts, only project-channel events, which the Navbar can't subscribe to. (2) The frontend `updates`/`deleted` listeners on the user channel were domain-blind: every payload was routed unconditionally to `notificationStore`, even though `UserDomainUpdateEvent` already carried a `domain` field.
- Fix: Added `IssueBroadcaster::myIssuesChanged()` for user-scoped fan-out (computing `wasOnList`/`isOnList` per affected user). Made the Navbar's user-channel listeners domain-aware.
- Lesson: Realtime channels must be cut along the same axis as the data they keep in sync. A "My X" view cannot rely on per-project channels, and a multiplexed user channel needs explicit domain dispatch on the frontend or stores can't share it.
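A sketch of explicit domain dispatch on a multiplexed user channel. The payload shape and the domain names (`notifications`, `myIssues`) are assumptions; unknown domains are reported rather than silently dropped, in the spirit of KD-0556.

```typescript
// Assumed payload shape: UserDomainUpdateEvent carries a `domain` discriminator.
interface UserChannelPayload {
  domain: string;
  data: unknown;
}

type Handler = (data: unknown) => void;

// Routes each payload to the store that owns its domain instead of sending
// everything to the notification store. A Map avoids prototype-key lookups.
function createDomainDispatcher(
  handlers: ReadonlyMap<string, Handler>,
  onUnknown: (domain: string) => void,
): (payload: UserChannelPayload) => void {
  return (payload) => {
    const handler = handlers.get(payload.domain);
    if (handler) handler(payload.data);
    else onUnknown(payload.domain); // surface unrouted domains instead of dropping them
  };
}
```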
KD-0537 — Activity-timeline backfill migration deadlocks production release
- Severity: high
- Symptom: Fly's `release_command` for prod v180 failed with a MySQL deadlock during a tenant-migration backfill. Production was stuck on v179, and every dev→main merge would keep failing the release.
- Root cause: Two interacting problems. (1) Fly's `release_command` runs in an ephemeral machine while the v179 app machines keep serving live traffic; the migration's per-row `SELECT ... FOR UPDATE` + `INSERT` fought the live `IssueAuditLogger` for the next-key lock, and InnoDB picked the migration as the deadlock victim. (2) The backfill itself was wrong-shaped: it would have written synthetic "Created today" audit-log entries with `now()` timestamps, polluting the append-only hash chain with fabricated history. Staging never hit the deadlock because traffic was lower at deploy time.
- Fix: Deleted the migration. The activity timeline correctly returns `[]` for legacy issues with no audit history; the frontend already had an empty state for that case.
- Lesson: `release_command` migrations run concurrently with the previous version's live writes, so any backfill that contends with hot-path writes on the same index will deadlock. And if the truthful response to "we have no data" is an empty array, don't fabricate data to make it look populated.
KD-0519 — Lane reorder chevrons silently no-op on Project Settings
- Severity: medium
- Symptom: Clicking up/down chevron next to a lane appeared to do nothing — the order didn't change visually. (DB was actually updated; the frontend just didn't refresh.)
- Root cause: The earlier KD-0464 change was split into two commits. The frontend assumed broadcasts would keep the lane store in sync and removed `laneStore.retrieveAll()` from `updateLaneOrder()`. The backend explicitly excluded bulk/cascade lane mutations from broadcasting, and lane reorder happens inside `UpdateProjectAction`'s loop, which is exactly that "bulk/cascade" bucket. Net: the write happened, but with no broadcast and no refetch, the store kept stale `order` values.
- Fix: Restored the one-line `laneStore.retrieveAll()` after `project.update()`.
- Lesson: When two commits together replace a refetch with a broadcast, both halves must cover the same code paths; broadcaster scope decisions on the backend must be cross-checked against every refetch the frontend removed.
KD-0518 — Sprint title update wrongly requires status
- Severity: low
- Symptom: The reporter claimed the sprint edit modal returned 422 "status field required". Investigation showed that real UI usage round-trips `status` correctly via `mutable.value`; only hand-crafted partial-payload clients (curl, MCP, external API) hit the 422.
- Root cause: No defect in the real UI flow. The contract is "send the full sprint shape on update"; the adapter-store does that, and clients that craft partial payloads correctly fail validation.
- Fix: Kept `status` as `required`. Removed the regression tests added during the investigation, which encoded behaviour contradicting the chosen contract.
- Lesson: Reproduce the bug from the actual UI flow before changing the contract. A report describing partial-payload behaviour might come from a hand-crafted client and reflect the contract working as designed.
KD-0514 — "Added you to project" notification fires for existing members
- Severity: medium
- Symptom: Users already on a project got an "added you to project X" notification when a new team containing them was linked.
- Root cause: `UpdateProjectAction` built recipients by `array_unique(array_merge($newlyAddedDirectMemberIds, $newTeamMemberIds))`. The team-side list was every member of every newly attached team, with no diff against existing project membership. `array_unique` only deduplicated between the two lists; it didn't subtract users who already had access.
- Fix: Subtract `$currentDirectMemberIds` ∪ members-of-currently-attached-teams from `$allNewMemberIds` before notifying.
- Lesson: Notifications about "newly added" must diff against the prior state; set-union dedup is not the same as a diff. When access can be granted via multiple paths (direct + team), every path must be considered when computing "what changed".
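The difference between dedup and diff is easiest to see as set operations. The following is a TypeScript sketch of the same logic, assuming plain numeric user ids. The real fix is PHP inside `UpdateProjectAction`, and the function name here is hypothetical.

```typescript
// Hypothetical TypeScript sketch of the KD-0514 fix; the real code is PHP in
// UpdateProjectAction. Parameter names mirror the postmortem's variables.
type UserId = number;

function recipientsToNotify(
  newlyAddedDirectMemberIds: UserId[],
  newTeamMemberIds: UserId[], // every member of every newly attached team
  currentDirectMemberIds: UserId[],
  currentTeamMemberIds: UserId[], // members of teams that were already attached
): UserId[] {
  // Union + dedup: this is all array_unique(array_merge(...)) gave us.
  const allNewMemberIds = new Set([...newlyAddedDirectMemberIds, ...newTeamMemberIds]);
  // Everyone who already had access, through either path, before this update.
  const hadAccess = new Set([...currentDirectMemberIds, ...currentTeamMemberIds]);
  // The missing step: a set difference against the prior state.
  return Array.from(allNewMemberIds).filter((id) => !hadAccess.has(id));
}
```

For example, attaching a team `[1, 2, 3]` while directly adding user 5 yields `[5, 3]` if user 1 was already a direct member and user 2 already had access through another team.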
KD-0511 — AI validation error leaks into form fields
- Severity: medium
- Symptom: Clicking Generate on the AI story prompt with a short report description showed a 422 error attached to the IssueForm's description textarea, a field the user never edited.
- Root cause: Wire-level field-name collision. The AI endpoint and the IssueForm shared two field names (`description`, `title`). The global error bag was keyed only by Laravel field name, with no per-form scoping, so a 422 keyed `description` from the AI endpoint rendered under any `<FormError name="description">`.
- Fix: Renamed the AI endpoint's payload keys to `sourceDescription` / `sourceTitle` so that no `<FormError>` watches them.
- Lesson: A globally scoped error bag turns wire field names into a global namespace, so two forms sharing a field name will leak errors into each other. Either scope error bags per form, or use distinct wire field names for distinct forms.
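The first option in the lesson, per-form scoping, can be sketched as an error bag keyed by form id and field. This is a minimal illustration under stated assumptions. `ScopedErrorBag` and the form ids are hypothetical and do not reflect how the project's `<FormError>` is actually wired.

```typescript
// Hypothetical sketch of a per-form scoped error bag (not the project's
// actual <FormError> implementation). Errors are keyed by (formId, field),
// so two forms that both send `description` no longer collide.
type FieldErrors = Record<string, string[]>;

class ScopedErrorBag {
  private readonly bags = new Map<string, FieldErrors>();

  /** Store a Laravel-style 422 `errors` object for one form only. */
  setFromResponse(formId: string, errors: FieldErrors): void {
    this.bags.set(formId, { ...errors });
  }

  /** What a <FormError name="..."> inside `formId` would render. */
  get(formId: string, field: string): string[] {
    return this.bags.get(formId)?.[field] ?? [];
  }

  clear(formId: string): void {
    this.bags.delete(formId);
  }
}
```

In this sketch each form component would pass its own id to `get`, so the AI prompt's `description` error never reaches the IssueForm's textarea.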
KD-0510 — Newly created report not auto-selected in detail pane
- Severity: low
- Symptom: Submitting "+ Report" appended the new report to the list, but the right-hand detail pane stayed on the placeholder. The user had to find and click the new entry.
- Root cause: `handleCreate` called `await newReport.create()` but discarded the return value. The adapter resolved to the persisted Report with its server-assigned id; nothing assigned that id to `selectedReportId`.
- Fix: Capture the returned Report and assign its id to `selectedReportId`.
- Lesson: Async create flows must thread the persisted entity's id back through the UI; otherwise the verify-and-edit loop is broken into two disconnected steps.
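A minimal sketch of the fixed handler, assuming `create()` resolves to the persisted entity as the postmortem describes. `SavedReport`, `NewReport` and the `{ value }` stand-in for a Vue ref are hypothetical simplifications.

```typescript
// Hypothetical sketch of the KD-0510 fix. `SavedReport`, `NewReport` and the
// selection holder are stand-ins for the project's adapter-store types.
interface SavedReport {
  id: number;
  title: string;
}

interface NewReport {
  create(): Promise<SavedReport>;
}

// Minimal stand-in for a Vue `ref<number | null>`.
const selectedReportId: { value: number | null } = { value: null };

async function handleCreate(newReport: NewReport): Promise<void> {
  // Keep the resolved entity: it carries the server-assigned id.
  const created = await newReport.create();
  selectedReportId.value = created.id;
}
```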
KD-0508 — Modal close button not accessible
- Severity: low
- Symptom: The X close button in modals was invisible to keyboard tools (Vimium) and screen readers.
- Root cause: The X was a bare `<svg>` with a `@click` handler. SVG is not in the default tab order, has no implicit ARIA role, and is invisible to browser-level focus management.
- Fix: Wrapped the icon in `<button type="button" aria-label="Close">` with a focus-visible ring.
- Lesson: Click handlers belong on semantic elements (`<button>`, `<a>`), never on bare icons. Accessibility regressions of this class are best caught by a structural arch rule, not visual review.
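One possible shape for such a structural rule is a scan of template source for click handlers on non-semantic elements. This is a heuristic sketch, not the project's arch test. The function and constant names are hypothetical, and a regex cannot handle every template edge case (for example, attribute values containing `>`), which a real Vue template parser would.

```typescript
// Hypothetical sketch of a structural "arch rule" for KD-0508: flag any
// element in a Vue template that has a click handler but is not a semantic
// interactive element. Regex-based, so it is a heuristic, not a parser.
const SEMANTIC_CLICK_TARGETS = new Set(["button", "a"]);

// Matches an opening tag and captures its name and attribute text.
const OPENING_TAG = /<([a-zA-Z][\w-]*)\b([^>]*)>/g;
// Matches @click, @click.stop, v-on:click, ... but not e.g. @clickaway.
const CLICK_HANDLER = /(?:@click|v-on:click)\b/;

function findNonSemanticClickTargets(template: string): string[] {
  const offenders: string[] = [];
  OPENING_TAG.lastIndex = 0; // the regex is global and shared, so reset it
  let match: RegExpExecArray | null;
  while ((match = OPENING_TAG.exec(template)) !== null) {
    const [, tag, attrs] = match;
    // PascalCase components are skipped: they may render a <button> themselves.
    if (/^[A-Z]/.test(tag)) continue;
    if (CLICK_HANDLER.test(attrs) && !SEMANTIC_CLICK_TARGETS.has(tag.toLowerCase())) {
      offenders.push(tag);
    }
  }
  return offenders;
}
```

Run over every `.vue` file in CI, a non-empty result would fail the build. That is the "structural rather than visual" check the lesson argues for.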
KD-0500 — Epic name overflow on board cards
- Severity: low
- Symptom: Long epic titles overflowed the colored badge horizontally past the card's right edge into adjacent space.
- Root cause: Three compounding causes. (1) The project doesn't import `@unocss/reset`, so elements default to `content-box`: `max-width: 100%` constrained only the content, and padding + border were added on top. (2) `min-width: auto` on inline-block elements with `nowrap` resolves to the full unwrapped text width, beating `max-width`. (3) `text-nowrap` prevented wrapping but didn't add `overflow: hidden` or an ellipsis.
- Fix: Combined `box-border` + `min-w-0` + `max-w-full` + `truncate` on `SimpleBadge`. Added `min-w-0` on the parent flex container.
- Lesson: Without a global `box-sizing: border-box` reset, every component that uses padding/border with a percentage `max-width` is overflow-prone, and `min-width: auto` on flex/inline-block items is the silent killer that makes `max-width` constraints useless.
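Cause (1) is plain arithmetic. The following sketch computes the widest a `max-width: 100%` box can render under each `box-sizing` value. The pixel values in the example are made up for illustration and were not measured from the app.

```typescript
// Illustrative arithmetic for cause (1) of KD-0500: the widest a box with
// `max-width: 100%` can render under each `box-sizing` value.
type BoxSizingMode = "content-box" | "border-box";

function renderedMaxWidth(
  containerWidth: number,
  paddingX: number, // left + right padding, px
  borderX: number, // left + right border, px
  sizing: BoxSizingMode,
): number {
  // max-width: 100% resolves against the container's width.
  const maxWidth = containerWidth;
  // content-box: the limit applies to the content only; padding and border add on top.
  // border-box: the limit already includes padding and border.
  return sizing === "content-box" ? maxWidth + paddingX + borderX : maxWidth;
}
```

With a 200px container, 8px of horizontal padding per side and a 1px border, the badge can reach 218px under `content-box`, overshooting by 18px. Under `border-box` it stops at 200px.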
KD-0496 — Manual reports show 'Unknown' as author
- Severity: low
- Symptom: Manual reports created through the UI showed "Unknown" as author. API-created reports were fine.
- Root cause: `ReportForm.vue` didn't send `author_name`; `CreateReportAction` stored `null`; frontend templates fell back to the literal `'Unknown'`. The single write site (`$report->author_name = $data->authorName`) was the bug: every read path correctly read the column, but the column was never populated for manual reports.
- Fix: A write-time fallback in the Action: `$data->authorName ?? "{first_name} {last_name}"` when a creator is present.
- Lesson: When many read paths share a single column, fix at the write site. Anything else means hunting through every consumer (HTTP Resource, MCP tools, frontend templates) to patch the same fallback.
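The write-time fallback is shown below as a TypeScript sketch, assuming simplified input types. The real fix is PHP in `CreateReportAction`, and `resolveAuthorName`, `Creator` and `CreateReportData` are hypothetical names. TypeScript's `??` treats only `null`/`undefined` as missing, as PHP's `??` does, so an explicit name always wins.

```typescript
// Hypothetical TypeScript rendering of the KD-0496 write-time fallback.
// The real fix is PHP in CreateReportAction; these types are illustrative.
interface Creator {
  firstName: string;
  lastName: string;
}

interface CreateReportData {
  authorName?: string | null;
}

/** Resolve author_name once, at the single write site. */
function resolveAuthorName(data: CreateReportData, creator: Creator | null): string | null {
  // Manual reports: fall back to the creating user's full name, if known.
  const fallback = creator !== null ? `${creator.firstName} ${creator.lastName}` : null;
  // An explicit author (e.g. API-created reports) wins, like PHP's `??`.
  return data.authorName ?? fallback;
}
```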
KD-0484 — PaginationBar overflows narrow containers
- Severity: low
- Symptom: Pagination bar visibly overflowed the Reports Overview's 400px left column when there were 8+ pages.
- Root cause: The `<nav>` had `flex` with no `flex-wrap` and used 11 fixed-width `w-10` buttons (~440px intrinsic). Flex items don't shrink below their content's intrinsic width without an explicit `min-w-0`. The parent had `flex-wrap`, so siblings could wrap relative to each other, but neither child could wrap internally.
- Fix: Added `flex-wrap` as a last-resort fallback, plus a CSS container query that hides the redundant `«` / `»` shortcut buttons below 440px (the edge pages remain reachable via the existing first/last page-number buttons).
- Lesson: Container queries are the right tool for "compact this UI when its container is narrow". Viewport media queries can't see how wide the actual parent column is.
KD-0445 — Ticket-updated toast cluttered, auto-hides
- Severity: low
- Symptom: When user A updated an issue, user B got an `infoToast` reading `"Alert: <title> updated by <actor> at <ISO timestamp>"` that auto-hid after 5s.
- Root cause: Three coupled gaps. (1) `Show.vue` (the issue detail page) didn't subscribe to project-channel issue-update broadcasts the way Board/Backlog/Overview did. (2) The backend papered over (1) with a global `PrivateAnnouncement` user-channel toast. (3) The toast variant had no persistence escape hatch, so it auto-hid before the user could act.
- Fix: Removed the entire `PrivateAnnouncement → 'alerts' → infoToast` plumbing. Realtime "your view is stale" UX should live on the page that goes stale, not in a global cross-page toast.
- Lesson: When a global notification papers over a missing local realtime subscription, the right fix is to wire the local subscription, not to refine the global notification.
KD-0443 — Board layout glitches on phone in landscape
- Severity: low
- Symptom: On a phone in landscape, the sidebar took ~30% of the width and avatars overflowed issue cards.
- Root cause: Two issues. (1) The breakpoint service used `window.innerWidth < 768` for mobile detection. A phone in landscape often has 800-900px of width, crossing the threshold and rendering the desktop sidebar. Viewport width is not a reliable proxy for device type. (2) `DragElement.vue` had no `overflow-hidden` constraint, so children escaped at narrow column widths.
- Fix: Added `isTouchDevice` via the CSS media query `(pointer: coarse) and (hover: none)`, which correctly identifies phones/tablets without external input devices. Forced a collapsed sidebar on touch devices. Added `overflow-hidden` + truncation to the card.
- Lesson: For "is this a phone" questions, ask CSS about input modality (`pointer: coarse`, `hover: none`) and never use viewport width as a proxy. Phones in landscape break that proxy.
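A minimal sketch of an input-modality check built on the media query from the fix. `matchMedia` is injected so the logic can be exercised outside a browser. The function shape is an assumption, not the project's breakpoint service.

```typescript
// Hypothetical sketch of an `isTouchDevice` check like the one in KD-0443.
// `matchMedia` is injected so the logic can run outside a browser.
type MatchMedia = (query: string) => { matches: boolean };

// Input-modality query from the fix: the primary pointer is coarse AND can't hover.
const TOUCH_QUERY = "(pointer: coarse) and (hover: none)";

function isTouchDevice(matchMedia: MatchMedia): boolean {
  // Deliberately ignores window.innerWidth: a landscape phone can be 800-900px wide.
  return matchMedia(TOUCH_QUERY).matches;
}
```

In the browser this would be called as `isTouchDevice(window.matchMedia.bind(window))`, since a `MediaQueryList` exposes the `matches` property the sketch relies on.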
KD-0406 — Toast notifications hidden behind modal overlay
- Severity: low
- Symptom: Toasts fired while a modal was open rendered beneath the modal's backdrop and were invisible to the user.
- Root cause: Modals use native `<dialog>.showModal()`, which adds the dialog to the browser's top layer: a separate rendering stack that paints above every regular `z-index` context. The toast container was a fixed `<div>` at `z-index: 1050` in the regular stacking context. The top layer always wins.
- Fix: Upstream in `@script-development/fs-toast@0.2.0`: added `popover="manual"` to the container `<div>` and `showPopover()` on mount. The container re-enters the top layer on every new toast (last-in-wins ordering).
- Lesson: The browser's top layer is not a higher z-index; it's a parallel rendering stack. Anything that must paint above a native `<dialog>` must also live in the top layer (via the Popover API), not just have a high z-index.
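A minimal sketch of the top-layer technique, assuming that "re-enters the top layer" is done by hiding and re-showing the popover, which appends it to the end of the top layer again. This is not fs-toast's actual source. `PopoverHost` is a hypothetical interface covering the subset of `HTMLElement` used here. `showPopover()`, `hidePopover()` and `:popover-open` are standard Popover API members.

```typescript
// Hypothetical sketch of the top-layer technique from KD-0406; this is not
// fs-toast's actual source. PopoverHost is the subset of HTMLElement used here.
interface PopoverHost {
  setAttribute(name: string, value: string): void;
  matches(selector: string): boolean;
  showPopover(): void;
  hidePopover(): void;
}

/** On mount: opt the container into the Popover API and show it. */
function mountToastContainer(el: PopoverHost): void {
  // "manual" popovers don't light-dismiss and don't close other popovers.
  el.setAttribute("popover", "manual");
  el.showPopover(); // moves the element into the top layer
}

/** On each new toast: hide + show to re-append it at the top of the top layer. */
function bringToFront(el: PopoverHost): void {
  if (el.matches(":popover-open")) el.hidePopover();
  el.showPopover();
}
```

The hide/show pair matters because a `<dialog>` opened after the toast container lands above it in the top layer. Re-showing on each toast makes the container the most recent entry again.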
Recurring themes
Silent failures are the real bug. KD-0556, KD-0588, KD-0511, KD-0586, KD-0591, KD-0581, KD-0604: every "no error, no toast, nothing rendered" symptom traces to a code path that swallows or drops a failure without logging. Every "return empty / skip / no-op" branch in infrastructure code needs a structured log entry, or the bug is unobservable.
Form-error binding edge cases. KD-0586 (camelCase mismatch), KD-0591 (missing bindings), KD-0511 (cross-form key collision), KD-0588 (modal unmounted before error rendered), KD-0606 (snake/camel rule mismatch). The `<FormError>` + global error bag pattern keeps producing the same shape: any drift between the wire shape, the middleware transform, the rule key, or the template binding silently breaks the user feedback loop. Arch tests catch the structural class; component-level patterns (FormField wrappers, scoped error bags) prevent it.
Tenant-scoping leaks via the wrong infrastructure. KD-0553 (cache prefix crossing tenant boundary), KD-0574 (scoped binding captured at boot), KD-0556/KD-0537 (broadcast/migration assumes tenant context). State that crosses tenant boundaries must live somewhere prefix-immune (central connection table, lazy resolution at consumption time). Caching it under a tenant prefix or capturing scoped instances at boot is a guaranteed silent drop.
Validation drift between layers. KD-0585 (`max:5000` vs VARCHAR(255)), KD-0589 (unique index vs no `Rule::unique`), KD-0596 (signup blocklist vs admin paths), KD-0587 (scoped exists missing on attachments). Whenever a constraint exists at one layer (DB, migration, central rule) but isn't mirrored at the layer the user hits first (FormRequest), the error surfaces as a 500 or a silent leak. Arch tests that cross-reference layers (column length vs rule max, migration unique vs FormRequest unique, project-owned tables vs scoped exists) close the entire class.
Single source of truth, or guaranteed drift. KD-0581 (UI count vs Stripe quantity), KD-0596 (four copies of the subdomain rule), KD-0586 (manual `<FormError>` placement vs middleware naming), KD-0510 (server response not threaded to UI state). Anywhere the same value is computed/stored/displayed in two places without a reconciliation pass, drift is a question of when, not if.
Broadcast/refetch coverage gaps. KD-0519 (frontend dropped refetch assuming broadcast covered it), KD-0545 (project channel doesn't cover My-X views), KD-0556 (silent broadcast drop), KD-0445 (page didn't subscribe at all). Realtime channels must be cut along the same axis as the views consuming them. A user-list view cannot rely on per-project channels; a per-page view cannot rely on a global toast as compensation for a missing subscription.
Misleading error UX directs users away from recovery. KD-0600 ("close this tab" while the URL silently still worked), KD-0588 (modal closed before error rendered), KD-0606 ("source description required" with no input by that name). Error copy must reference the user's actual recovery path: never tell users to abandon a working state, and never reference field names the UI doesn't show.
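The "silent failures" theme can be made concrete with a small sketch: a skip branch that stays a no-op for the caller but still emits a structured log entry. The logger shape, the `broadcast.skipped` event name and `broadcastIfTenant` are all hypothetical illustrations, not the project's logging or broadcasting API.

```typescript
// Hypothetical sketch of the "every silent branch logs" rule; the logger
// shape, event name and tenant lookup are illustrative, not the project's API.
interface LogEntry {
  level: "warning";
  event: string;
  context: Record<string, unknown>;
}

type Logger = (entry: LogEntry) => void;
type Send = (tenantId: string, payload: unknown) => void;

/** Returns true if the payload was sent, false if it was deliberately skipped. */
function broadcastIfTenant(
  tenantId: string | null,
  payload: unknown,
  send: Send,
  log: Logger,
): boolean {
  if (tenantId === null) {
    // The skip branch is still a no-op for the caller, but it leaves a
    // structured trace, so a dropped broadcast is observable in the logs.
    log({ level: "warning", event: "broadcast.skipped", context: { reason: "no_tenant_context" } });
    return false;
  }
  send(tenantId, payload);
  return true;
}
```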