Engineering Lessons Learned

Cross-cutting patterns that recur across many plans and bugs. The decisions files capture individual forks; this file captures the meta-rules that explain why those forks went the way they did.

If you find yourself debating an architectural choice, this is the file to read first — chances are the debate has already happened in 5 other domains and the answer is here.

1. Consistency over local optima

The single most cited heuristic across 124 DECISIONS files. When a "better" technical option conflicts with what surrounding code already does, the codebase's existing pattern wins almost every time.

Examples where consistency beat the textbook:

Auto-increment IDs over ULIDs (KD-0315)
Int-backed enums over string-backed (KD-0341, KD-0456)
extends CustomException over native \RuntimeException (KD-0579)
invite_ttl in config/auth.php over a model constant (KD-0563)
Client-side pagination over server-side LengthAwarePaginator (KD-0398)

Why it works: A solo dev / small team can't afford the cognitive overhead of multiple ways to do the same thing. The shipped pattern is rarely the most elegant, but it's the one that's already tested, debugged, and understood.

When to break the rule: A clear architectural defect that's blocking progress — not aesthetic preference.

2. Reuse existing precedent before building new

Most "key decisions" turn out to be "find the closest precedent in the codebase and mirror it." When this fails, it's usually because the dev didn't search hard enough.

Recurring instances:

KD-0157 reuses Report API proxy
KD-0186 reuses Passport for webhook auth
KD-0280 reuses issue endpoints for AI on reports
KD-0349 reuses store pattern for attachment delivery
KD-0419 mirrors measurement.ts for auto-fill state
KD-0436 mirrors AttachmentController::download for avatar streaming
KD-0442 mirrors DeleteLaneAction for query-builder bulk update

Practical rule: Before designing a new pattern, ask "where else in the codebase does something structurally similar?" If you can't find one in 10 minutes, the pattern is genuinely new — but it usually exists.

3. Deptrac and arch tests are real design constraints, not just CI annoyances

Many "decisions" are actually layer-rule satisfactions in disguise. When you see a weird-looking abstraction, it's usually because Deptrac forbade the obvious one.

Examples:

Actions → Resources is forbidden → spawned the entire Broadcaster class hierarchy (KD-0470, KD-0463, KD-0464, KD-0467, KD-0405). Models would have been simpler, but they can't reach Resources from Actions.
Helpers → Policies and Policies → Helpers both forbidden → user-visibility logic landed as an Eloquent scope on the User model (KD-0501).
MCP attachment URLs went to Resources not Mcp because the Mcp layer can't depend on URL composition (KD-0368).
KD-0436 split: model returns relative path, Resource composes absolute URL — only path that satisfies "no facades in models" + "no Helpers from Resources" simultaneously.

Practical rule: When a decision feels awkward, check tests/Arch/ and deptrac.yaml. The answer to "why didn't we just do X?" is usually "the layer rules wouldn't let us."

4. YAGNI is enforced aggressively

Three large refactor proposals were rejected mid-plan for solving hypothetical needs:

KD-0361 D9 killed scopeQuery + canManageTeams for a Team Manager role that doesn't exist.
KD-0489 D1 scrapped an entire scope_type enum schema (KD-0490 was deleted) in favor of one validation rule.
KD-0195 D4 rejected a factory-injected mention extension because the central app has no consumers.

Hardcoded constants are also kept hardcoded with explicit "configurable later if needed" notes — and the "later" rarely arrives:

10 attachments per entity (KD-0183)
24h orphan TTL (KD-0360)
180-day notification retention (KD-0294)
20 MB feedback files (KD-0157)
6-day invite TTL (KD-0563)

Practical rule: If you can't name two real users who need the configurability today, it's premature.

5. Plans get reversed during implementation — that's not failure, that's the system working

Plans are not contracts; they're hypotheses. 20+ documented mid-plan reversals turned out to be the difference between shipping something correct and shipping something broken.

Common reversal patterns:

D2→D8 in KD-0578 (shared → tenant after app-boundaries.spec.ts flagged it)
D5→D7 in KD-0394 (orphan-forever → re-parent on save after discovering existing cleanup)
D4 in KD-0362 reversed by reviewers (project-rule violation: no JSON columns)
D14 in KD-0362 reversed (in-app columns dropped — pivot is the gate)
D17 in KD-0362 reversed (cutover banner dropped pre-merge — disproportionate to value)
D4/D3 in KD-0548, KD-0486 reversed during implementation tasks
D17 in KD-0341 (OAuth) reversed for codebase consistency
D2/D3→D8 in KD-0527 (time-entry overview broadcasts dropped in review — "a page that mutates while you read it is a UX hazard")
KD-0528 (attachment broadcaster: async transcode reversed to synchronous mid-plan)
D1→D4 self-reversal in KD-0641 (drop a subclass EAGER_LOAD constant equal to the inherited default)
KD-0483 reverses KD-0413's "no helper trait" stance once arch rules could enforce the hardened idiom

Common failure modes in pre-reversal plans:

Over-trusting "@shared has no app deps" — KD-0578, KD-0394 both made this mistake.
Over-trusting "no orphan cleanup exists" — KD-0394 D5.
Stale knowledge of arch rules — proposing JSON columns when project rules forbid them (KD-0362 D4).

Practical rule: Plan-reviewer + arch tests + app-boundaries.spec.ts are the safety net. Don't skip them. The DECISIONS.md format captures these reversals in a way commits don't — and the reversals are usually where the real lesson lives.

6. The "self-sufficient broadcast payload" rule

Established by KD-0461, codified as an arch test in KD-0465, then referenced as binding in KD-0405, KD-0463, KD-0464, KD-0467, KD-0474, KD-0548. Almost every WebSocket decision gets evaluated against "does this force a refetch?"

The rule: Broadcast events must carry enough data for clients to apply the change without making a follow-up HTTP request.

Tension with the 10 KB Reverb cap: Self-sufficiency would inflate payloads, but the cap forces slimming. Resolved by domain-specific slim Resources:

IssuePositionResourceData (KD-0452)
IssueListResourceData (KD-0548)
thin-payload broadcasts for bulk-assign (KD-0549)

Practical rule: When designing a broadcast, sketch what the listener does with it. If "refetch" appears anywhere in the answer, the payload is wrong.
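
A minimal sketch of what "the listener applies the change without a refetch" can look like on the client. The event and row shapes (IssuePositionChanged, IssueRow) are hypothetical and are not the real IssuePositionResourceData; the point is only that the handler is a pure state update with no HTTP call in it.

```typescript
// Hypothetical slim payload: every field the listener needs is in the event.
interface IssuePositionChanged {
  issueId: number;
  laneId: number;
  position: number;
}

// Hypothetical local row as held in client state.
interface IssueRow {
  id: number;
  title: string;
  laneId: number;
  position: number;
}

// Applies the event to local state. Note what is absent: no fetch() and no
// "reload the board" call. Rows the event does not mention are reused as-is.
function applyIssuePositionChanged(
  issues: readonly IssueRow[],
  event: IssuePositionChanged,
): IssueRow[] {
  return issues.map((issue) =>
    issue.id === event.issueId
      ? { ...issue, laneId: event.laneId, position: event.position }
      : issue,
  );
}

const board: IssueRow[] = [
  { id: 1, title: "Fix login", laneId: 10, position: 0 },
  { id: 2, title: "Add export", laneId: 10, position: 1 },
];

const updated = applyIssuePositionChanged(board, { issueId: 2, laneId: 20, position: 0 });
console.log(updated[1].laneId, updated[1].position);
```

If the handler cannot be written this way because a field is missing, that is the signal that the payload needs another field, not that the listener needs a refetch.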

7. "Fix it in the shared component, not at every call site"

A surprisingly common bug class: a shared primitive has a weak default or subtle bug, and consumers patch it locally one-by-one until someone notices the pattern.

Documented in:

KD-0163 (MultiSelect focus)
KD-0274 (SelectContainer max-width)
KD-0402 (TableBody zebra)
KD-0418 (SelectSearchInput placeholder)
KD-0438 (TextAreaField default rows)
KD-0521 (RichTextArea null coercion)
KD-0721 (44px touch target clamped in Button.vue, not 38 call sites)
KD-0737 (double-submit guarded in BaseFormModal, not per form)

Practical rule: When you find yourself patching a third call site for the same issue, stop. Fix the primitive. Let consumers inherit. Don't deprecate-then-migrate — migrate first, deprecate second.

8. `input() !== null` is the canonical nullable-scalar guard

Started as a rule for int only (KD-0413), was expanded to all scalars (KD-0477), and was reinforced when KD-0521 fixed a regression. has() is now PHPStan-banned.

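Why: Laravel's has() only checks that the key is present, so it returns true for an empty-string submission (?foo=). With empty strings converted to null (Laravel's default ConvertEmptyStringsToNull middleware, assumed to be active here), input() returns null for that same submission, so input() !== null is false. Code guarded by has() therefore runs against a null value, and that discrepancy causes silent validation failures.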

Practical rule: In FormRequests with optional scalars, use $this->input('foo') !== null as the conditional, never $this->has('foo').
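
The difference between the two guards can be modelled in a few lines. This is a TypeScript model of the behaviour described above, not Laravel code: RequestModel and normalise are hypothetical names, and the empty-string-to-null step is assumed to be active.

```typescript
type RawInput = Record<string, string>;

// Models the assumed empty-string-to-null normalisation applied before the
// request reaches application code.
function normalise(raw: RawInput): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const key of Object.keys(raw)) {
    out[key] = raw[key] === "" ? null : raw[key];
  }
  return out;
}

class RequestModel {
  private readonly data: Record<string, string | null>;

  constructor(raw: RawInput) {
    this.data = normalise(raw);
  }

  // Key-presence check: true even when the value was normalised to null.
  has(key: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.data, key);
  }

  // Value lookup: null for both "absent" and "submitted empty".
  input(key: string): string | null {
    return this.has(key) ? this.data[key] : null;
  }
}

const emptyPost = new RequestModel({ foo: "" }); // models ?foo=
console.log(emptyPost.has("foo"));            // true
console.log(emptyPost.input("foo") !== null); // false
```

The two guards disagree only for the empty submission, which is exactly the case where a has()-guarded branch ends up handling a null.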

9. Frontend filters, backend stays permissive

When a UX problem looks like "too much shown," the answer is rarely backend scoping.

Examples:

KD-0376 (sprint selector — frontend filter, backend keeps all sprints reachable)
KD-0385 (epic-form filter)
KD-0361 D1 (Teams Overview — gap was UI-only)
KD-0386 D1 (issue Own-scope — gap was UI-only)

Why: Backend changes affect MCP, CLI, REST, and tests; frontend changes are localized. Power-users on CLI/MCP often need the unfiltered view.

Practical rule: Audit the backend before assuming it needs scoping. Most "missing filter" complaints have a UI-only fix.

10. Single source of truth, or guaranteed drift

Across postmortems and decisions, the same failure pattern repeats: state in two places, no sync mechanism, eventual divergence.

Postmortem cases:

UI seat count vs Stripe (KD-0581)
4 copies of subdomain blocklist (KD-0596)
Manual FormError placement vs global error bag (KD-0586, KD-0588, KD-0591)
Server response not threaded to UI state (KD-0510, KD-0605)
Validation drift between DB / FormRequest / migration / admin / signup
Hand-maintained lists that go stale when a sibling is added: cascade-delete FK table list (KD-0738), multi-select drag path (KD-0807), broadcaster registration (KD-0734) — and the distill source-tracking that re-distilled an already-processed plan (KD-0640 / RETRO-012)

Decision cases:

KD-0292 (FindOrphanedAssignmentsAction extracted because two implementations would drift)
KD-0292 (one preview endpoint, frontend decides display — instead of two endpoints)
KD-0456 (no stored status column — derivable from existing nullable timestamps)

Practical rule: When you spot the same logic in two places, extract before the next change. When two layers must agree on a value, find the cross-layer test that would catch divergence — and write it.
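
The KD-0456 approach (no stored status column) can be sketched as a derivation on read. The field names, the status values, and the precedence order below are assumptions for illustration, not the real schema.

```typescript
// Hypothetical record; the timestamp names are illustrative only.
interface InviteRow {
  acceptedAt: Date | null;
  revokedAt: Date | null;
  expiresAt: Date;
}

type InviteStatus = "accepted" | "revoked" | "expired" | "pending";

// The status is computed from the timestamps every time it is needed, so
// there is no second copy that could drift. The precedence (revoked, then
// accepted, then expired) is an assumption of this sketch.
function inviteStatus(row: InviteRow, now: Date): InviteStatus {
  if (row.revokedAt !== null) return "revoked";
  if (row.acceptedAt !== null) return "accepted";
  if (row.expiresAt.getTime() <= now.getTime()) return "expired";
  return "pending";
}

const now = new Date("2024-01-10T00:00:00Z");
const openInvite: InviteRow = {
  acceptedAt: null,
  revokedAt: null,
  expiresAt: new Date("2024-01-12T00:00:00Z"),
};

console.log(inviteStatus(openInvite, now));
```

A stored status column would need a write in every place a timestamp changes; the derived version has nothing to forget.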

11. Silent failures are the real bug class

Across ~7 postmortems, the symptom was "nothing happened" because a code path returned/swallowed/dropped without logging.

Documented:

KD-0556 (webhook broadcast silently dropped)
KD-0588 (password confirm silently failed)
KD-0606 (AI source description case mismatch silently broke)
KD-0518 (sprint status sometimes silently required)
KD-0511 (AI validation error silently leaked)
KD-0445 (toast cleanup silently missed)
KD-0691 (match with default => null swallowed an unhandled status)
KD-0631 (<Suspense> blanked the page with no error slot)
KD-0753 (link-branch error swallowed instead of rethrown)

Practical rule: Every catch-and-continue path needs to log. Every "best-effort" optional step needs to log on skip. If a code path can decide to do nothing, it must say so.
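
Two small sketches of the rule, with hypothetical names throughout. The first replaces a "default returns null" branch with an exhaustiveness check; the second is a best-effort step that logs when it decides to do nothing.

```typescript
type SprintStatus = "planned" | "active" | "completed";

// If a new member is added to SprintStatus and not handled in the switch,
// the call below stops type-checking. At runtime it throws instead of
// returning a silent null.
function assertNever(value: never): never {
  throw new Error(`Unhandled status: ${String(value)}`);
}

function statusLabel(status: SprintStatus): string {
  switch (status) {
    case "planned":
      return "Planned";
    case "active":
      return "Active";
    case "completed":
      return "Completed";
    default:
      return assertNever(status);
  }
}

// Best-effort step: skipping is allowed, skipping silently is not.
// The actual send is elided in this sketch.
function notifyIfConfigured(
  webhookUrl: string | null,
  log: (message: string) => void,
): boolean {
  if (webhookUrl === null) {
    log("notify skipped: no webhook URL configured");
    return false;
  }
  log(`notify attempted: ${webhookUrl}`);
  return true;
}

const logLines: string[] = [];
const sent = notifyIfConfigured(null, (message) => logLines.push(message));
console.log(statusLabel("active"), sent, logLines);
```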

12. Tenant-scoping leaks via the wrong infrastructure

Multi-tenancy isolation breaks at the seams between request scope and infrastructure scope.

Documented:

KD-0553 (cache prefix crossed tenant boundaries)
KD-0574 (queue bindings captured at boot, not per-job tenant)
KD-0556 (broadcast assumed tenant context)
KD-0537 (audit backfill deadlocked because of missing tenant scope)
KD-0587 (cross-project attachment leak)
KD-0783 / KD-0785 / KD-0786 / KD-0787 / KD-0788 (the central-connection binding crash class — an unqualified ConnectionInterface resolved to the tenant DB while audit models hardcode central, so transaction invariants checked the wrong connection and threw, but only in prod since tests collapse the connections into one instance)
KD-0808 (rank compactor kept synchronous because a queued job loses tenant context)

Practical rule: When something is "configured at boot" — cache prefix, queue binding, broadcaster connection, mailer transport, the default DB connection — assume it's wrong for multi-tenant. Look for a per-request equivalent or a tenant-iterating pattern. The established fix for the connection case (KD-0783 ff.): extract the second-connection work into a sibling Action, bind the connection contextually, and gate it with an arch test — don't let an unqualified ConnectionInterface decide for you.
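
The "configured at boot" hazard can be shown with a toy cache. Everything here is hypothetical (the ambient currentTenantId, both cache classes, the tenant names); it only illustrates why a value captured once at construction is wrong once the tenant changes.

```typescript
// Hypothetical ambient tenant context; a real app would scope this per
// request or per job.
let currentTenantId: string | null = null;

// Anti-pattern: the prefix is computed once, when the cache is constructed.
class BootPrefixedCache {
  private readonly prefix: string;
  private readonly store: Record<string, string> = {};

  constructor() {
    this.prefix = `tenant:${currentTenantId}:`;
  }

  set(key: string, value: string): void {
    this.store[this.prefix + key] = value;
  }

  get(key: string): string | undefined {
    return this.store[this.prefix + key];
  }
}

// Alternative: the tenant is resolved on every call, and a missing context
// fails loudly instead of falling back to some default.
class TenantAwareCache {
  private readonly store: Record<string, string> = {};
  private readonly resolveTenant: () => string | null;

  constructor(resolveTenant: () => string | null) {
    this.resolveTenant = resolveTenant;
  }

  private scoped(key: string): string {
    const tenant = this.resolveTenant();
    if (tenant === null) {
      throw new Error("cache accessed without a tenant context");
    }
    return `tenant:${tenant}:${key}`;
  }

  set(key: string, value: string): void {
    this.store[this.scoped(key)] = value;
  }

  get(key: string): string | undefined {
    return this.store[this.scoped(key)];
  }
}

currentTenantId = "acme";
const bootCache = new BootPrefixedCache();
const tenantCache = new TenantAwareCache(() => currentTenantId);
bootCache.set("plan", "enterprise");
tenantCache.set("plan", "enterprise");

currentTenantId = "globex";
const leaked = bootCache.get("plan");     // still reads the entry written for acme
const isolated = tenantCache.get("plan"); // nothing stored for globex
console.log(leaked, isolated);
```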

13. Form-error binding is its own bug surface

Five postmortems traced to the <FormError> + global error bag pattern: camelCase mismatch (KD-0586, KD-0606), missing bindings (KD-0591), cross-form key collision, modal unmount race (KD-0508), snake/camel rule mismatch.

Practical rule: When adding a new form, also add a test that submits invalid input and asserts the error renders next to the right field. The default state — "global error bag has the message but it's not visible anywhere" — is the silent failure mode of choice for this codebase.
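
One way to make the unbound case visible, sketched without any Vue or test-framework code. bindErrors and snakeToCamel are hypothetical helpers, and the assumption is that the server sends snake_case keys while form fields are camelCase. The idea is that an error that matches no field is reported, never dropped.

```typescript
// Converts a server-side key such as "due_date" to the assumed field name "dueDate".
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

interface BoundErrors {
  byField: Record<string, string[]>; // errors that have a field to render next to
  unbound: Record<string, string[]>; // errors that would otherwise be invisible
}

// Splits an error bag into errors that land on a known field and errors
// that do not, keyed by the original server key.
function bindErrors(
  errorBag: Record<string, string[]>,
  fields: readonly string[],
): BoundErrors {
  const result: BoundErrors = { byField: {}, unbound: {} };
  for (const serverKey of Object.keys(errorBag)) {
    const field = snakeToCamel(serverKey);
    if (fields.indexOf(field) !== -1) {
      result.byField[field] = errorBag[serverKey];
    } else {
      result.unbound[serverKey] = errorBag[serverKey];
    }
  }
  return result;
}

const bound = bindErrors(
  { due_date: ["The due date is required."], sprint_id: ["The selected sprint is invalid."] },
  ["title", "dueDate"],
);
console.log(Object.keys(bound.byField), Object.keys(bound.unbound));
```

A form test can then assert that unbound is empty for every invalid submission it exercises.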

14. Defer edge cases, file follow-ups, ship the happy path

Recurring scope-control posture across plans:

KD-0491 D8 (GitHub App install — defer non-happy-path)
KD-0464 (sprint-completion / epic-reorder broadcasts deferred)
KD-0549 D9 (Board UI gap accepted)
KD-0453 D7 (PDF deferred)
KD-0265 (multi-report merge deferred but data model permits it)

Practical rule: Plans that try to ship every edge case in v1 are the ones that don't ship. Identify the happy path, ship it, file a Kendo issue for each deferred edge case during the same PR.

15. Cross-subdomain session bridging is a hard "no"

KD-0341 D3 (OAuth callback) and KD-0569 D5 (tenant verification) both rejected Guard::login on a different subdomain. Cookie-domain semantics (SameSite, secure flag, parent-domain inheritance) are too easy to get wrong.

Practical rule: When a flow needs to traverse subdomains, use a redirect-with-prefilled-email or a one-time exchange token. Never assume the session cookie will carry across.

16. Action layer absorbs everything that's not a Service

Deptrac forbids Services → Actions and Services → Audit. So orchestration logic — the kind that runs business logic + ceremony (audit, notifications, cache invalidation) — has nowhere to live except Actions.

Practical rule: "Action = anything that runs business + ceremony logic" is the working definition in this codebase. Don't try to split orchestration into a Service; the layer rules will reject it.

17. MCP error discipline migrated mid-history

Older MCP tools rethrow Throwable, leaking stack traces. Newer ones (KD-0481, KD-0550, KD-0551, KD-0457) explicitly diverge from this pattern, citing backend/CLAUDE.md. Four older tools are tech debt.

Practical rule: When touching an MCP tool, check whether it follows the new error-handling convention. If not, the migration is part of your task.

18. Keep "scope" explicit when extending shared APIs

When migrating a shared API (rename, drop prop, change signature), don't add deprecation aliases. They become permanent noise.

KD-0618 (drop SimpleBadge content prop) and KD-0619 (rename InputLabel label → forId) both explicitly rejected aliases. The atomic refactor used vue-tsc as the safety net.

Practical rule: If your refactor is mechanically correct, the type checker is the proof. Aliases that "let consumers migrate at their own pace" are debt.

Other decisions reinforce it: KD-0515 swapped a multi-consumer filter bar by renaming the old one *Legacy and migrating callers in one pass; KD-0780 retired the is_primary branch flag outright (with the arch test downgraded L1→L0) rather than leaving a compatibility shim.

19. Test fixtures must mirror what the real system emits

Two Hand-to-Claude bugs shipped green because the test helper synthesized a payload shape that the parser also assumed — so the fixture and the code agreed with each other and disagreed with production. Production was the first witness.

Documented:

KD-0699 (PR-evidence parser rejected the real MCP tool-result shape; the fixture built a different one)
KD-0700 (grader verdict-source fixture built an API shape that doesn't occur in practice)

Practical rule: A fixture builder is a second implementation of the contract, and it can drift from the real one. When a parser or validator has a test, its input must come from — or be checked against — a real captured payload, not a hand-written guess. If the only witness to the real shape is production, you don't have a test.
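
A sketch of a fixture builder that inherits its shape from a capture instead of retyping it. The payload below only stands in for a real captured one; its shape, the parser, and all names are illustrative assumptions.

```typescript
// Stand-in for a payload captured from the real system and committed
// verbatim. The shape shown here is illustrative only.
const CAPTURED_TOOL_RESULT = '{"content":[{"type":"text","text":"example output"}]}';

// Parser under test: extracts the text parts and throws on any shape it
// does not recognise.
function parseToolResultTexts(json: string): string[] {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("tool result is not an object");
  }
  const content: unknown = (data as { content?: unknown }).content;
  if (!Array.isArray(content)) {
    throw new Error("tool result has no content array");
  }
  const texts: string[] = [];
  for (const item of content as unknown[]) {
    if (typeof item !== "object" || item === null) {
      throw new Error("content item is not an object");
    }
    const text: unknown = (item as { text?: unknown }).text;
    if (typeof text !== "string") {
      throw new Error("content item has no text");
    }
    texts.push(text);
  }
  return texts;
}

// Fixture builder: clones the capture and overrides one value, so the shape
// comes from the capture rather than from a hand-written guess.
function buildToolResultFixture(text: string): string {
  const clone = JSON.parse(CAPTURED_TOOL_RESULT) as {
    content: { type: string; text: string }[];
  };
  clone.content[0].text = text;
  return JSON.stringify(clone);
}

console.log(parseToolResultTexts(buildToolResultFixture("merged")));
```

When the real shape changes, replacing the capture updates every fixture built from it, and the parser test fails in CI rather than in production.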

20. Structural "known-gaps" heuristics over-detect — verify site-by-site, not by count

When an arch test or audit flags N call sites as "needs fixing," that count is an upper bound, not a work-list. KD-0788's central-binding heuristic flagged 14 Actions; only 5 were real gaps. Trusting the count would have meant 9 needless changes — and 9 chances to introduce a regression.

Practical rule: A flagged list from a structural heuristic is a list of candidates. Read each site before changing it, and report what you dropped and why. A silent "fixed all 14" hides the 9 that didn't need it.

21. Make the unsafe state unreachable by omission

The recurring failure is "forgetting the field is silently allowed, and the silent default is the dangerous one." The fix is to make the safe choice required, so forgetting becomes a compile/arch error instead of a security hole or a duplicate write.

Documented:

KD-0640 (route meta: permissions made required with [] as the explicit "no check" state — "silent defaults re-create the bug they're meant to prevent")
KD-0737 (submit-guard moved into BaseFormModal + an L1 arch test, so a hand-rolled form can't ship without the double-submit guard)

Practical rule: If "I forgot to add it" produces an unsafe-but-working result, the API shape is wrong. Prefer required fields, empty-but-explicit states, and arch tests that fail on omission over optional fields with a convenient default.
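
The KD-0640 shape, sketched in types. The interface and function names are hypothetical, and "every listed permission must be granted" is an assumption about the check. What matters is that the safe interface has no way to forget the field.

```typescript
// Unsafe shape: omitting `permissions` type-checks, and the fallback
// quietly means "no check".
interface LooseRouteMeta {
  permissions?: string[];
}

function looseCanAccess(meta: LooseRouteMeta, granted: readonly string[]): boolean {
  return (meta.permissions ?? []).every((p) => granted.indexOf(p) !== -1);
}

// Safe shape: the field is required, and [] is the explicit "no check" state.
// `const meta: RouteMeta = {}` does not compile.
interface RouteMeta {
  permissions: string[];
}

function canAccess(meta: RouteMeta, granted: readonly string[]): boolean {
  return meta.permissions.every((p) => granted.indexOf(p) !== -1);
}

// Forgetting the field on the loose shape lets everyone in.
const forgotten = looseCanAccess({}, []);

// On the safe shape, "open to everyone" has to be written down.
const explicitOpen = canAccess({ permissions: [] }, []);
const guarded = canAccess({ permissions: ["issues.delete"] }, ["issues.view"]);
console.log(forgotten, explicitOpen, guarded);
```

Both shapes allow an open route; only one of them allows it by accident.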

Anti-patterns (rejected approaches)

Static-through-instance calls ($this->model::where())
Raw arrays across HTTP/domain boundary (DTOs only)
Mutable DTOs (always final readonly)
JSON columns for relational data (KD-0362 D4)
String-name matching across projects (KD-0362 D4)
Guard::login across subdomains (KD-0341, KD-0569)
Helper class re-exports (KD-0436)
Custom HMAC when Passport works (KD-0186)
<title> attribute for tooltips (KD-0403)
has() for nullable scalars (KD-0413, KD-0477, KD-0521)
->toOthers() for user-scoped broadcasts (KD-0405)
Deprecation aliases on shared API renames (KD-0618, KD-0619)
Confirmation modals for hard-delete-everything actions where undo isn't possible (KD-0294)
Fabricated test fixtures that no real system emits (KD-0699, KD-0700)
Unsafe default-by-omission — optional permission? silently meaning "anyone logged in" (KD-0640)
match with default => null that swallows unhandled cases (KD-0691)
Unqualified ConnectionInterface in multi-connection code (KD-0783 ff.)
