Splitting the big apps before they get bigger
The task app's app.py was nine hundred and eighty-two lines. Canvas was eight hundred and seventy. Neither felt unreadable yet — both still had clear sections, both still passed every test — but both were getting close to the edge where new contributors would have to scroll-and-search rather than read top-to-bottom, and where adding a feature meant first remembering which of three internal helpers had a name like the one you wanted. We've watched apps in other projects cross that line and never come back. The cost of splitting goes up the longer you wait, because by the time the file is genuinely confusing, the seams aren't where you'd put them now — they're wherever past-you happened to leave a function break.
So this session was a deliberate pre-emption. Task was split into an orchestrator (still app.py, now four hundred and seventy lines) plus three pure modules: indexer.py for vault scanning and caching, mutations.py for the markdown checkbox transforms, and queries.py for aggregations like today / this-week / overdue. Canvas was split the same way: an orchestrator plus storage.py for the board file codec, prompts.py for the LLM system prompts, and layout.py for node placement geometry. The public API didn't change. Routes stayed where they were, and other apps calling task or canvas see the same surface. Forty-six system tests pass with no edits.
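To make "pure module" concrete, here is a minimal sketch of the kind of function that could live in queries.py. The Task record, its fields, and the function names are hypothetical, since the real module's types aren't shown in this post. The sketch also assumes a week ends on Sunday. The design choice worth copying is that today is passed in as a parameter instead of being read with date.today(), so the function has no hidden inputs and can be called at a REPL.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical record shape; the real indexer's type is not shown in this post.
    text: str
    done: bool
    due: Optional[date] = None


def due_today(tasks: Iterable[Task], today: date) -> list[Task]:
    """Open tasks due exactly on `today`."""
    return [t for t in tasks if not t.done and t.due == today]


def due_this_week(tasks: Iterable[Task], today: date) -> list[Task]:
    """Open tasks due from `today` through the following Sunday, inclusive."""
    # date.weekday() is 0 for Monday and 6 for Sunday.
    week_end = today + timedelta(days=6 - today.weekday())
    return [
        t for t in tasks
        if not t.done and t.due is not None and today <= t.due <= week_end
    ]


def overdue(tasks: Iterable[Task], today: date) -> list[Task]:
    """Open tasks whose due date is already in the past."""
    return [t for t in tasks if not t.done and t.due is not None and t.due < today]
```

Nothing here touches the kernel, configuration, or the clock, so a check like overdue(tasks, date(2024, 5, 15)) needs no daemon and no fixtures.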
The split rule we kept was the one that's easy to say and harder to hold to: pure functions belong in their own module. Anything that doesn't touch the kernel, doesn't read configuration, doesn't await anything — checkbox parsing, layout math, prompt strings, codec round-trips — leaves the orchestrator. What stays in app.py is the part that's actually orchestration: route handlers, event emissions, capability calls, the bits that depend on self. The win isn't just lines of code per file; it's that the pure modules are now testable at the REPL with a one-line import, no daemon, no fixtures, no mocks. We re-verified canvas's storage codec end-to-end by handing it a board dict, round-tripping it through serialize-and-parse, and asserting nodes/edges/vault-paths/passthrough frontmatter all came back identical. That's a thirty-second test that used to require booting the system.
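The round-trip check described above can be sketched like this. The board file format is an assumption made for illustration: a `key: value` frontmatter block with JSON-encoded values, followed by a JSON body of nodes and edges. The names serialize_board and parse_board are hypothetical, and this is not canvas's actual storage.py. What carries over is the shape of the test: build a board dict, serialize it, parse it back, and compare.

```python
import json
from typing import Any

FENCE = "---"


def serialize_board(board: dict[str, Any]) -> str:
    """Encode a board as frontmatter plus a JSON body (hypothetical format)."""
    lines = [FENCE]
    for key, value in board.get("frontmatter", {}).items():
        # JSON-encode each value so types and embedded newlines survive one line.
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append(FENCE)
    body = {"nodes": board.get("nodes", []), "edges": board.get("edges", [])}
    lines.append(json.dumps(body, indent=2, sort_keys=True))
    return "\n".join(lines) + "\n"


def parse_board(text: str) -> dict[str, Any]:
    """Inverse of serialize_board. Unknown frontmatter keys pass through untouched."""
    lines = text.splitlines()
    if not lines or lines[0] != FENCE:
        raise ValueError("missing opening frontmatter fence")
    end = lines.index(FENCE, 1)  # raises ValueError if the closing fence is missing
    frontmatter = {}
    for line in lines[1:end]:
        # Assumes keys never contain ": "; only the first separator is split on.
        key, _, raw = line.partition(": ")
        frontmatter[key] = json.loads(raw)
    body = json.loads("\n".join(lines[end + 1:]))
    return {"frontmatter": frontmatter, "nodes": body["nodes"], "edges": body["edges"]}


if __name__ == "__main__":
    board = {
        "frontmatter": {"title": "Roadmap", "x-plugin": {"pinned": True}},
        "nodes": [
            {"id": "n1", "x": 0, "y": 0, "vault_path": "projects/alpha.md"},
            {"id": "n2", "x": 240, "y": 80, "text": "free note"},
        ],
        "edges": [{"from": "n1", "to": "n2"}],
    }
    # Nodes, edges, vault paths, and the unknown "x-plugin" key must all survive.
    assert parse_board(serialize_board(board)) == board
    print("round-trip ok")
```

Because both functions are pure, the whole check is one import and one assert.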
A small thing surfaced as a side effect: long-running jobs across the project have been growing their own ad-hoc sticky-progress UI, each app reinventing the same bottom-right card with a cancel button. Music studio had one. Podcast had a different one. Several others were planning to. We extracted a single EOS_UI.jobProgress({id, onCancel}) helper into the shared frontend, with multi-id stacking so two long jobs from different apps don't fight for the same screen corner. Music studio's MV render migrated first; the rest will land as the apps get touched. Each migration deletes about twenty-five lines of bespoke code and replaces them with a one-liner that emits a known event shape.
The harder rule, and the one we explicitly didn't break, is this: extract to the SDK only when the second consumer needs it. Task's mutations.py is a tempting candidate; journal and projects both do similar - [ ] line work. We flagged it and held back. The shape of mutations is very specifically what task needs: same-document checkbox toggles, idempotent re-renders, line-anchored ids. Journal's checkbox usage is structurally different: entries are append-mostly, line numbers are unstable, and the surrounding markdown is journaling prose rather than a task list. Projects' usage is closer, but still has its own conventions around section headers. Lifting it today would either generalize it past usefulness or paint two callers into a corner the third can't follow them into. So mutations stays in the task app, with a comment noting it's the prime candidate when the third caller arrives.
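Here is one reading of "same-document checkbox toggles, idempotent re-renders, line-anchored ids," sketched with hypothetical function names; the real mutations.py may be organized quite differently. Ids are simply line indices, and the mutation sets an explicit state instead of flipping the current one, so applying the same edit twice is a no-op. The line-anchored ids also show why this shape doesn't transfer cleanly to journal: once lines shift, an id points at a different line.

```python
import re

# A markdown task line: optional indent, list marker, "[ ]" or "[x]", then the text.
CHECKBOX = re.compile(r"^(\s*[-*+] \[)([ xX])(\] .*)$")


def task_id(line_no: int) -> str:
    """Line-anchored id (hypothetical scheme): the 0-based line index."""
    return f"L{line_no}"


def list_tasks(doc: str) -> list[tuple[str, bool, str]]:
    """Return (id, done, text) for every checkbox line in the document."""
    out = []
    for i, line in enumerate(doc.split("\n")):
        m = CHECKBOX.match(line)
        if m:
            # group(3) starts with "] "; strip it to get the task text.
            out.append((task_id(i), m.group(2) in "xX", m.group(3)[2:]))
    return out


def set_done(doc: str, line_no: int, done: bool) -> str:
    """Set one checkbox to an explicit state; setting (not flipping) keeps it idempotent."""
    # split("\n") rather than splitlines() so a trailing newline survives the join.
    lines = doc.split("\n")
    m = CHECKBOX.match(lines[line_no])
    if m is None:
        raise ValueError(f"line {line_no} is not a checkbox")
    lines[line_no] = f"{m.group(1)}{'x' if done else ' '}{m.group(3)}"
    return "\n".join(lines)
```

Every other line of the document, prose included, comes back byte-for-byte unchanged, which is what makes repeated re-renders safe.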
There's also a small economy to splitting that's worth naming. Each pure module is now its own unit of cognitive load. When something breaks in the kanban grouping, you go to queries.py. When the checkbox toggle is misbehaving, you go to mutations.py. The orchestrator-plus-pure-modules shape means the question "where would I look for this" has a first-pass answer before you grep. That's the real return — not the line count, not the test isolation, but that the shape of the file system tells you something about the shape of the system.
There's a third candidate sitting at eight hundred and fifty-nine lines, and it's the most coupled of the three: voice-assistant, with its dispatcher plus intent registry plus scope rules plus companion swap all weaving through each other. We deliberately deferred it. A naive split would saw through live wires; it needs its own session, its own shape, and probably a different decomposition than the storage-prompts-layout pattern that worked here. The lesson from this session is that the right time to split a big file is before you have to. The lesson queued up for next time is that you don't get the same answer twice.