Ready to be run by anyone

2026-04-18 · 8 min read · Tags: devlog, emptyos, deployment, release, sdk

Today was the day the word "release" stopped being aspirational. For weeks the codebase had been getting cleaner, tests had been getting deeper, docs had been filling in. But the final mile — the one that lets a stranger clone, install, and run — was still a dozen small blockers. Today we closed them. What landed wasn't one big feature; it was a dozen things converging toward the same shape.

The largest move was local-first / cloud-deployable. Two parallel research passes (one on industry positioning, one on a direct GPT-4.1 consultation) converged on the same plan: keep the system local by default, build cloud-ready as a first-class option, keep the user in control of the egress surface. What landed is a Config.network_mode abstraction with three modes — local (127.0.0.1, no auth), private (0.0.0.0, no auth — Tailscale/WireGuard/LAN where the network layer is the gate), public (0.0.0.0, auth required). The CLI refuses to boot public mode without an auth token. eos init now prompts for a mode and auto-generates a 32-byte token when you pick public. An auth middleware landed alongside — bearer token + httponly session cookie + a login page + WebSocket auth via query or cookie — and activates only when a token is configured, so the local user workflow is unchanged.
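The three modes and the "no public boot without a token" rule can be sketched in a few lines. This is a minimal illustration, not the actual Config class: NetworkConfig, init_config, and the MODES table are hypothetical names, and the use of secrets.token_urlsafe for the 32-byte token is an assumption about how the token might be generated.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

# mode -> (bind host, auth required), following the description in the post
MODES = {
    "local": ("127.0.0.1", False),
    "private": ("0.0.0.0", False),  # the network layer (VPN/LAN) is the gate
    "public": ("0.0.0.0", True),
}


@dataclass
class NetworkConfig:  # hypothetical stand-in for the real Config
    network_mode: str = "local"
    auth_token: Optional[str] = None

    @property
    def bind_host(self) -> str:
        return MODES[self.network_mode][0]

    @property
    def auth_enabled(self) -> bool:
        # Auth middleware activates only when a token is configured.
        return bool(self.auth_token)

    def validate(self) -> None:
        if self.network_mode not in MODES:
            raise ValueError(f"unknown network mode: {self.network_mode!r}")
        auth_required = MODES[self.network_mode][1]
        if auth_required and not self.auth_token:
            raise ValueError("public mode refuses to boot without an auth token")


def init_config(mode: str) -> NetworkConfig:
    """Mimic the init prompt: generate a token only when public is chosen."""
    # token_urlsafe(32) draws 32 random bytes (assumed token format)
    token = secrets.token_urlsafe(32) if mode == "public" else None
    cfg = NetworkConfig(network_mode=mode, auth_token=token)
    cfg.validate()
    return cfg
```

Keeping the mode table as data means the bind host and the auth requirement cannot drift apart, and the local path never touches the auth code at all.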

Underneath it, a cloud-consent gate. The kernel got a CloudConsentManager with ask / always / never policies, session-scoped approval caching, async futures for pending requests, and a 120-second default timeout. A host_is_local() classifier handles sixteen cases — localhost, 127.x, ::1, 0.0.0.0, private IPv4 ranges, Tailscale CGNAT (100.64.0.0/10), link-local, *.ts.net, *.local, *.lan. Every provider's is_cloud flag is auto-derived from its host, with explicit overrides where they matter (claude-cli is a local binary but the inference is remote — so it's is_cloud=True). Capability.execute and execute_stream pass cloud calls through the consent gate; denied or timed-out providers are skipped and the capability chain falls through to local alternatives. A Result.is_cloud flag lets UIs badge which calls left the machine. A shared EOS_UI.cloudConsent() modal (provider name, data summary preview, approve/deny/remember) auto-subscribes every page to the cloud:consent_requested WebSocket event. EOS_UI.providerBadge() paints local calls green and cloud calls blue with a hover tooltip. Benchmarks skip unapproved cloud providers silently instead of firing ten concurrent modals. The principle crystallised as CLAUDE.md rules 17–20: no localhost assumptions in app logic, cloud consent is mandatory, no vault data to cloud by default, Docker-bootable.
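A classifier along the lines of host_is_local() could be built on the standard ipaddress module. The sketch below is an assumption about the shape of such a function, not the kernel's code: it takes a bare host (no scheme or port), and the constant names are made up for illustration. The Tailscale CGNAT range needs an explicit check because Python's is_private does not cover 100.64.0.0/10.

```python
import ipaddress

# Tailscale hands out addresses from the CGNAT shared range.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")
# Hostname suffixes treated as local, per the list in the post.
LOCAL_SUFFIXES = (".ts.net", ".local", ".lan")


def host_is_local(host: str) -> bool:
    """Return True when the host stays on this machine or a private network."""
    h = host.strip().lower().strip("[]").rstrip(".")
    if h == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(h)
    except ValueError:
        # Not an IP literal: fall back to hostname suffix matching.
        return h.endswith(LOCAL_SUFFIXES)
    if ip.version == 4 and ip in TAILSCALE_CGNAT:
        return True
    return (
        ip.is_loopback        # 127.x, ::1
        or ip.is_unspecified  # 0.0.0.0
        or ip.is_private      # 10/8, 172.16/12, 192.168/16, fc00::/7
        or ip.is_link_local   # 169.254/16, fe80::/10
    )
```

A provider's is_cloud flag would then default to `not host_is_local(host)`, with explicit overrides for cases such as a local binary that performs remote inference.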

Docker landed next. A slim Python 3.11 image with ripgrep and git baked in, vault mounted at /vault, config at /app/emptyos.toml, state volume at /app/data. docker-compose.yml with an optional Ollama sidecar. docker-compose.demo.yml with a mandatory Ollama phi3:mini sidecar and a required EOS_NETWORK_AUTH_TOKEN. .dockerignore excludes personal apps, engines, data, caches, and tests. A demo vault came with it — scripts/demo-setup.py builds fifteen notes across a curated set of domains (piano, garden, books, health, relationships) with zero personal content. Demo mode gets a sticky banner on every page linking to the install docs. The point of demo mode isn't to be a trial; it's to be a working tour.

In parallel, the publish app absorbed the portfolio that had been living as its own personal app. A template field on site profiles lets publish render blog, docs, landing, or portfolio layouts from the same engine — case-study cards with frontmatter-driven metadata, 18 interactive charts via Chart.js, category filters, dark/light toggle, URL param filtering. A _portfolio.toml config in the source folder holds hero text, stats, domains, skills, about-sidebar — no hardcoded personal data in the template. binbian-portfolio.web.app deployed through a new Firebase hosting endpoint in the publish UI, which joined the existing GitHub Pages deploy path. The architecture decision worth recording: external service integrations follow app feature → connector app → plugin. Only extract to plugin when multiple apps need self.require(). Noted as a memory.

A batch of SDK extractions fired underneath the visible work — each one small, each one removing duplicate call sites across core apps. today_utc() and days_ago_utc(n) in emptyos/sdk/time_series.py (query-side UTC date helpers for TimeSeriesCounter consumers). BaseApp.setting() (reads from the Settings service with fallback; complements app_config() which reads from emptyos.toml). BaseApp.vault_root (unifies the self.kernel.config.notes_path or Path(".") pattern that was open-coded in ten places across journal, projects, and app-analytics). Each extraction ran through the AST duplicate scanner from the previous week — we can now see structural duplicates across core apps even when the local function names have drifted, and the scanner tells us which ones have enough core callers to justify moving.
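For readers who haven't seen the helpers, here is roughly what query-side UTC date functions of this kind look like. The return type (an ISO date string) and the optional now parameter are assumptions added for illustration and testability; the post only names today_utc() and days_ago_utc(n).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def today_utc(now: Optional[datetime] = None) -> str:
    """Current UTC date as YYYY-MM-DD (string return type is an assumption)."""
    current = now or datetime.now(timezone.utc)
    return current.astimezone(timezone.utc).date().isoformat()


def days_ago_utc(n: int, now: Optional[datetime] = None) -> str:
    """UTC date n days before today, as YYYY-MM-DD."""
    current = now or datetime.now(timezone.utc)
    return (current.astimezone(timezone.utc) - timedelta(days=n)).date().isoformat()
```

The point of centralising these is that every consumer buckets by the same UTC day, so a query made just after local midnight doesn't look in a different bucket than the writer used.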

The per-app assistant got a real contract. On boot, gpts now syncs the general-assistant.server_actions allowlist from every app's [provides.assistant].commands — no more hand-maintained drift. _method_signature introspects real method signatures via inspect.signature and injects them into the system prompt, so the LLM sees task.list_tasks(overdue_only=False, done=False) instead of an empty-parens placeholder, preventing parameter hallucination. Every successful [DO:] tag appends to an action log; manifests declare inverse fields (task.complete → task.reopen); a new POST /gpts/api/undo reads the last reversible entry and calls the inverse. The UI surfaces an ↩ Undo button whenever any server_results[].reversible=true. Alongside, three layers of tests: Layer 1 static contract (manifest methods exist, inverses exist, no @web_route handlers exposed to [DO:], every page has a description), Layer 2 prompt contract (hits /gpts/api/debug/system-prompt/<agent> and asserts every allowlisted method appears, signatures are present, "don't invent parameter names" guidance present, prompt < 16 KB), Layer 3 live LLM smoke (ten parametrised per-app cases + conversational-question + unknown-app graceful-fallback). First run of Layer 1 caught nine real bugs — seven manifests exposing web routes to [DO:] (silently unusable) and two non-existent method references. The assistant is no longer a wrapper that lies about what it can do.
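The signature injection is easy to picture with inspect.signature. The helper below is a sketch of the idea rather than the real _method_signature: the function name, its arguments, and the TaskApp class are hypothetical, and only the rendered output format is taken from the post.

```python
import inspect


def method_signature(app_name: str, method) -> str:
    """Render 'app.method(params)' from the real callable, for the prompt."""
    sig = inspect.signature(method)
    # Bound methods already omit self; drop it for unbound functions too.
    params = [p for name, p in sig.parameters.items() if name != "self"]
    rendered = ", ".join(str(p) for p in params)
    return f"{app_name}.{method.__name__}({rendered})"


class TaskApp:  # hypothetical app used only for the demonstration
    def list_tasks(self, overdue_only=False, done=False):
        return []


print(method_signature("task", TaskApp().list_tasks))
# prints: task.list_tasks(overdue_only=False, done=False)
```

Because the string is derived from the live method, renaming a parameter updates the prompt on the next boot with no manifest edit.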

Release readiness came as a separate pass late in the day. A dedicated audit assessed packaging, tests, docs, and branding. Thirteen app pages were missing the closing </script> tag after the EOS.registerActions() block, so the HTML that followed was parsed as JavaScript and threw "Unexpected token '<'" on every page load. The earlier test sweep hadn't caught it because the failure was structural (a syntax error in shipped code) rather than something any assertion checked. Music-studio's suno_url field was renamed to source_url (the UI copy had already been de-branded; the backing field lagged). Publish's guide page had its third-party names converted to generic terms. Both core and standard release tiers now pass package-release.py --check. Full API suite: 479 passed, 12 skipped (personal apps), 1 slow (release check). Safety scanners clean across 351 tracked files.
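A structural check for the missing-tag class of bug can be very small. The sketch below is not the audit that was actually run; it is one hypothetical way to flag a page whose script tags don't pair up. It counts raw tag text, so a literal "<script" inside a JavaScript string would produce a false alarm.

```python
import re

OPEN_TAG = re.compile(r"<script\b", re.IGNORECASE)
CLOSE_TAG = re.compile(r"</script\s*>", re.IGNORECASE)


def script_tags_balanced(html: str) -> bool:
    """True when every <script ...> opening has a matching </script>."""
    return len(OPEN_TAG.findall(html)) == len(CLOSE_TAG.findall(html))


# A page shaped like the bug: the inline block is never closed, so the
# markup after it would be parsed as JavaScript.
broken_page = """
<script>
EOS.registerActions({});
<div class="panel">content</div>
<script src="/static/eos-components.js"></script>
"""
fixed_page = broken_page.replace(
    "EOS.registerActions({});", "EOS.registerActions({});\n</script>"
)
```

Run over every shipped template, a check like this turns a runtime syntax error into a test failure.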

A quieter but load-bearing fix: the service worker's cache-first strategy had been serving a stale eos-components.js even when the server returned the fresh file. Publish was rendering blank because the runtime threw "EOS_UI.entityCard is not a function": the function existed in the 69 KB file the server sent but not in the 49 KB file the cache served. The strategy flipped to network-first for /static/ assets (which aligns with the existing NoCacheStaticMiddleware), and CACHE_NAME was bumped. This class of bug, where the code is correct but a different copy of it is running, is worth documenting: when the runtime reports a function as missing and that function is plainly there in the file you just wrote, suspect the cache, not the code.

One topology pass: four retirements, zero orphans. A scan of call_app edges and event emissions across 75 apps surfaced two orphans (no edges in or out) and six merge candidates. vault-analytics absorbed into app-analytics as a mixin (core platform now has one analytics app with Usage + Vault tabs instead of two nearly-identical apps). The scan that surfaced this came with a warning: merging shouldn't be a reflex — finance + expense stayed separate because expense needs a lightweight quick-log UX that an embedded tab ruins; app-gen + plugin-gen stayed separate because they address different domains; cable + sheath-voltage stayed separate because they have different runtimes. The rule is still events over imports; merging is the option of last resort.
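The orphan half of the scan reduces to a set difference once the edges are collected. The function and the app names below are illustrative assumptions, not the real scanner or the real topology; an edge here stands for either a call_app call or an event emitted by one app and consumed by another.

```python
from typing import Iterable, List, Tuple


def find_orphans(apps: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return apps with no edges in or out, sorted for stable output."""
    connected = set()
    for source, target in edges:
        connected.add(source)
        connected.add(target)
    return sorted(set(apps) - connected)


# Made-up topology for demonstration only.
apps = ["notes", "tasks", "calendar", "lonely-app"]
edges = [("notes", "tasks"), ("tasks", "calendar")]
print(find_orphans(apps, edges))  # prints: ['lonely-app']
```

Merge candidates need a judgment call that a graph query can't make, which is why the scan only nominates them.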

Two meta-shifts worth noting. Self-* capabilities surfaced in public docs — the README and GETTING-STARTED now describe self-healing (provider fallback chains + eos health --fix), self-auditing (13 integrity dimensions + Growth/Root/Connect agents), self-evolving (conversation mode + SDK/UI consolidation skills), self-documenting (eos app info, live topology). Readers can now discover the system is alive without having to dig into DESIGN.md. The claims were deliberately bounded: we didn't claim autonomous self-clean (the vault-cleanup skill is user-invoked, not scheduled). Under-promise, then show the next thing.

And one reflexive loop: a new skill, /eos-devlog-publish, was written to turn session logs into drafts on this very site, and the site generator was then extended so the landing page surfaces the live integrity score. The post you're reading is the first output of that skill. The skill drafts by default (posts are written with publish: false, so they land in the Publish app's Drafts tab), filters out personal-app content before writing (core/community only; no finance, cable, healing, jobs, and so on), and scans every draft against the .eos-personal pattern file. The gate between "dev log in the project folder" and "blog post on the site" is now explicit, auditable, and opt-in per post.

Left for later: a public demo deployment (Docker infra is ready; hosting decision isn't). Proper consent-flow tests (behavioural smoke exists; a tests/test_sys_cloud_consent.py doesn't). A BYOK UI for demo-mode visitors pasting their own API key (backend via env override is ready; settings-panel wiring isn't). The backend/UI seam for "your key, session-scoped, never persisted" is the next substantive piece.

The shape of the week, from the other side: a system that spent seven days becoming legible to itself ended the week legible to other people. Consciousness model → integrity rubric → public dev log → release tiers → deployment modes → Docker → demo → consent gate → self-documenting docs. Each step was a small move. The sum is a door.

