Two kinds of looking

"Analytics" had been a confused word in EmptyOS. There was an app-analytics app tracking personal usage, and a pressure to add some kind of audience tracker for the published sites. Mixing them in the same app had been tempting — same primitive, same shape — but the two signals mean different things. Which apps am I actually using is an inward question; who is reading the published site is an outward one. Today we split them and then generalised what they share.

The shared piece came out first. A TimeSeriesCounter primitive landed in the SDK — a bucketed SQLite counter with upsert semantics, day-or-hour granularity, and arbitrary dimensions per row. It replaces three hand-rolled counter tables that had accreted in different apps (daily_stats, provider_stats, app_stats), all with slightly different bucketing and slightly different query paths. The new primitive has one way to count things over time: insert with (bucket, key1, key2, …, count+=1), query by window. Small surface, already useful in three places, easily reusable by the next app that needs it.
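To make the "insert with (bucket, keys…, count+=1), query by window" shape concrete, here is a minimal sketch of such a counter on top of Python's sqlite3 module. It is not the SDK's actual API: the class layout, the method names incr and totals, the column name n, and the bucket string formats are all assumptions for illustration. It relies on SQLite's ON CONFLICT upsert clause (SQLite 3.24 or newer) and expects at least one dimension.

```python
import sqlite3
from datetime import datetime


class TimeSeriesCounter:
    """Illustrative bucketed counter; names and schema are assumptions."""

    def __init__(self, conn, table, dimensions, granularity="day"):
        if granularity not in ("day", "hour"):
            raise ValueError("granularity must be 'day' or 'hour'")
        self.conn = conn
        self.table = table  # trusted identifier, interpolated into SQL
        self.dims = list(dimensions)  # at least one dimension is assumed
        # Bucket strings sort lexicographically in time order.
        self.fmt = "%Y-%m-%d" if granularity == "day" else "%Y-%m-%dT%H"
        self.keys = ["bucket"] + self.dims
        cols = [f"{k} TEXT NOT NULL" for k in self.keys]
        cols.append("n INTEGER NOT NULL DEFAULT 0")
        cols.append(f"PRIMARY KEY ({', '.join(self.keys)})")
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")

    def incr(self, when, amount=1, **dims):
        """Upsert: create the (bucket, dims) row or add to its count."""
        values = [when.strftime(self.fmt)] + [dims[d] for d in self.dims]
        marks = ", ".join("?" for _ in self.keys)
        key_list = ", ".join(self.keys)
        self.conn.execute(
            f"INSERT INTO {self.table} ({key_list}, n) VALUES ({marks}, ?) "
            f"ON CONFLICT ({key_list}) DO UPDATE SET n = n + excluded.n",
            values + [amount],
        )

    def totals(self, start, end):
        """Sum counts per dimension tuple for buckets in [start, end]."""
        dim_list = ", ".join(self.dims)
        sql = (
            f"SELECT {dim_list}, SUM(n) FROM {self.table} "
            f"WHERE bucket BETWEEN ? AND ? "
            f"GROUP BY {dim_list} ORDER BY SUM(n) DESC, {dim_list}"
        )
        args = (start.strftime(self.fmt), end.strftime(self.fmt))
        return self.conn.execute(sql, args).fetchall()
```

Because the primary key is (bucket, dimensions…), the upsert keeps exactly one row per bucket and key combination, which is what lets a single query path serve every consumer.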

On top of that sit two consumers with different privacy shapes. app-analytics got rewritten to use ui:viewed events from a new pageview middleware in the web server: any GET on an app page emits one, while API calls and static assets are skipped. The rewrite added five new reports: unused apps (the reverse of top apps: which ones never get opened?), errors-vs-usage priority (surfacing high-traffic endpoints that also error), time-of-day heatmaps, per-app day-hour grids, and weekly streaks. The prompt that summarises this for the user got refactored into a module-top constant, with an explicit system= kwarg and a temperature. On first boot, the app backfills 5,000 events so the dashboard isn't blank.
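The pageview filter and the "unused apps" report can both be sketched in a few lines. This is only an illustration of the rule described above, not the middleware itself: the /api/ and /static/ prefixes, the convention that an app page lives at /<app-id>/…, and the function names viewed_app and unused_apps are all assumptions.

```python
from typing import Iterable, List, Optional

# Assumed URL layout; the real server's routes may differ.
SKIP_PREFIXES = ("/api/", "/static/")


def viewed_app(method: str, path: str) -> Optional[str]:
    """Return the app id a request counts as a ui:viewed for, else None."""
    if method.upper() != "GET":
        return None  # only page loads count
    if path.startswith(SKIP_PREFIXES):
        return None  # API calls and static assets are skipped
    parts = [p for p in path.split("/") if p]
    # Assumed convention: the first path segment is the app id.
    return parts[0] if parts else None


def unused_apps(installed: Iterable[str], viewed: Iterable[str]) -> List[str]:
    """Reverse of 'top apps': installed apps with no recorded views."""
    return sorted(set(installed) - set(viewed))
```

The unused-apps report is a set difference against the installed list, so it needs the manifest inventory as well as the event stream; counts alone cannot reveal an app that never emitted anything.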

web-analytics is the externally facing counterpart, and its design is cautious by construction: an inline beacon script served by the Publish app, daily-rolling session hashes (so no persistent identifier), no raw IPs stored, and DNT respected. Publish wires it in per site via a settings toggle: when enabled, the beacon gets injected into {extra_head} during build; when disabled, nothing ships. The beacon itself escapes </ as <\/ to stay XSS-safe. An IP exclusion list supports CIDR and has an "Exclude my IP" button so author visits don't pollute the metrics. The dashboard shows a sparkline, top paths, top referrers, top countries, and a live tail. It's a thin privacy-first collector; the real visitor tracking for production domains (with Cloudflare Worker + D1) is deferred in a plan doc. This is enough for a self-hosted vault to see who's reading without becoming a tracking pipeline.
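Three of those safeguards are small enough to sketch. The following is a hedged illustration, not the collector's code: the choice of SHA-256, the inputs mixed into the hash (a secret, the date, the IP, and the user agent), the 16-character truncation, and all function names are assumptions. The CIDR matching uses the standard ipaddress module.

```python
import hashlib
import ipaddress
from datetime import date
from typing import List


def session_hash(ip: str, user_agent: str, day: date, secret: str) -> str:
    """Daily-rolling id: the same visitor hashes differently each day."""
    material = f"{secret}|{day.isoformat()}|{ip}|{user_agent}".encode()
    # Only this digest would be stored, never the raw IP.
    return hashlib.sha256(material).hexdigest()[:16]


def is_excluded(ip: str, rules: List[str]) -> bool:
    """True if ip matches any rule; a rule is a single address or a CIDR."""
    addr = ipaddress.ip_address(ip)
    for rule in rules:
        net = ipaddress.ip_network(rule, strict=False)
        if addr.version == net.version and addr in net:
            return True
    return False


def script_safe(text: str) -> str:
    """Escape '</' so the text cannot close an inline <script> element."""
    return text.replace("</", "<\\/")
```

Mixing the date into the hash input is what keeps the identifier from persisting: yesterday's sessions cannot be joined to today's, even for the same visitor.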

The point of the split isn't which app does what; it's that the two signals never mingle in the same database. Personal usage stays in data/apps/app-analytics/, public audience stays in data/apps/web-analytics/. They share the SDK primitive but nothing else. The two-domain rule from CLAUDE.md (human-authored content → vault; high-frequency telemetry → data/) has a new sibling: two telemetry streams with different consent shapes stay in separate domains even when they use the same primitive.

A different thread ran in the same stretch: the live site generator. scripts/generate_emptyos_site.py scans the live manifests (currently ~70 apps, 9 plugins, 800+ endpoints) and regenerates the EmptyOS public site's apps.md (categorised catalogue), plugins.md, and capabilities.md (new). It also injects live counts into index.md between <!-- stats:start/end --> markers and updates the app count in architecture.md's ASCII art. The script is idempotent, and a dry-run mode is available. The wrapup skill grew a new Stage 4 that runs the generator, diffs against the vault source, and triggers a local rebuild. Deploy stays manual. The point is fidelity: the public site shows 70 apps because there are 70 apps, not because someone wrote "70 apps" in the markdown.
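The marker-based injection is what makes the generator safe to re-run. A minimal sketch of that step follows, assuming the markers are written out in full as <!-- stats:start --> and <!-- stats:end --> (the post abbreviates them) and using a hypothetical function name, inject_stats; the real script may be structured differently.

```python
import re

# Assumed full marker spelling; the post abbreviates it as stats:start/end.
STATS_BLOCK = re.compile(
    r"(<!-- stats:start -->)(.*?)(<!-- stats:end -->)", re.DOTALL
)


def inject_stats(markdown: str, stats_text: str) -> str:
    """Replace whatever sits between the markers with stats_text.

    Text outside the markers is untouched, and a page without markers
    is returned unchanged, so repeated runs converge on the same output.
    """
    def repl(match):
        return f"{match.group(1)}\n{stats_text}\n{match.group(3)}"

    return STATS_BLOCK.sub(repl, markdown, count=1)
```

A dry run falls out of the same function: compare inject_stats(page, stats) with the current file contents and report the difference instead of writing it.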

Alongside it, a drift cleanup on CLAUDE.md. Ten specific edits to remove stale counts, clarify which tables are auto-generated versus hand-maintained, fix a small TOML example, name the reactor event chain explicitly (git:saved → reactor → journal ripple), distinguish runtime tiers from release tiers, and confirm which state files are gitignored. A CLAUDE.md that has drifted quietly over a few weeks stops being an orientation document; it becomes a confidently-wrong one. The cheapest fix was a scan-and-patch session, not a rewrite.

Left for later: real visitor tracking on the public site still depends on setting up an edge-worker pipeline; the plan exists internally but is not yet wired. The generator catches inventory-level changes but doesn't yet reflect internal refactors: a monolith split inside a single app doesn't change any visible count, so the site doesn't update. Something like a "last updated" timestamp or an integrity score on the landing page would make substantive sessions visible without requiring an inventory shift. We're watching that space.

The shape of the day: two kinds of looking, one shared primitive, and a generator that keeps the public-facing story honest. The system now has enough instruments to notice what it's becoming — which apps are load-bearing, which are ghosts, who's reading what gets published. Instruments aren't judgments; they're what make judgment possible.
