Voice gets a feedback loop

2026-04-26 · 5 min read · devlog, emptyos, voice, architecture

For most of its life, Aura was a chatbot with a voice. You could talk to it; it could talk back; it could read context out of the system prompt; it could not actually do anything on your behalf. Asking it to "add a task to call the dentist" produced a polite paragraph about how it would be glad to help but had no way to actually create the task. The orb was an interface to a search engine that pretended to have hands.

The first move was to give apps a way to contribute verbs. A new manifest slot, contributes.voice-assistant.intent, lets any app declare a verb in the form <app>.<verb>, an example phrase, an arg schema, and the name of a method on the app that implements it. At boot, Aura discovers every contributed intent across every loaded app. The pattern matches the hub panel and addons slots we already have: apps push capabilities to the assistant, the assistant has zero hardcoded knowledge of which apps contribute what, and installing a new app makes new verbs available with no Aura code change.
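A minimal sketch of that boot-time discovery, to make the shape concrete. This is not Aura's actual code: the manifest layout (a list under contributes["voice-assistant.intent"]), the field names verb, example, args and method, and the voice_add method name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str      # "<app>.<verb>", e.g. "task.add"
    example: str   # example phrase shown to the model
    args: dict     # arg schema (shape assumed here)
    method: str    # name of the implementing method on the app


def discover_intents(manifests):
    """Collect every contributed intent from every loaded app's manifest."""
    registry = {}
    for app_id, manifest in manifests.items():
        # Assumed layout: apps that contribute nothing simply lack the slot.
        entries = manifest.get("contributes", {}).get("voice-assistant.intent", [])
        for entry in entries:
            name = f"{app_id}.{entry['verb']}"
            registry[name] = Intent(
                name=name,
                example=entry["example"],
                args=entry.get("args", {}),
                method=entry["method"],
            )
    return registry


# Hypothetical manifests for two loaded apps.
manifests = {
    "task": {
        "contributes": {
            "voice-assistant.intent": [
                {
                    "verb": "add",
                    "example": "add a task to call mom",
                    "args": {"text": "string"},
                    "method": "voice_add",
                },
            ]
        }
    },
    "notes": {},  # an app that contributes no verbs
}

registry = discover_intents(manifests)
```

The discovery loop never names a specific app; it only reads the slot, which is the point of the pattern.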

The model emits intents inline. When you say "add a task to call mom", the LLM writes a natural reply and embeds a token like [INTENT:task.add({"text":"call mom"})] somewhere in the stream. Aura strips the token before TTS, parses it, dispatches through call_app, and feeds whatever the handler returns back into the same TTS pipeline. The user hears continuous speech with no awareness that a tool fired in the middle of the sentence.
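The strip-and-parse step might look roughly like this. It is a sketch under simplifying assumptions: it works on a complete reply rather than a token stream, it assumes the args are a JSON object that does not itself contain the sequence ")]", and the name extract_intents is ours, not Aura's.

```python
import json
import re

# Matches tokens of the form [INTENT:<app>.<verb>(<json args>)]
INTENT_RE = re.compile(r"\[INTENT:(\w+\.\w+)\((.*?)\)\]", re.DOTALL)


def extract_intents(text):
    """Return (text safe to speak, list of (intent name, args) pairs)."""
    intents = []
    for name, raw_args in INTENT_RE.findall(text):
        try:
            args = json.loads(raw_args) if raw_args.strip() else {}
        except json.JSONDecodeError:
            # Malformed args: skip the call, but the token is still stripped.
            continue
        intents.append((name, args))
    spoken = INTENT_RE.sub("", text)
    # Collapse the gap the removed token leaves behind.
    spoken = re.sub(r"\s{2,}", " ", spoken).strip()
    return spoken, intents


reply = 'Sure. [INTENT:task.add({"text":"call mom"})] I\'ll add that now.'
spoken, intents = extract_intents(reply)
```

In a real streaming pipeline the token can straddle chunk boundaries, so the stripper would need to buffer from the first "[" until it can decide whether an intent token is in flight.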

The load-bearing decision in V1 wasn't the parser or the dispatch — it was scope. Throwing every contributed intent into the system prompt would have drowned the model in tools the moment we had more than a dozen apps. So scope is narrow on purpose: Aura sees only intents marked always: true, plus the active companion's app, plus the last two invoked apps, capped at twelve total. The decision keeps the model decisive as the system grows. New apps don't make Aura dumber; they just become available the moment one of their verbs gets invoked.
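The scope rule can be sketched as a small ranking function. The rule itself (always, then the companion's app, then the last two invoked apps, capped at twelve) is from the paragraph above; the priority order used when the cap bites, the dict shape of an intent, and the app names in the example are assumptions.

```python
MAX_IN_SCOPE = 12


def intents_in_scope(intents, companion_app, recent_apps, limit=MAX_IN_SCOPE):
    """Pick the intent names the model gets to see for this turn.

    intents: list of dicts with "name" ("<app>.<verb>") and optional "always".
    recent_apps: invoked apps, oldest first; only the last two count.
    """
    recent = recent_apps[-2:]

    def rank(intent):
        app = intent["name"].split(".", 1)[0]
        if intent.get("always"):
            return 0
        if app == companion_app:
            return 1
        if app in recent:
            return 2
        return None  # out of scope

    eligible = [(rank(i), i) for i in intents]
    eligible = [(r, i) for r, i in eligible if r is not None]
    # sorted() is stable, so manifest order is preserved within each rank.
    eligible = sorted(eligible, key=lambda pair: pair[0])
    return [i["name"] for _, i in eligible[:limit]]


# Hypothetical registry contents.
all_intents = [
    {"name": "task.add"},
    {"name": "task.list_today"},
    {"name": "timer.start", "always": True},
    {"name": "music.play"},
    {"name": "notes.append"},
    {"name": "calendar.add"},
]

scope = intents_in_scope(
    all_intents,
    companion_app="notes",
    recent_apps=["music", "calendar", "task"],
)
```

Here music.play drops out because music is the third-most-recent app; it comes back into scope once one of its verbs is invoked again.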

We shipped the foundation with three reference intents and confirmed it worked on a real device, which is when the bug we couldn't have predicted surfaced. Reading the chat-log transcript from a real session, we found a moment where Aura said "Added: sleep early tonight," and the user followed up with "what are my tasks today" — and the list looked unchanged. The new task had landed correctly; the prioritised top-five just buried it under overdue items. Aura, having no way to know what task.list_today had returned vs. what the user expected, hallucinated an apology: "I dropped the ball — I never fired the tool." It had fired the tool. It just had no way to confirm the consequence.

The reframe we landed on came from looking at the architecture from the right angle. Apps PUSH contributions to Aura — intents, companions, context. Nothing PULLS. Whenever Aura needed to know something only the app could know — disambiguation rules, vocabulary priors, error rendering, the consequence of an action — it was forced to either invent the behaviour or hardcode it in central code. That central-code path is exactly the coupling we built the contribution system to avoid.

So we built a pull-side slot. contributes.voice-assistant.narration lets an app register a method that runs after one of its intents fires and appends a follow-up sentence to whatever Aura is about to say. The first adopter is the task app: after task.add fires, the narrator quotes the total open count ("That's two thousand two hundred ninety-nine open total."), which gives the user independent confirmation that the add landed even when the prioritised list doesn't visibly change. Multiple narrators per intent are allowed; their outputs concatenate. Failures are swallowed per narrator and the chat continues. The slot accepts exact verb match (task.add) or namespace match (task.*).
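A rough sketch of the narrator side, covering the three behaviours described above: exact or namespace matching, concatenation, and per-narrator fail-soft. The class name, the narrator signature (intent_name, result), and the example task list are assumptions for illustration.

```python
def pattern_matches(pattern, intent_name):
    """Exact match ("task.add") or namespace match ("task.*")."""
    if pattern.endswith(".*"):
        return intent_name.startswith(pattern[:-1])  # keeps the trailing dot
    return pattern == intent_name


class NarratorRegistry:
    def __init__(self):
        self._narrators = []  # (pattern, callable) in registration order

    def register(self, pattern, fn):
        self._narrators.append((pattern, fn))

    def narrate(self, intent_name, result):
        """Run every matching narrator and join their sentences."""
        parts = []
        for pattern, fn in self._narrators:
            if not pattern_matches(pattern, intent_name):
                continue
            try:
                sentence = fn(intent_name, result)
            except Exception:
                # Fail-soft: one broken narrator must not break the reply.
                continue
            if sentence:
                parts.append(sentence)
        return " ".join(parts)


# Stand-in for the task app's state; a cheap read, no new work.
open_tasks = ["call mom", "renew passport", "sleep early tonight"]


def count_open(intent_name, result):
    return f"That's {len(open_tasks)} open total."


def broken(intent_name, result):
    raise RuntimeError("narrator bug")


registry = NarratorRegistry()
registry.register("task.add", count_open)
registry.register("task.*", broken)

reply = "Added: sleep early tonight."
narration = registry.narrate("task.add", {"ok": True})
spoken = f"{reply} {narration}".strip()
```

The narrator only reads a count that already exists, and the broken narrator is skipped without affecting the sentence the user hears.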

The discipline that matters is what narration isn't allowed to do. It runs after the handler, never instead of it; the handler is still where the work happens. It's allowed cheap state reads only — count, latest-id, "the thing you just did" — not new work. And it has to be short, because it lands straight in TTS. Narration is the consequence summary, not a second bite at the action.

A confirmation gate landed in the same week, for the destructive-intent class of problem we knew would surface eventually. An optional confirm = true on a manifest entry causes dispatch_intent to yield a confirm_required event instead of executing; a separate API runs the action only after explicit user approval. It is off by default, so existing intents behave exactly as before, and it is available the moment any app actually wires a destructive verb.
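The gate can be sketched as a generator that parks the call instead of running it. dispatch_intent, call_app and confirm_required are names from the post; the call_app(name, args) signature, the token-keyed pending store, confirm_intent, and the task.clear_all verb are assumptions made up for this example.

```python
import uuid

_pending = {}  # token -> (intent name, args) awaiting approval


def dispatch_intent(intent, args, call_app):
    """Yield event dicts; gated intents are parked rather than executed."""
    name = intent["name"]
    if intent.get("confirm", False):
        token = uuid.uuid4().hex
        _pending[token] = (name, args)
        yield {"type": "confirm_required", "intent": name, "token": token}
        return
    yield {"type": "result", "intent": name, "value": call_app(name, args)}


def confirm_intent(token, approved, call_app):
    """The separate entry point: run a parked action only if approved."""
    name, args = _pending.pop(token)
    if not approved:
        return {"type": "cancelled", "intent": name}
    return {"type": "result", "intent": name, "value": call_app(name, args)}


calls = []


def call_app(name, args):
    # Stand-in for the real dispatcher; records what actually ran.
    calls.append((name, args))
    return f"ran {name}"


# A hypothetical destructive verb with the gate switched on.
events = list(dispatch_intent({"name": "task.clear_all", "confirm": True}, {}, call_app))
```

At this point events holds a single confirm_required event and calls is still empty; nothing reaches the app until confirm_intent is called with approved=True.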

The shape across the two sessions has a satisfying symmetry to it. Push for what an app can do. Pull for what only the app knows. Aura keeps no hardcoded knowledge of any app. Apps never have to fork Aura to be heard. The intent registry, the narrator registry, and the confirm gate are all the same shape: contribution slots in the manifest, instance methods on the app, fail-soft aggregation in the assistant, and a debug endpoint that lists everything currently in scope so "why didn't my verb fire" stays answerable.

There's a queue of pull-side slots we audited and didn't build. Disambiguation rules, vocabulary priors, context freshness signals, error renderers, TTS formatting hints — about a dozen of them, each one a single-purpose extension of the same pattern. We're holding until the pain shows up. The narration slot earned itself by closing a real bug class observable in real transcripts. The next slot has to clear the same bar.

Aura still doesn't know what your apps do. It just got a way to listen to them after the fact.
