Internal · SpaceMusic Engineering

UI Rendering and its Fundamental Challenges

Why SpaceMusic's UI architecture is a deliberate trade-off, what we've learned from the past few weeks of optimization, and the direction we believe is right going forward.

Why this is unusual

SpaceMusic is two products in one box: a real-time 3D performance engine, and a parameter-dense desktop application. Most software is one or the other. Games render 3D and have minimal UI; productivity apps have rich UI and almost no 3D. SpaceMusic needs both — the spatial output and the editor experience around it.

When a single product has to be both, the team has to decide where to put the bridge between them. That decision is the architecture. The four patterns the industry uses are well-known; the fifth we ended up with is rare and accounts for most of the integration pain we keep running into.

How other applications solve it

  1. UI built on the same renderer. The team writes their own UI toolkit on top of their renderer. Perfect integration; takes years of investment. Unreal Editor (Slate), Unity Editor, Blender, Cinema 4D, Houdini, Maya, After Effects
  2. Embedded Chromium / web UI. The UI is a webpage; the engine output is a canvas element or composited GPU texture. Reuses the entire web ecosystem; adds a 200+ MB runtime and noticeable input latency. Spotify, Figma desktop, Discord, Steam overlay
  3. Native OS UI with embedded viewport. Standard Windows / Qt / Cocoa UI wraps the app and the 3D engine renders inside a hosted control. Works when the UI is around the 3D, not over it. Rhino, AutoCAD, SolidWorks
  4. Single-framework apps. One graphics framework draws everything. Works when 3D needs are modest. Not an option for us — we need Stride's full pipeline. Google Earth (Flutter), Flutter desktop apps in general
  5. Our hybrid (rare). A native UI framework (Avalonia, via Skia) composited over the 3D engine (Stride), with a transparent hole punched through the UI so the 3D scene shows through. Every interop boundary — input routing, dirty tracking, frame pacing, texture synchronization — becomes hand-written code we own.

Where we are today

The current architecture works, and we've optimized it hard over the past few weeks. Idle UI performance went from 67 fps to 280 fps. Layout cost dropped from ~15 ms per frame to under 0.5 ms. We now re-render the UI only when something has actually changed. For most use cases the experience is excellent.

But a class of bugs proved structural rather than tunable. A texture flicker on hover that we chased for two days turned out to be a GPU race between Avalonia and Stride sharing one device — we ruled out half a dozen plausible causes before accepting it. At fullscreen 4K, the CPU cost of compositing the UI scales with pixel area and starts to dominate the frame budget. These aren't profiler problems; they're consequences of where we put the bridge.

A working prototype (V2, on a branch) renders Avalonia to a CPU bitmap instead of a shared GPU surface. The flicker disappeared by construction — no shared GPU device means no GPU race possible. Idle perf actually got faster than the original. The remaining cost is the CPU rasterizer at high resolution, which is what the next step addresses.
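
To make the V2 approach concrete, here is a minimal sketch of CPU-side UI rasterization. It assumes Avalonia's standard RenderTargetBitmap API; the upload step it mentions is a placeholder for whatever the branch actually does, not our real code.

    // Minimal sketch of the V2 idea: rasterize the Avalonia visual tree on the
    // CPU, then hand the pixels to Stride as an ordinary texture upload.
    using Avalonia;
    using Avalonia.Media.Imaging;

    RenderTargetBitmap RasterizeUi(Visual uiRoot, PixelSize size)
    {
        // Skia rasterizes into CPU memory here. No shared Stride GraphicsDevice
        // is touched, so the Avalonia/Stride GPU race cannot occur by construction.
        var bitmap = new RenderTargetBitmap(size, new Vector(96, 96));
        bitmap.Render(uiRoot);
        return bitmap; // caller uploads the pixels as a plain texture on dirty frames
    }

The remaining cost is exactly the one named above: the CPU rasterizer at high resolution, which the worker-thread step in the next section takes off the engine's critical path.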

The architectural insight

The optimization work crystallized a bigger framing: the UI doesn't need to be part of the engine.

Our ABB installation runs SpaceMusic essentially headless — automated scene changes throughout the day, no human at a UI. It's been doing that flawlessly for two years. That's proof the engine already runs autonomously; the UI is a separate concern that's optional.

This is the client-server model. The engine produces state (channels) and visual output (3D rendering and plugin textures). The UI is one of many possible clients that consume that state and output. We've already done the hard part of separating these: state lives in channels and the data model, not inside UI code. Channels are now the universal API between the core and any UI — they work locally, they work cross-thread, and we've already proven they work over WebSocket for remote.
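
To pin down what "universal API" means here, the sketch below shows the shape of the contract: one thread-safe value with publish and subscribe. The Channel<T> name and members are illustrative assumptions, not our actual channel implementation.

    // Illustrative sketch of the channel contract, not our real implementation.
    // The shape is what matters: a thread-safe latest value plus subscribe/publish,
    // identical whether the subscriber is a local UI thread, a worker thread,
    // or a WebSocket bridge.
    using System;
    using System.Collections.Concurrent;

    public sealed class Channel<T>
    {
        private readonly ConcurrentQueue<Action<T>> subscribers = new();
        private readonly object gate = new();
        private T latest = default!;

        public T Value { get { lock (gate) return latest; } }

        public void Publish(T value)
        {
            lock (gate) latest = value;
            foreach (var notify in subscribers) notify(value); // fan out to all clients
        }

        public void Subscribe(Action<T> onValue) => subscribers.Enqueue(onValue);
    }

The engine writes with Publish; any client reads with Subscribe. Nothing in the contract says where the subscriber lives, which is what makes the remote scenarios transport questions rather than architectural ones.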

This framing reveals four UI deployment scenarios. All are eventual targets; they share the same core and the same API.

Figure 1 · The client-server model

[Diagram: the SpaceMusic Engine core (3D · audio · plugins · state) exposes Public Channels (data model, thread-safe, WebSocket-transport ready) as the universal API. Two local clients attach via GPU share: a single-screen UI overlaid on the output, and a dual-screen editor + showcase. Two future remote clients attach via video stream over WebSocket: a remote computer and a web browser. All clients write state and subscribe/publish through the channels.]

In all four scenarios the engine does the same thing: writes channels, produces textures. What changes is how those reach the UI. Locally it's shared memory and direct GPU access. Remotely it's a network protocol and a video stream. The channel layer doesn't change at all — it already runs over WebSocket.

The local case: take the UI out of the engine's render loop

For local rendering, the immediate next step is to take the cost of Avalonia composition off the engine's main render loop. Today, when a user drags a slider, the loop spends roughly 50 ms re-rendering the UI at 4K — and because vvvv's render is single-threaded by design, that cost blocks the engine's 3D render too. Dual-screen perf doesn't escape this; both screens share the same loop.

The change: Avalonia composition moves to a dedicated worker thread. It reads channels, runs layout, renders to a bitmap, publishes the latest snapshot. The main render loop just blits that snapshot and draws texture overlays on top — both cheap GPU operations. The engine never waits for the UI.
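
A minimal sketch of that handoff follows. Every type and helper name here (UiSnapshotBuffer, ReadChannels, ComposeAvalonia) is illustrative, not our actual code; the point is the atomic swap and the non-blocking read.

    // Sketch of the snapshot handoff. The worker owns composition; the render
    // loop only ever reads the latest published result.
    using System;
    using System.Threading;

    public sealed record UiSnapshot(byte[] Pixels, int Width, int Height);

    public sealed class UiSnapshotBuffer
    {
        private UiSnapshot latest = new(Array.Empty<byte>(), 0, 0);

        // UI worker thread: one call per composed frame (target ~30 Hz).
        public void Publish(UiSnapshot snapshot) => Interlocked.Exchange(ref latest, snapshot);

        // Main render loop: never blocks, never waits for the worker.
        public UiSnapshot AcquireLatest() => Volatile.Read(ref latest);
    }

    // Worker loop, simplified. ReadChannels and ComposeAvalonia are placeholders
    // for the subscribe side of the channel API and the Skia CPU rasterization.
    void UiWorkerLoop(UiSnapshotBuffer buffer, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var state = ReadChannels();
            var frame = ComposeAvalonia(state);
            buffer.Publish(frame);    // atomic swap; the engine picks it up when it likes
            Thread.Sleep(33);         // ~30 Hz cadence, plenty for parameter editing
        }
    }

Because the publish is a single reference swap, the worst case for the render loop is drawing a one-frame-old UI; it never waits on layout or rasterization.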

Figure 2 · Local threading model — engine and UI on independent cadences

[Diagram: on the main render thread, the engine and audio run unchanged; the loop renders the 3D scene at full Stride rate, blits the UI snapshot (cheap, GPU only), and draws texture overlays (previews, video). On the UI worker thread, the loop reads channels (parameters, state), composes Avalonia on the CPU at its own cadence, and publishes a snapshot via atomic swap. The shared resources are the public channels (thread-safe) and the GPU textures on the single Stride device.]

This delivers two outcomes at once. The engine's frame loop becomes consistently fast regardless of UI activity — matching the headless performance characteristic, with UI on. And the UI runs at its own cadence (target 30 Hz, plenty for parameter editing) without dragging the engine down with it.

The previews — 2D and 3D plugin outputs, video sources — continue to work exactly as today. The UI captures where each preview should appear; the main render loop reads the texture and draws it in place during its blit pass. Because the engine and the UI are in the same process and share the same Stride GraphicsDevice, every texture is directly accessible. No cross-process protocol, no shared GPU handles, no encoding cost.
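
A sketch of that blit pass, with the caveat that PreviewPlacement and the method shape are hypothetical; the SpriteBatch calls follow Stride's XNA-style API:

    // Sketch of the main loop's blit pass. PreviewPlacement is a hypothetical
    // record for where the UI wants each preview; the textures are the engine's
    // own, readable directly because both sides share one GraphicsDevice.
    using System.Collections.Generic;
    using Stride.Core.Mathematics;
    using Stride.Graphics;

    public readonly record struct PreviewPlacement(RectangleF Bounds, Texture Source);

    void BlitUiAndPreviews(GraphicsContext context, SpriteBatch batch,
                           Texture uiSnapshot, RectangleF screen,
                           IReadOnlyList<PreviewPlacement> previews)
    {
        batch.Begin(context);
        batch.Draw(uiSnapshot, screen, Color.White);      // UI snapshot: one cheap blit
        foreach (var p in previews)
            batch.Draw(p.Source, p.Bounds, Color.White);  // live plugin/video texture, no copy
        batch.End();
    }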

The remote scenarios (later)

Once the UI is genuinely a subscriber thread rather than part of the engine's render loop, the remote scenarios become transport problems rather than architectural ones. The channel layer already runs over WebSocket. The hard part for remote is the preview pane — the GPU texture passthrough we get for free locally becomes a streaming problem.
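
To show why this reduces to transport, here is a sketch that bridges the illustrative Channel<T> from earlier over a WebSocket. The JSON message shape and the "gain" name are invented for the example; only the claim that channels run over WebSocket comes from the architecture itself.

    // Sketch of a channel-to-WebSocket bridge. Every outgoing value becomes a
    // small JSON message; the subscriber just happens to live behind a socket.
    using System.Net.WebSockets;
    using System.Text.Json;
    using System.Threading;

    void BridgeChannel<T>(Channel<T> channel, WebSocket socket, CancellationToken ct)
    {
        channel.Subscribe(value =>
        {
            var payload = JsonSerializer.SerializeToUtf8Bytes(
                new { channel = "gain", value });
            // Fire-and-forget keeps the sketch short; a real bridge would queue
            // sends and handle backpressure.
            _ = socket.SendAsync(payload, WebSocketMessageType.Text,
                                 endOfMessage: true, ct);
        });
    }

The preview pane is the one piece that does not reduce to a message like this: textures have to be encoded into a video stream, which is why it is scoped separately.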

These are future work, scoped, and additive — they build on the same architecture without changing it.

Why this matters

"The engine's quality is the product. The UI should feel free, not borrowed."

Three reasons this conversation matters now:

1. Bugs that look like tuning problems aren't. The flicker we couldn't fix in V1 disappeared in V2 because the architecture changed. The 4K perf wall isn't a profiler problem; it's a consequence of where rendering happens. Until the architecture moves, certain bugs will keep coming back.

2. The roadmap implies separation. Remote control, web UI, possibly multi-user — none of those are possible without treating the UI as a client. Building those on top of a tightly-integrated UI would mean re-doing the architecture later. Doing the separation now means each future scenario is a transport question, not a refactor.

3. The engine's quality is the product. The headless ABB installation has proven the engine can perform indefinitely. The UI shouldn't compromise that. The goal of the next phase is to make "UI on" perform like "UI off" — not because we don't value the UI, but because we want the UI to feel free, not borrowed.

Validated · The hypothesis is right

The V2 prototype confirms that moving Avalonia rendering off the shared GPU surface fixes the flicker class of bugs by construction. Idle performance also went up.

Next step · Composition on a worker thread

One new ProcessNode, one dedicated thread, a well-defined snapshot handoff. The engine never waits for the UI. A bounded refactor that builds on V2.

Future · Remote and web UIs

Channels already work over WebSocket. The remaining piece is a texture/video transport. Additive work on the same architecture.