Giving a machine fleet its own voices

Early notes on a system where each machine in an agent fleet speaks with its own voice — devbox sounds like an engineer, another box like a researcher — so notifications are distinguishable by ear without looking at a screen.

Two pieces interest me most so far:

Voice fusion: blending two voices by taking a weighted average of their x-vectors (speaker embeddings), so you can dial a new voice anywhere between two existing ones.
A decoupled heads-up dashboard: the notification surface lives separately from the agent infrastructure, so the “who’s saying what” view doesn’t depend on any one box being up.
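
To make the voice-fusion idea concrete, here is a minimal sketch of the blend, not the actual implementation. It assumes each voice is already available as a fixed-length x-vector in a NumPy array, and that the downstream TTS model is happy with a unit-length embedding. The name fuse_voices, the weight parameter, and the 512-dimension size are all assumptions for illustration, and random vectors stand in for real embeddings.

```python
import numpy as np


def fuse_voices(xvec_a, xvec_b, weight=0.5):
    """Blend two speaker embeddings (hypothetical helper).

    weight=0.0 returns voice A, weight=1.0 returns voice B,
    and values in between move the result from A toward B.
    """
    a = np.asarray(xvec_a, dtype=np.float64)
    b = np.asarray(xvec_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("x-vectors must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")

    # Normalize first so neither voice dominates just by having a larger norm.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)

    # Weighted average in embedding space.
    blended = (1.0 - weight) * a + weight * b

    # Re-normalize (assumption: the TTS model expects unit-length embeddings).
    norm = np.linalg.norm(blended)
    if norm == 0.0:
        raise ValueError("voices cancel out; cannot blend opposite vectors")
    return blended / norm


# Stand-in embeddings; real ones would come from a speaker encoder.
rng = np.random.default_rng(0)
engineer = rng.normal(size=512)    # assumed dimension
researcher = rng.normal(size=512)

# A voice sitting 30% of the way from the engineer toward the researcher.
new_voice = fuse_voices(engineer, researcher, weight=0.3)
```

Whether the blended vector actually sounds like something between the two voices depends on the embedding space and the synthesizer, which is part of what I still need to test.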

This is rough and moving fast — capturing it here so the thinking is visible while it’s still forming.

Questions I haven’t answered

Does per-box voice actually reduce glance-at-screen rate, or is it novelty?
How do I keep audible notifications from becoming noise at fleet scale?