Short demo video: https://streamable.com/ltzb3n
The world is written in UdonSharp and the source code is available at https://github.com/hiinaspace/utheremin
Briefly, it works by playing an AudioSource with a dynamically created AudioClip that loops constantly. Each Update(), I synthesize a waveform into a small buffer (256 floats) and write it into the clip with AudioClip.SetData, at an offset slightly ahead of the current AudioSource.timeSamples, thus achieving realtime synthesis. The frequency varies based on a weighted average of the active player's hand and finger bone positions, so index-finger tracking or gestures can fine-tune the pitch to some degree, like a real theremin. The hand-tracking GameObjects follow their owner's skeleton and use Udon's "Sync Position", so playback generally stays synchronized for other listeners.
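The write-ahead loop described above can be sketched roughly like this. This is a simplified illustration, not the actual source (see the GitHub repo for that); the constants and field names here are made up, and it only synthesizes a plain sine wave:

```csharp
using UnityEngine;
using UdonSharp;

public class ThereminVoice : UdonSharpBehaviour
{
    // Illustrative constants; the real project's values may differ.
    const int SAMPLE_RATE = 6000;  // low rate, limited by Udon's per-frame throughput
    const int BUFFER_SIZE = 256;   // samples synthesized per Update()
    const int WRITE_AHEAD = 512;   // how far ahead of the playhead we write

    public AudioSource source;
    public float frequency = 440f; // in the real world this comes from hand tracking

    AudioClip clip;
    float[] buffer;
    float phase;

    void Start()
    {
        buffer = new float[BUFFER_SIZE];
        // A one-second, mono, looping clip that we continuously overwrite.
        clip = AudioClip.Create("theremin", SAMPLE_RATE, 1, SAMPLE_RATE, false);
        source.clip = clip;
        source.loop = true;
        source.Play();
    }

    void Update()
    {
        // Synthesize the next chunk of the sine wave, carrying phase across frames.
        float step = 2f * Mathf.PI * frequency / SAMPLE_RATE;
        for (int i = 0; i < BUFFER_SIZE; i++)
        {
            buffer[i] = Mathf.Sin(phase);
            phase += step;
        }
        phase %= 2f * Mathf.PI; // keep the phase accumulator bounded

        // Write slightly ahead of the current playhead, wrapping around the loop.
        int offset = (source.timeSamples + WRITE_AHEAD) % clip.samples;
        clip.SetData(buffer, offset);
    }
}
```

The key trick is that the clip never stops playing; the code just races ahead of AudioSource.timeSamples, so as long as Update() keeps pace, the playhead only ever reads freshly written samples.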
Unfortunately, Udon is very slow, so at most you can generate about 300 samples per frame, which limits the sampling rate of the audio to about 6kHz (for realtime synthesis). The Nyquist limit is therefore about 3kHz, so complex waveforms like squares or saws, whose harmonics extend well past that, suffer from bad aliasing artifacts. Also, since the write-ahead position isn't perfectly synced to playback, there are occasional snapping/popping artifacts when you drop frames locally and the buffer underruns.
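To make the aliasing concrete, any partial above half the sample rate folds back down to an audible frequency. A small standalone sketch (plain C#, not part of the project) that folds the odd harmonics of a hypothetical 700 Hz square wave at a 6kHz sample rate:

```csharp
using System;

class AliasDemo
{
    // Frequency a pure tone at f Hz is heard at after sampling at fs Hz:
    // it folds to the distance from the nearest multiple of fs.
    static double Aliased(double f, double fs)
    {
        return Math.Abs(f - fs * Math.Round(f / fs));
    }

    static void Main()
    {
        double fs = 6000.0;       // the ~6kHz rate Udon can sustain
        double fundamental = 700.0;
        // A square wave contains only odd harmonics (1st, 3rd, 5th, ...).
        foreach (int k in new[] { 1, 3, 5, 7, 9 })
        {
            double f = fundamental * k;
            Console.WriteLine($"harmonic at {f} Hz -> heard at {Aliased(f, fs)} Hz");
        }
    }
}
```

The 7th harmonic (4900 Hz) folds down to 1100 Hz and the 9th (6300 Hz) all the way down to 300 Hz, below the fundamental itself, which is why the square and saw voices sound gritty at higher pitches.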