Expose Local Audio Output Latency to Udon (to enable client-side A/V sync)
しんーーご
[ SUMMARY ]
・Please expose the local audio output latency (the time from when the engine schedules audio to when it sounds at the user’s speakers/headphones) to Udon.
・With this value, world creators can apply a per-client offset to video playback so audio and video stay in sync on each device.
[ PROBLEM ]
・Audio output latency varies widely by user (hardware, OS/driver, platform, buffer sizes).
・Udon currently cannot read that latency, so creators rely on manual sliders or fragile periodic seek/pause nudges.
・Result: noticeable desync during watch parties, karaoke, concerts, rhythm content, and story scenes.
[ PROPOSAL (Udon API surface; read-only, local) ]
・float GetLocalAudioOutputLatencySeconds()
Returns the estimated end-to-end local output latency in seconds (local only).
・(optional) double GetLocalDSPTime()
Exposes local audio DSP time for precise scheduling.
・(optional) bool TryGetLocalAudioOutputLatency(out float seconds)
Safe getter with failure fallback.
・(optional) Low-frequency refresh or change notification (approx. 1–2 Hz) if latency can vary at runtime.
[ ADDITIONALLY HELPFUL (not required) ]
・int GetOutputSampleRate()
・void GetDSPBuffer(out int bufferSizeSamples, out int numBuffers)
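Why these two extra values help: even without a full latency figure, they give a rough lower bound, since a queue of numBuffers buffers of bufferSizeSamples each holds about bufferSizeSamples × numBuffers / sampleRate seconds of audio. A minimal sketch in plain C# follows; the class and method names are made up for illustration, and the result covers only engine-side buffering, not driver, hardware, or Bluetooth delay.

```csharp
using System;

// Illustrative helper, not part of any existing Udon API.
public static class DspBufferLatency
{
    // Seconds of audio held by the engine-side buffer queue.
    // This is only a lower bound on the real output latency.
    public static double EstimateSeconds(int bufferSizeSamples, int numBuffers, int sampleRate)
    {
        if (bufferSizeSamples <= 0 || numBuffers <= 0 || sampleRate <= 0)
        {
            return 0.0; // invalid input is treated as "unknown"
        }
        return (double)bufferSizeSamples * numBuffers / sampleRate;
    }
}
```

For example, 1024 samples × 4 buffers at 48000 Hz comes to 4096 / 48000 ≈ 0.085 s, which is already larger than the ±20–40 ms target, so the buffering alone is worth correcting for.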
[ HOW CREATORS WOULD USE IT ]
・Read GetLocalAudioOutputLatencySeconds() on each client.
・When starting or resyncing video, apply that value as a local offset (small seek or short pause) so visual frames align to what the user actually hears.
・Keep existing network sync as is; this request enables only a local correction.
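The steps above could be sketched roughly as follows. This is plain C# with hypothetical names (LocalAvSync, TargetVideoTime, NeedsResync); it assumes the latency has already been read from the requested getter and that the world already has a network-synced playback time. The idea is simply that the sound heard right now was scheduled one latency earlier, so the picture should show that earlier position.

```csharp
using System;

// Illustrative only; none of these names exist in Udon.
public static class LocalAvSync
{
    // Position the picture should show so it matches what is audible now.
    public static double TargetVideoTime(double syncedTimeSeconds, float outputLatencySeconds, double durationSeconds)
    {
        // Negative or NaN latency is treated as unknown and ignored.
        double latency = (outputLatencySeconds > 0f) ? outputLatencySeconds : 0.0;
        double target = syncedTimeSeconds - latency;
        // Clamp so the target never falls outside the media.
        return Math.Max(0.0, Math.Min(target, durationSeconds));
    }

    // Only correct when the drift exceeds a tolerance, to avoid constant nudging.
    public static bool NeedsResync(double currentVideoTime, double targetVideoTime, double toleranceSeconds)
    {
        return Math.Abs(currentVideoTime - targetVideoTime) > toleranceSeconds;
    }
}
```

With a synced time of 10.0 s and a latency of 0.125 s, the target is 9.875 s. A tolerance of around 0.04 s would match the acceptance criteria in this request, though the right value is a judgment call for each world.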
[ KEY USE CASES ]
・Karaoke / lip-sync shows: align mouth movements and lyrics to the user’s real heard audio.
・Watch parties / cinemas: reduce clap-test drift between audio and video.
・Rhythm / DJ / club worlds: visuals land on beat for each attendee.
・Cutscenes / story events: tighter A/V cohesion improves immersion.
[ EXPECTED BEHAVIOR AND CONSTRAINTS ]
・Read-only numeric data; no privacy issues.
・Local-only value; creators implement their own offset logic.
・Reasonable update cadence (on change or around 1–2 Hz) is sufficient.
・If unsupported on a platform, return 0 or a defined “unsupported” state.
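To illustrate the fallback point: a plain return value of 0 cannot be told apart from a genuinely negligible latency, which is one reason the TryGet form may be the easier one to use. The sketch below is plain C# under stated assumptions: ILatencySource stands in for the requested API (nothing like it exists in Udon today), FixedLatencySource is a test double, and the 1-second sanity cap is an arbitrary choice.

```csharp
using System;

// Hypothetical stand-in for the requested API.
public interface ILatencySource
{
    bool TryGetLocalAudioOutputLatency(out float seconds);
}

// Test double that reports a fixed result.
public sealed class FixedLatencySource : ILatencySource
{
    private readonly bool supported;
    private readonly float value;

    public FixedLatencySource(bool supported, float value)
    {
        this.supported = supported;
        this.value = value;
    }

    public bool TryGetLocalAudioOutputLatency(out float seconds)
    {
        seconds = supported ? value : 0f;
        return supported;
    }
}

public static class LatencyFallback
{
    public const float MaxPlausibleSeconds = 1.0f; // assumed sanity cap

    // Use the reported latency when available and plausible,
    // otherwise fall back to the user's manual slider value.
    public static float ResolveOffset(ILatencySource source, float manualOffsetSeconds)
    {
        float seconds;
        if (source != null
            && source.TryGetLocalAudioOutputLatency(out seconds)
            && !float.IsNaN(seconds)
            && seconds >= 0f
            && seconds <= MaxPlausibleSeconds)
        {
            return seconds;
        }
        return manualOffsetSeconds;
    }
}
```

This keeps the existing manual slider useful as the fallback path on platforms where the value is unavailable.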
[ WHY THIS MATTERS ]
・A single float (local audio output latency) lets creators deliver professional-grade A/V sync.
・It removes device-dependent guesswork and yields a consistent viewing/listening experience with a minimal, safe API.
[ ACCEPTANCE CRITERIA ]
・Udon can query a local audio output latency value (seconds).
・On typical hardware, creators can keep A/V sync within approximately ±20–40 ms using this value.
・Clear, documented fallback behavior when the value is unavailable.
しんーーご
Instead of exposing this via Udon, adding built-in video–audio sync to the Video Player would also be great.
しんーーご
[Request to vote] Please vote by pressing the numbered “‹” button next to the title.
[Explanation in Japanese] The following is a supplementary summary for Japanese readers (a condensed translation of the original English post).
[SUMMARY]
・Please make the local audio output latency (the time from when the engine schedules a sound until it is actually heard from the user's speakers or headphones) available to Udon.
・With this value, creators can apply a per-client offset to video playback and correct the audio/video misalignment caused by device differences.
[PROBLEM]
・Audio output latency varies greatly from user to user (hardware, OS/driver, platform, buffer settings, etc.).
・Udon currently cannot read this latency, so creators have to rely on unreliable workarounds such as manual sliders and periodic seek/pause corrections.
・As a result, noticeable audio desync occurs in watch parties, karaoke, concerts, rhythm content, cutscenes, and so on.
[PROPOSAL (Udon API surface; read-only, local)]
・float GetLocalAudioOutputLatencySeconds()
Returns the end-to-end output latency of the local environment in seconds (local only).
・(optional) double GetLocalDSPTime()
Exposes the local audio DSP time for precise scheduling.
・(optional) bool TryGetLocalAudioOutputLatency(out float seconds)
A safe getter with a fallback on failure.
・(optional) Low-frequency refresh or change notification (approx. 1–2 Hz)
A way to pick up changes if latency can fluctuate at runtime.
[ADDITIONALLY HELPFUL (not required)]
・int GetOutputSampleRate()
・void GetDSPBuffer(out int bufferSizeSamples, out int numBuffers)
[HOW CREATORS WOULD USE IT]
・Read GetLocalAudioOutputLatencySeconds() on each client.
・When starting or resyncing video, apply the value as a local offset (a short seek, a pause, etc.) so the picture matches the sound the user actually hears.
・Existing network sync stays as is; this request only provides the information needed for local correction.
[KEY USE CASES]
・Karaoke / lip-sync: mouth movements and lyric display can be matched to the sound the user actually hears.
・Watch parties / cinemas: reduces the audio/video gap seen in clap tests and similar checks.
・Rhythm / DJ / club worlds: visuals land on the beat on each attendee's screen.
・Cutscenes / story: tighter A/V cohesion improves immersion.
[EXPECTED BEHAVIOR AND CONSTRAINTS]
・The value is read-only numeric data, so there are no privacy issues.
・The value is local only; creators implement the offset logic themselves.
・Updating on change or at about 1–2 Hz is sufficient.
・Please define clear behavior for unsupported platforms, such as returning 0 or an “unsupported” state.
[WHY THIS MATTERS]
・Exposing just one value, the local audio output latency, makes professional-quality A/V sync achievable.
・It eliminates device-dependent guesswork and provides a consistent viewing experience through a minimal, safe API.
[ACCEPTANCE CRITERIA]
・Udon can obtain the local audio output latency (in seconds).
・On typical hardware, the value is accurate and stable enough to keep A/V sync within approximately ±20–40 ms.
・The fallback behavior when the value cannot be obtained is clearly documented.