To my knowledge, the Oculus Lipsync SDK gives each viseme its own float value so that one syllable can blend seamlessly into the next. So I don't see why that system couldn't be exposed to us as avatar parameters; correct me if I'm wrong, though.
I basically want to replicate the viseme system in an avatar animator so that I can then modify it in-game (particularly useful for face tracking). However, the closest I've gotten with animator parameters, using the Voice float to drive the blendshape amount with a transition time between syllables, doesn't reach the fidelity of the default system: it suffers a transition delay when I speak too fast and looks stuttery when I speak slowly.
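To illustrate what I'm after, here's a rough Unity C# sketch of driving one blendshape per viseme directly from per-viseme floats, with smoothing done in script instead of animator transition times. The viseme source (`GetVisemeWeights`) is a placeholder, since getting those values is exactly what this request is about; the blendshape indices and smoothing value are just example names too.

```csharp
using UnityEngine;

// Sketch: drive viseme blendshapes directly from per-viseme floats,
// smoothing in script instead of relying on animator transition times.
public class VisemeBlendDriver : MonoBehaviour
{
    public SkinnedMeshRenderer faceMesh;   // mesh with one blendshape per viseme
    public int[] visemeBlendShapeIndices;  // blendshape index for each viseme
    public float smoothing = 25f;          // higher = snappier response

    private float[] currentWeights;

    void Start()
    {
        currentWeights = new float[visemeBlendShapeIndices.Length];
    }

    void Update()
    {
        // Placeholder: per-viseme floats in the 0..1 range, one per viseme.
        float[] targetWeights = GetVisemeWeights();

        for (int i = 0; i < visemeBlendShapeIndices.Length; i++)
        {
            // Frame-rate-independent exponential smoothing toward the target,
            // so fast speech doesn't lag and slow speech doesn't stutter.
            float t = 1f - Mathf.Exp(-smoothing * Time.deltaTime);
            currentWeights[i] = Mathf.Lerp(currentWeights[i], targetWeights[i], t);

            // Unity blendshape weights are 0..100.
            faceMesh.SetBlendShapeWeight(visemeBlendShapeIndices[i], currentWeights[i] * 100f);
        }
    }

    // Hypothetical source of per-viseme floats; in practice this is the data
    // I'd like exposed (e.g. as avatar parameters). Returns zeros here.
    float[] GetVisemeWeights()
    {
        return new float[visemeBlendShapeIndices.Length];
    }
}
```

With per-viseme floats available, that kind of smoothing (or any other in-game modification, like blending with face-tracking data) could be tuned freely instead of being stuck with fixed animator transitions.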