So we already have simple native eye tracking driven by the existing configuration on the avatar descriptor, which is great because (when it actually decides to work) every avatar gets basic eye tracking for free.
I propose we do the same for face tracking: a simple set of heuristics could blend between the existing viseme shapes based on the incoming face tracking OSC parameter values. At the very least, viseme aa could be driven by jaw openness, for example; something along the lines of the sketch below.
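To make the idea concrete, here is a rough sketch of what such a heuristic mapping might look like. This isn't tied to any real VRChat or OSC API: the parameter names (`JawOpen`, `LipFunnel`, `MouthPucker`, `MouthClosed`) and the viseme keys are placeholders standing in for whatever the actual face tracking parameters and avatar descriptor viseme slots are.

```csharp
using System;
using System.Collections.Generic;

// Sketch: map incoming face tracking values (already parsed from OSC) onto
// weights for the avatar's existing viseme blendshapes. All parameter and
// viseme names here are hypothetical placeholders.
static class FaceTrackingVisemeHeuristics
{
    // Input: normalized (0..1) face tracking values keyed by parameter name.
    // Output: normalized (0..1) weights keyed by viseme name.
    public static Dictionary<string, float> MapToVisemes(IReadOnlyDictionary<string, float> ft)
    {
        float Get(string key) => ft.TryGetValue(key, out var v) ? Math.Clamp(v, 0f, 1f) : 0f;

        float jawOpen     = Get("JawOpen");
        float lipFunnel   = Get("LipFunnel");
        float lipPucker   = Get("MouthPucker");
        float mouthClosed = Get("MouthClosed");

        return new Dictionary<string, float>
        {
            // Viseme aa: mostly jaw openness, suppressed when the lips stay closed.
            ["aa"] = jawOpen * (1f - mouthClosed),
            // Viseme ou: rounded/puckered lips, with a little funnel mixed in.
            ["ou"] = Math.Max(lipPucker, 0.5f * lipFunnel),
            // Viseme oh: needs both an open jaw and a funnel shape.
            ["oh"] = Math.Min(jawOpen, lipFunnel),
            // Viseme PP: lips pressed together with the jaw mostly shut.
            ["PP"] = mouthClosed * (1f - jawOpen),
        };
    }
}
```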
And of course I realize VRC's face tracking implementation updates too slowly to replace visemes outright, but (since you're working natively in C# and not in animators like we're stuck with) it shouldn't be difficult to blend the existing audio-based viseme system on top, e.g. a simple per-viseme max as sketched below.
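For the layering itself, a per-viseme max (gated by how active the mic is) would probably be enough. Again just a sketch under assumptions: the `trackingWeights`/`audioWeights` inputs and the `micLevel` gate are stand-ins for whatever data the native lip sync path actually has available.

```csharp
using System;
using System.Collections.Generic;

static class VisemeBlending
{
    // Sketch: layer the audio-driven visemes over the face tracking heuristic.
    // When the mic is hot, let the audio visemes win per-shape via a simple max;
    // otherwise the face tracking output passes through. Names are placeholders.
    public static Dictionary<string, float> BlendVisemes(
        IReadOnlyDictionary<string, float> trackingWeights,
        IReadOnlyDictionary<string, float> audioWeights,
        float micLevel) // 0..1 estimate of current speech activity
    {
        var result = new Dictionary<string, float>(trackingWeights);
        foreach (var (viseme, audioWeight) in audioWeights)
        {
            float trackingWeight = result.TryGetValue(viseme, out var t) ? t : 0f;
            // Scale the audio contribution by mic activity, then keep the stronger signal.
            result[viseme] = Math.Max(trackingWeight, audioWeight * Math.Clamp(micLevel, 0f, 1f));
        }
        return result;
    }
}
```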