From a single photo and a short audio sample, highly realistic talking-face video can be generated and streamed in real time.
https://www.microsoft.com/en-us/rese...roject/vasa-1/
“… Our method is capable of not only producing precise lip-audio synchronization, but also generating a large spectrum of expressive facial nuances and natural head motions. It can handle arbitrary-length audio and stably output seamless talking face videos...”
It looks like Microsoft does not plan to release this for now; they are keeping it to themselves.