Visual Speech Synthesis By Learning Joint Probabilistic Models Of Audio And Video