Google's VLOGGER AI Creates Avatars

VLOGGER is a new AI model from Google that creates animated avatars from still images. It takes a different approach to content creation and video production by generating the avatar with AI rather than filming the person.

Google’s researchers have been working overtime recently, publishing a flurry of new models and ideas. The latest is a way to take a still image and turn it into a controllable avatar.

The model creates an animated avatar from a still image and preserves the photorealistic look of the person in the photo in every frame of the final video.

VLOGGER is a multimodal diffusion model for animating virtual portraits. It is trained on the MENTOR dataset, which contains more than 800,000 identities and more than 2,200 hours of video. This diversity allows VLOGGER to generate videos of people of different ethnicities and ages, in different clothing and poses.

The model also takes an audio file of the person speaking and drives body and lip movement to reflect how that person might naturally move if they were saying the words.

This includes head motion, facial expressions, eye gaze, and blinking, as well as hand gestures and upper-body movement, all generated without any reference beyond the image and the audio.

The Google research team built VLOGGER as a tool for creating realistic talking videos from only a selfie and a recording of the user's voice. It caters to people who struggle to speak in front of a camera but still want to produce content.

The researchers, led by Enric Corona at Google Research, used a type of machine learning model called a diffusion model to achieve this result.

Diffusion models have recently shown remarkable performance at generating highly realistic images from text descriptions. By extending them into the video domain and training on a vast new dataset, the team was able to create an AI system that can bring photos to life in a highly convincing way.
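To make the term "diffusion model" more concrete, here is a minimal sketch of the forward (noising) half of a generic DDPM-style diffusion process, applied to a toy four-pixel "image". It is not VLOGGER's code. The function names, the linear noise schedule, and the 1,000-step setting are illustrative assumptions borrowed from common diffusion tutorials. A real model is trained to reverse this process, predicting the added noise so that it can turn random noise back into an image or video frame.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the running product of (1 - beta) up to and including step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample the noisy x_t directly from the clean x_0:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e
             for x, e in zip(pixels, noise)]
    return noisy, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

image = [0.5, -0.25, 1.0, -1.0]  # toy "image": four pixel values in [-1, 1]
slightly_noisy, early_noise = add_noise(image, 10, alpha_bars, rng)
mostly_noise, _ = add_noise(image, 999, alpha_bars, rng)

print("signal kept at step 10: ", round(math.sqrt(alpha_bars[10]), 4))
print("signal kept at step 999:", round(math.sqrt(alpha_bars[999]), 4))
```

At an early step almost all of the original signal survives, while by the last step the sample is nearly pure noise. Generation runs the learned reverse process from that noisy end back toward a clean frame, with conditioning signals (in VLOGGER's case, the reference image and the audio) steering the result.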

The model is built on the diffusion architecture, known for its success in text-to-image, video, and 3D generation. By incorporating additional control mechanisms, VLOGGER extends that approach to controllable avatar creation.

For now, VLOGGER is only a research project with a couple of fun demo videos, but if it is ever turned into a product, it could offer a new way to communicate in tools such as Teams or Slack.

