Vlogger, Google's dystopian AI that creeps people out

We have already grown used to creating images from text, and soon we will be able to do the same with video. Now comes an intermediate option: using an image to create a video, one in which you can hear its subject speak. This is Vlogger, the latest technology from Google.

Scientists at the search engine giant have developed a new artificial intelligence model that can transform a single still image of a person into a talking and moving avatar. The results are as surprising as they are dystopian.

In a whitepaper, the Google team describes Vlogger as a “novel framework to synthesize humans from audio,” adding that “it is precisely automation and behavioral realism that we seek in this work… a multimodal interface for an embodied conversational agent.” This “agent,” they continue, is ultimately “designed to support natural conversations with a human user.” Basically, a photo that, thanks to AI, moves and speaks, and that interacts realistically with real human beings on the other end.

In the article, the researchers propose that this model, which requires only one image and one piece of audio, could be used to “improve online communication, education, or personalized virtual assistants.” Vlogger can also edit videos in motion, which the researchers say will “facilitate creative processes.”

However, they fail to mention that a tool that can generate completely artificial, moving and speaking video clips from a single image seems ripe for abuse by… basically anyone. AI deepfakes, for example, are already a growing problem. But while generating a deepfake is easier than ever thanks to the public availability of generative AI tools, creating a convincing one generally requires combining several AI tools. Right now, users of the Vlogger model still need to provide the desired audio for the video. Still, Vlogger would probably streamline the overall process.

What's more, according to the paper, Vlogger “does not require the user to have training to be able to use the technology.” The authors, led by Enric Corona, point out that it “generates the complete picture” and “considers a wide spectrum of scenarios that are critical to correctly synthesize communicating humans.”

In short, this means Vlogger does not require specific training for each person: anyone can create a fake but realistic video from a single image of almost anyone. Anyone at all. Obviously, nothing could go wrong (read that in an ironic tone).

Vlogger's AI animations are not perfect yet. They still have a distinctly inhuman touch, moving and speaking in a strangely robotic manner. So far the model has been trained on 2,200 hours of video and “800,000 identities,” according to the paper, so there is time for it to improve, whatever improvement means in this case. And by the time it is much better, it may already be too late to grasp the problem.