Google introduces a new V2A (video-to-audio) AI model
Add realistic sound effects, dialogue, and soundtracks to your AI-generated videos.
Over the past few weeks, we’ve seen a wave of text-to-video and image-to-video tools like Google Veo, Kuaishou’s Kling, Luma Labs’ Dream Machine, and the newly announced Runway Gen-3 Alpha.
These AI video tools generate impressive results, but they share a common limitation — they are all silent.
No dialogue, no soundtrack, and no sound effects.
Today, Google shared an update on an internal technology it is developing that generates audio from video input.
What is Google V2A?
Google’s video-to-audio (V2A) combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.
V2A not only creates realistic sound effects and dialogue that match the characters and tone of a video, but it can also generate soundtracks for a wide range of traditional footage, including archival material, silent films, and more.
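Google has not released V2A publicly, so there is no real API to call yet. Still, the workflow it describes, a silent video clip plus an optional text prompt in, a synchronized audio track out, can be sketched conceptually. Every name in the snippet below (`V2ARequest`, `generate_audio_for_video`, and so on) is hypothetical and only illustrates the inputs and outputs mentioned in the announcement.

```python
# Hypothetical sketch of a video-to-audio (V2A) workflow.
# None of these classes or functions are a real Google API;
# they only mirror the inputs/outputs described in the announcement.

from dataclasses import dataclass
from typing import Optional


@dataclass
class V2ARequest:
    video_path: str                         # silent input clip (e.g. from Veo, Kling, Dream Machine)
    prompt: Optional[str] = None            # optional text guidance, e.g. "cinematic horror score"
    negative_prompt: Optional[str] = None   # sounds to steer away from


def generate_audio_for_video(request: V2ARequest) -> bytes:
    """Return an audio track synchronized to the input video.

    Conceptually, a V2A system would:
      1. Encode the video frames into a compressed visual representation.
      2. Condition an audio generator on that representation plus the text prompt.
      3. Decode the result into a waveform matching the video's duration.
    """
    raise NotImplementedError("V2A is a research prototype; no public API exists yet.")


# Illustrative usage only:
if __name__ == "__main__":
    req = V2ARequest(
        video_path="silent_clip.mp4",
        prompt="waves crashing on rocks, distant seagulls",
    )
    try:
        audio = generate_audio_for_video(req)
    except NotImplementedError as err:
        print(err)
```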
Examples