Google’s SoundStorm Clones Voices with 3 Seconds of Audio!

SoundStorm showcases the incredible ability of AI to clone voices and generate natural-sounding audio with exceptional fidelity.

By exploiting the hierarchical structure of the audio tokens produced by neural codecs, SoundStorm generates audio about two orders of magnitude (roughly 100x) faster than earlier autoregressive approaches.


With SoundStorm, we can now synthesize long-form audio, including natural dialogues with multiple speaker turns, while keeping the audio consistent and high in quality and preserving each speaker's identity.


This breakthrough opens up endless possibilities for speech synthesis, voice cloning, and audio generation applications. 

MattVidPro AI