Google Research introduces MusicLM, a model that can generate high-fidelity music from text descriptions.
See how MusicLM casts conditional music generation as a hierarchical sequence-to-sequence modeling task, and how it outperforms previous systems in both audio quality and adherence to the text description.
Learn more about MusicCaps, a dataset of 5.5k music-text pairs, and see how MusicLM can be conditioned on both text and a melody.
Check out this video to see the power of MusicLM.