Sounds You’ve Never Heard Before: Nvidia Unveils AI Model for Audio Generation

Nvidia has introduced an AI model designed to create music and audio, capable of altering voices and generating entirely new sounds.

The technology, called Fugatto, is aimed at creators in the music, film, and gaming industries.

The neural network can produce sound effects and music based on text prompts. For instance, it can generate an audio clip described as “a trumpet barking like a dog” or “deep, rumbling bass pulses combined with periodic high-frequency digital chirps—like the sound of a massive intelligent machine awakening.”

What sets Nvidia’s solution apart is its ability to analyze and transform existing audio. For example, it can convert a piano melody into something that sounds like human singing.

“When you think about synthetic audio over the last 50 years, music has evolved thanks to computers and synthesizers. I believe generative AI will bring new possibilities to music, video games, and even everyday people who want to create something new,” said Bryan Catanzaro, Nvidia’s Vice President of Applied Deep Learning Research.

The new model was trained on datasets sourced from the public domain, and the company is exploring options for releasing it to the public.

“Every generative technology carries risks because people might use it to create things we wouldn’t want them to,” Catanzaro cautioned.

As a reminder, Google DeepMind recently announced that it is developing AI-powered technology for generating soundtracks for videos.