~ diffusion models, the secret sauce, would work for tunes, too! ~
TL;DR: DALL-E and the rest of the gorgeous A.I.-generated art work by learning how to “undo mistakes”. The models see a progressively noisier, more corrupted image, and they must guess which corrections to make. Do the same for songs: keep adding actual noise until the song is gone, and the diffusion model learns to de-noise it. Then, if you feed it pure random noise as input, it’ll turn that noise, bit by bit, into a song! Can this happen, Gods of Machine Learning, in those hallowed halls of tech companies? I can’t feed an A.I. every song ever written and train a few billion parameters with hyperparameter tuning!
That’s it. Just… hoping for a diffusion model to turn literal noise into songs.
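To make that “undo mistakes” loop concrete, here’s a toy sketch of what training and sampling could look like on raw audio. Everything in it is made up for illustration — the tiny conv “denoiser”, the noise schedule, the clip length — it’s not anybody’s actual music model, just the standard DDPM recipe pointed at a waveform instead of pixels.

```python
# Toy DDPM-style training and sampling on raw audio (illustrative only).
import torch
import torch.nn as nn

T = 1000                                  # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # cumulative "how much signal is left"

class TinyDenoiser(nn.Module):
    """Stand-in denoiser: predicts the noise that was added to a waveform."""
    def __init__(self, channels=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels + 1, hidden, 9, padding=4), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 9, padding=4), nn.ReLU(),
            nn.Conv1d(hidden, channels, 9, padding=4),
        )

    def forward(self, x, t):
        # Crude timestep conditioning: append t/T as an extra channel.
        t_chan = (t.float() / T).view(-1, 1, 1).expand(-1, 1, x.shape[-1])
        return self.net(torch.cat([x, t_chan], dim=1))

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(clean_audio):
    """One step: corrupt a clean clip, ask the model to guess the noise."""
    b = clean_audio.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(clean_audio)
    a_bar = alpha_bar[t].view(-1, 1, 1)
    noisy = a_bar.sqrt() * clean_audio + (1 - a_bar).sqrt() * noise
    loss = ((model(noisy, t) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample(length=16000):
    """Start from pure noise and denoise it, step by step, into a waveform."""
    x = torch.randn(1, 1, length)
    for i in reversed(range(T)):
        t = torch.full((1,), i)
        eps = model(x, t)
        a, a_bar = alphas[i], alpha_bar[i]
        x = (x - (1 - a) / (1 - a_bar).sqrt() * eps) / a.sqrt()
        if i > 0:
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x  # noise turned, bit by bit, into (hopefully) a song

# Hypothetical usage: clips would be (batch, 1, samples) tensors of real music.
# for clip in dataloader: train_step(clip)
# generated = sample()
```

That’s the whole trick: the training loop only ever learns “what noise did you add?”, and the sampling loop just runs that guess in reverse, a thousand tiny corrections at a time.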