Nick Collins


To accompany the paper:

(2025) [PDF] [sound examples and code] "Unstable Audio: Code Bending Text-to-Music Generation". AES International Conference on Machine Learning and Artificial Intelligence for Audio, London, September 8-10

Associated GitHub project, with code for the hacks and for the figures in the paper: [UnstableAudio]

Sound Examples

Original text prompts and outputs


Baseline generation for the perturbations below:
'George Clinton and Kraftwerk are stuck in an elevator with only a sequencer to keep them company'

Example renders from alternative prompts:
'Detroit techno circa 1988'
'Derrick May electronic dance music'
'early techno music'
'late 1980s Detroit techno'

Examples of perturbing weights at particular layers:


Following the paper, the three perturbation sizes are which0 = 0.03, which1 = 0.3 and which2 = 3.

Layer 3, all three perturbation sizes



Layer 240, all three perturbation sizes



Layer 708, all three perturbation sizes
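As an illustration only, the Python sketch below shows one way a single-layer weight perturbation at the three sizes could be set up. It assumes additive zero-mean Gaussian noise scaled by the perturbation size, and models the network as a hypothetical list of layers, each a flat list of floats. The function name perturb_layer and the toy weights are hypothetical and are not taken from the UnstableAudio code; the paper and repository describe the actual hacks.

```python
import random

# Perturbation sizes as named on this page.
SIZES = {"which0": 0.03, "which1": 0.3, "which2": 3}


def perturb_layer(weights, layer_index, size, seed=None):
    """Return a copy of `weights` with Gaussian noise added to one layer.

    Hypothetical sketch: `weights` is a list of layers, each a flat list of
    floats, and `size` scales zero-mean, unit-variance Gaussian noise.
    The original `weights` are left unchanged.
    """
    rng = random.Random(seed)
    # Copy every layer so the baseline model stays intact for comparison.
    perturbed = [list(layer) for layer in weights]
    perturbed[layer_index] = [
        w + size * rng.gauss(0.0, 1.0) for w in perturbed[layer_index]
    ]
    return perturbed


if __name__ == "__main__":
    # Hypothetical toy model: four layers of three weights each.
    toy_model = [[0.1, -0.2, 0.3] for _ in range(4)]
    for name, size in SIZES.items():
        variant = perturb_layer(toy_model, layer_index=2, size=size, seed=0)
        deviation = max(abs(a - b) for a, b in zip(variant[2], toy_model[2]))
        print(f"{name} (size {size}): max weight change {deviation:.4f}")
```

Seeding the generator makes each perturbed variant reproducible, and copying the layers keeps the baseline weights available, so the same prompt can be rendered before and after perturbation.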



Examples of diverse processing:


Voices appear, even though the model was trained on instrumentals:

Ambient glitch:

Scrunchy, squelchy semi-vocal spasm:

Isolated hits with silence in between:

In a reverberant station setting (public service announcement), with squeals:

Sustained industrial noises: