Riffusion

Riffusion is a real-time music generation web service built on the Stable Diffusion image model. It was created as a hobby project by Seth Forsgren and Hayk Martiros, who fine-tuned the model to generate spectrogram images from text prompts; those spectrograms are then converted into audio clips [1].
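The spectrogram-to-audio step relies on mapping image pixel intensities back to signal magnitudes. The sketch below illustrates one plausible mapping, assuming the image stores power on a decibel scale from -80 dB (black) to 0 dB (white); the exact scaling Riffusion uses may differ, and a full pipeline would also need a phase-reconstruction step such as Griffin-Lim.

```python
# Illustrative sketch: map a grayscale spectrogram pixel (0-255) to a
# linear magnitude. The -80 dB floor is an assumption for this example,
# not Riffusion's documented scaling.
def pixel_to_magnitude(pixel: int, min_db: float = -80.0) -> float:
    db = min_db * (1.0 - pixel / 255.0)  # pixel 0 -> min_db, pixel 255 -> 0 dB
    return 10.0 ** (db / 20.0)           # convert dB to linear amplitude

# A complete converter would apply this per pixel, then recover audio
# with an inverse STFT plus a phase estimate (e.g. Griffin-Lim).
```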

The service has a web app available on GitHub [2] and is designed to provide an intuitive, user-friendly interface for generating and sharing music.

Riffusion can be run as a Flask server that provides inference via an API, allowing the web app to run locally. The model generates music from text prompts, enabling users to explore various styles, instruments, modifiers, genres, and sounds.
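A client of such a local server would send a JSON request describing the desired output. The helper below sketches what such a request body might look like; the field names (`prompt`, `seed`, `denoising`) are illustrative assumptions, not Riffusion's documented API.

```python
import json

# Hypothetical request body for a locally running inference server.
# Field names are assumptions for illustration only.
def build_inference_request(prompt: str, seed: int = 42,
                            denoising: float = 0.75) -> str:
    payload = {
        "prompt": prompt,        # text description of the desired music
        "seed": seed,            # fixed seed for reproducible output
        "denoising": denoising,  # how far generation departs from the seed image
    }
    return json.dumps(payload)

body = build_inference_request("jazz saxophone with rain sounds")
```

The JSON string can then be POSTed to the server's inference endpoint with any HTTP client.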

For instance, users can experiment with saxophone, violin, Arabic, Jamaican, jazz, gospel, church bells, or rain sounds, among other combinations.
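Prompts like these are free-form text, so styles, instruments, and sounds can simply be combined into one description. A minimal sketch, assuming comma-separated terms (the phrasing is a user choice, not a fixed Riffusion format):

```python
# Hypothetical helper that joins genre, instrument, and sound terms
# into a single free-form text prompt.
def compose_prompt(*terms: str) -> str:
    return ", ".join(t.strip() for t in terms if t.strip())

prompt = compose_prompt("jazz", "saxophone", "rain sounds")
```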

Riffusion has been featured in articles by MusicRadar and Futurepedia, and it has a growing community on Reddit with an installation guide for Windows users [3]. Additionally, there are YouTube videos showcasing the capabilities of the service.
