I mentioned the Mubert API for generating music from a text prompt some time back, but I didn’t have a chance to play with it till today.
I decided to create a notebook, based on various resources I looked at, showing how to use the basic Mubert API to generate music from a text prompt, and then to expand on that to generate music from an input image by getting a description of the image via CLIP interrogation.
I tried to keep the code as simple as possible and provided links to the original sources and additional information. The code is fully Apple Silicon/MPS compatible, and all development was done on a MacBook Pro. Be warned, though, that the second part (specifically the CLIP interrogation) took about an hour and a half to run on my MacBook Pro.
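At its core, the text-to-music part maps the prompt onto Mubert’s fixed list of music tags: the prompt and every tag are embedded with a sentence-transformers model, and the closest tags by cosine similarity are sent to the API. A minimal sketch of that matching step, with toy embeddings standing in for real model outputs (the tag names and vectors here are illustrative, not Mubert’s actual list):

```python
import numpy as np

# Toy stand-ins for sentence-transformers embeddings; in the notebook these
# come from model.encode(prompt) and model.encode(tags).
tags = ["ambient", "techno", "jazz"]
tag_embeddings = np.array([
    [0.9, 0.1, 0.0],   # "ambient"
    [0.1, 0.9, 0.1],   # "techno"
    [0.0, 0.1, 0.9],   # "jazz"
])
prompt_embedding = np.array([0.8, 0.2, 0.1])  # e.g. "calm atmospheric pads"

def closest_tags(prompt_vec, tag_vecs, names, top_n=1):
    # Cosine similarity between the prompt and every tag embedding.
    sims = tag_vecs @ prompt_vec / (
        np.linalg.norm(tag_vecs, axis=1) * np.linalg.norm(prompt_vec)
    )
    order = np.argsort(sims)[::-1][:top_n]
    return [names[i] for i in order]

print(closest_tags(prompt_embedding, tag_embeddings, tags))  # -> ['ambient']
```

The selected tag strings (not the embeddings) are what get passed to the Mubert API to request a track.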
Code and further details available via the Mubert Music notebook in my GitHub repo.
Can you put the !pip statements into your notebook in the first cell, because it does not work on Colab? I did try to resolve as many as I could, but it would be easier if you went to Colab (which is free) and saw all the unresolved references yourself. Once you have done that, I will willingly try it again. Please remember to set the Colab runtime to GPU, which should make it much faster.
!pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
!pip install -e git+https://github.com/pharmapsychotic/BLIP.git@main#egg=blip
import clip
!pip3 install httpx
!pip install sentence-transformers
!pip install transformers==4.15.0 timm==0.4.12 fairscale==0.4.4
This is the piece I could not fix:
from models.blip import blip_decoder
The models.blip module comes from the BLIP package, but running it locally I couldn’t get it to work with a pip install either … it just wouldn’t work. I thought it was a macOS issue.
I had to clone the repo and then do a pip install . from the repo folder. So you’d have to add something like this to a cell, then run it and see if that works …
!git clone https://github.com/pharmapsychotic/BLIP.git
!pip install ./BLIP
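An alternative to installing the package at all is to put the cloned checkout on `sys.path` so that `models.blip` becomes importable directly. A sketch, assuming the repo was cloned into a `BLIP` folder as above (the import itself still needs the repo’s dependencies installed):

```python
import sys
from pathlib import Path

# Point Python at the cloned BLIP checkout so the `models` package
# inside it (and hence `models.blip`) can be imported.
blip_dir = Path("BLIP").resolve()
if str(blip_dir) not in sys.path:
    sys.path.insert(0, str(blip_dir))

# After this, `from models.blip import blip_decoder` should resolve.
```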
This isn’t either text-to-music or image-to-music, but I did add another Jupyter notebook showing how to use the new
DanceDiffusionPipeline added to Hugging Face diffusers. That one simply generates music of a given duration/length.
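The pipeline returns raw waveform samples rather than an audio file, so the last step is writing them out. A minimal sketch of saving such an array as a WAV file with the standard library, using a synthetic sine wave in place of the pipeline’s output array (the 16 kHz sample rate here is an assumption — use the sample rate the checkpoint was trained at):

```python
import wave
import numpy as np

# Stand-in for the pipeline's output waveform: 4 seconds of a 440 Hz sine.
SAMPLE_RATE = 16_000  # assumption; take the real rate from the model config
duration_s = 4.0
t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Convert float samples in [-1, 1] to 16-bit PCM and write a mono WAV file.
pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
with wave.open("generated.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)           # 2 bytes per sample -> 16-bit PCM
    f.setframerate(SAMPLE_RATE)
    f.writeframes(pcm.tobytes())
```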
The notebook is in my GitHub repo.