I have a large dataset of images and image attributes (e.g., descriptions, tags, etc.) and thought I would share it with the community. Specifically, the dataset contains 109,971 images (down from an initial 110,715 after removing a few panoramic images). Each image has a title/prompt at minimum.
I created two versions of the image dataset. One version has images with a minimum dimension of 512, while the other has a minimum dimension of 768. In both versions, the maximum dimensions are multiples of 32.
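For anyone preprocessing their own copy, here is a minimal sketch of that kind of resize using Pillow; the function name and rounding choices are my own illustration under those constraints, not the exact code used to build the dataset.

```python
from PIL import Image

def resize_min_dim(img: Image.Image, min_dim: int = 512, multiple: int = 32) -> Image.Image:
    """Scale so the shorter side equals min_dim, then round both sides down to a multiple of 32."""
    w, h = img.size
    scale = min_dim / min(w, h)
    new_w = max(multiple, int(w * scale) // multiple * multiple)
    new_h = max(multiple, int(h * scale) // multiple * multiple)
    return img.resize((new_w, new_h), Image.LANCZOS)

# Example: prepare one image for the 512px variant
# img_512 = resize_min_dim(Image.open("example.jpg"), min_dim=512)
```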
I already uploaded the Pandas DataFrame containing the image attributes to a GitHub repository. I also included notebooks to explore the dataset and download the source files from Pexels. Keep in mind that Pexels limits each IP address to about 500 downloads; you can use a VPN to get around this if needed. Also, the source files are rather large.
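As a rough sketch of how you might load the attributes and grab a source file, assuming pandas and requests; the file name and column names (`image_url`, `image_id`) are placeholders, so check the repository's notebooks for the real ones.

```python
import pandas as pd
import requests

# Placeholder file/column names -- see the repository's notebooks for the actual ones.
df = pd.read_pickle("pexels_image_attributes.pkl")
print(df.columns.tolist())

# Download a single source file from Pexels (mind the ~500 downloads per IP limit).
row = df.iloc[0]
resp = requests.get(row["image_url"], timeout=30)
resp.raise_for_status()
with open(f"{row['image_id']}.jpg", "wb") as f:
    f.write(resp.content)
```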
I plan to upload the two versions of the dataset to Kaggle as the Pexels license is quite generous regarding usage. However, I was hoping someone here would double-check the license’s wording to verify it’s allowed.
I just finished my first attempt at fine-tuning the depth2img v2.0 model on a thousand images from the Pexels dataset. I am currently uploading the depth images to Kaggle. Initial results seem OK, so I'll test using DreamBooth on the fine-tuned model to add a new style.
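For context, inference with a depth2img checkpoint in diffusers looks roughly like the sketch below; the prompt, file names, and strength value are illustrative, and a fine-tuned checkpoint directory would be swapped in for the base model ID. This is not the fine-tuning code itself.

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

# Base depth2img checkpoint; a fine-tuned checkpoint directory could be passed instead.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("pexels_example.jpg").convert("RGB")
result = pipe(
    prompt="a watercolor painting of the scene",
    image=init_image,
    strength=0.7,            # how strongly the original image is altered
    num_inference_steps=50,
).images[0]
result.save("stylized.png")
```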
The current depth maps often result in losing far backgrounds, so I’ll need to investigate solutions to address that.
My goal for this project is to generate datasets of stylized images that I can then use to train a real-time style transfer model as an update to my old in-game style transfer tutorial.
I concluded that the initial depth maps I generated were poor, and I am currently making higher-quality ones. Hopefully, it will take less than a day to get through the Pexels dataset.
The newer ones are an improvement, but I'm still not 100% satisfied with them.
There is another depth estimation model I came across that performs way better with most images but falls apart when there is too much blur. I have some ideas to remedy that, but have not had time to explore them yet.
I plan to upload my fine-tuning notebooks to GitHub in the next few days.
I want to fine-tune depth2img too. Since I couldn't find any existing notebook, I was planning to adapt this DreamBooth notebook to kohya-trainer (to support non-square aspect-ratio training images).
Looking forward to your notebooks. Happy to be an early tester and help contribute.
I forgot to answer your question about how I made the depth images. The current versions on Kaggle still use MiDaS, just at higher resolutions. I have also tried LeReS and Boosting Monocular Depth (BMD). BMD is neat, but I need to find an image size that provides superior results without taking an age to get through the dataset. LeReS is the one that is great for many images but falls apart with too much blur.
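For reference, the standard MiDaS recipe via torch.hub looks roughly like this; running it at higher resolutions means changing the resize inside the transform, which isn't shown here.

```python
import cv2
import torch

# Load the DPT_Large MiDaS variant and its matching input transform from torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.to("cuda").eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

img = cv2.cvtColor(cv2.imread("pexels_example.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    batch = transform(img).to("cuda")
    prediction = midas(batch)
    # Resize the prediction back to the source resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()
```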
Also, use the newer CUDA version included with the latest PyTorch release. Inference with the Stable Diffusion U-Net is twice as fast for me, and training gets a smaller (but still significant) speed improvement.
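A quick way to confirm which CUDA runtime your PyTorch build ships with; the TF32 flags at the end are an optional extra I'd suggest trying on Ampere or newer GPUs, not necessarily the source of the speedup above.

```python
import torch

# Confirm the PyTorch build and the CUDA runtime it bundles.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_name(0))

# Optional: allow TF32 matmuls on Ampere+ GPUs for an extra speed boost.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```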
I saw that posted on the Stable Diffusion subreddit, but I have not had a chance to check it out. It does look promising, though.
I am currently cleaning up my existing Jupyter notebooks to turn some of them into blog posts. I accumulated over 70 notebooks during the course and whittled them back down to about 30. I want to finish that before starting new experiments.
After that, I would like to see if I could use perceptual loss (i.e., activations from a VGG or other model) with ControlNet. That should be less hit-or-miss than relying on a depth estimation model.
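To make that concrete, here is a minimal sketch of a perceptual loss using a frozen VGG16 feature extractor from torchvision; how it would actually plug into ControlNet training is still an open question and not shown here.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor (up to relu3_3) for a perceptual loss.
vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# ImageNet normalization expected by VGG; inputs are RGB batches in [0, 1].
_mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
_std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def perceptual_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance between VGG activations (move vgg and the stats to the same device as the inputs)."""
    return F.l1_loss(vgg((generated - _mean) / _std), vgg((target - _mean) / _std))

# Example on CPU:
# loss = perceptual_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```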
I’ve been using ControlNet the past few days and absolutely love it. I think it’s such a clever idea for conditioning SD model generations! I’ve been applying it to some of my custom-trained SD models and got excellent results with the Canny edge, HED, and depth options.
By the way, no training is required to use it! The pretrained ControlNet model is sufficient to guide an already-trained SD model.
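For anyone who wants to try that, pairing a pretrained Canny ControlNet with a custom SD checkpoint in diffusers looks roughly like this; the model IDs and file names are examples, and you would swap in your own fine-tuned model.

```python
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Pretrained Canny ControlNet paired with an SD 1.5-style checkpoint -- no extra training needed.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # swap in your own fine-tuned SD model here
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Build the conditioning image: Canny edges of a source photo.
image = np.array(Image.open("pexels_example.jpg").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

result = pipe("a detailed oil painting", image=control_image, num_inference_steps=30).images[0]
result.save("controlnet_canny.png")
```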