I have a large dataset of images and image attributes (e.g., descriptions, tags, etc.) and thought I would share it with the community. Specifically, the dataset contains 109,971 images (down from 110,715 after I had to remove a few panoramic images). Each image has at least a title/prompt.
I created two versions of the image dataset. One version has images with a minimum dimension of 512 pixels, while the other has a minimum dimension of 768. In both versions, the maximum image dimensions are multiples of 32.
I already uploaded the Pandas DataFrame containing the image attributes to a GitHub repository. I also included notebooks to explore the dataset and download the source files from Pexels. Keep in mind that Pexels limits each IP address to about 500 downloads; you can use a VPN to get around this if needed. Also, the source files are rather large.
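A minimal sketch of how you might work around that per-IP cap: chunk the URL list and switch VPN exits between chunks. The ~500 figure comes from my experience above; the helper names and the one-second delay are my own choices, so check the repo's notebooks for the actual download code:

```python
import time
from pathlib import Path

PER_IP_LIMIT = 500  # Pexels cuts off downloads around this count per IP

def split_for_ips(urls, per_ip=PER_IP_LIMIT):
    """Split the URL list into chunks of at most `per_ip` downloads,
    one chunk per IP (switch VPN exit between chunks)."""
    return [urls[i:i + per_ip] for i in range(0, len(urls), per_ip)]

def download_chunk(urls, out_dir):
    """Download one chunk of source files (they are large, so expect this
    to take a while)."""
    import requests  # imported lazily so the chunking helper works without it
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        (out / url.rsplit("/", 1)[-1]).write_bytes(resp.content)
        time.sleep(1)  # stay polite with the server
```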
I plan to upload the two versions of the dataset to Kaggle, as the Pexels license is quite generous regarding usage. However, I was hoping someone here would double-check the license's wording to verify that this is allowed.
I just finished my first attempt at fine-tuning the depth2img v2.0 model on a thousand images from the Pexels dataset. I am currently uploading the depth images to Kaggle. Initial results seem OK, so I'll test using Dreambooth on the fine-tuned model to add a new style.
They are newer, but I'm still not 100% satisfied with them.
There is another depth estimation model I came across that performs way better with most images but falls apart when there is too much blur. I have some ideas to remedy that, but have not had time to explore them yet.
I plan to upload my fine-tuning notebooks to GitHub in the next few days.
I forgot to answer your question about how I made the depth images. The current versions on Kaggle still use MiDaS, just at higher resolutions. I have also tried LeReS and Boosting Monocular Depth (BMD). BMD is neat, but I need to find an image size that provides superior results while not taking an age to get through the dataset. LeReS is the one that is great for many images but falls apart with too much blur.
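For reference, this is roughly the shape of the post-processing behind a "depth image": the model's raw prediction gets normalized and saved as a 16-bit grayscale PNG. The exact normalization here is my assumption, not necessarily what the Kaggle files use:

```python
import numpy as np

def depth_to_uint16(depth: np.ndarray) -> np.ndarray:
    """Normalize a raw depth prediction to the full 16-bit range so it
    survives being saved as a grayscale PNG."""
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min < 1e-8:  # flat prediction; avoid divide-by-zero
        return np.zeros_like(depth, dtype=np.uint16)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 65535).astype(np.uint16)
```

The 16-bit range matters because an 8-bit depth map visibly bands when the depth2img pipeline resizes it.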
Also, use the latest CUDA version bundled with the newest PyTorch release. Inference with the Stable Diffusion U-Net is twice as fast for me, and training gets a smaller (but still significant) speedup.
I saw that posted on the Stable Diffusion subreddit, but I have not had a chance to check it out. It does look promising, though.
I am currently cleaning up my existing Jupyter notebooks to turn some of them into blog posts. I accumulated over 70 notebooks during the course and whittled them back down to about 30. I want to finish that before starting new experiments.
I've been using ControlNet the past few days and absolutely love it. I think it's such a clever idea for conditioning SD model generations! I've been applying it to some of my custom-trained SD models and got excellent results with the canny edge, HED, and depth options.
btw, no training is required to use it! The base pretrained ControlNet model is sufficient to guide an already-trained SD model.
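To make the "conditioning" part concrete: ControlNet takes a control image, e.g. an edge map of your input, alongside the prompt. The usual preprocessor for the canny option is OpenCV's `cv2.Canny`; the pure-NumPy Sobel version below is just a stand-in I wrote to show the idea of producing a binary edge map, not what ControlNet's preprocessors actually ship:

```python
import numpy as np

def sobel_edges(gray: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """Crude binary edge map from a grayscale image with values in [0, 1].
    A stand-in for cv2.Canny: Sobel gradient magnitude, thresholded."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(gray, 1, mode="edge")  # pad so output matches input size
    h, w = gray.shape
    gx = sum(kx[i, j] * p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    mag = np.hypot(gx, gy)
    return (mag > thresh * mag.max()).astype(np.uint8) * 255
```

An edge map like this (converted to a PIL image) is what you would hand to something like diffusers' `StableDiffusionControlNetPipeline` as the control image.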