Image Dataset for Finetuning

Hi Everyone,

I’ve been working with images from the free stock photos website Pexels in my experiments finetuning Stable Diffusion.

I have a large dataset of images and image attributes (e.g., descriptions, tags, etc.) and thought I would share it with the community. Specifically, the dataset contains 109,971 images (down from 110,715 after removing a few panoramic images). Each image has at least a title/prompt.

Kaggle Datasets

Sample Image with Depth Map

Sample Entry

img_id 3186010
title Pink and White Ice Cream Neon Signage
aspect_ratio 0.749809
main_color [128, 38, 77]
colors [#000000, #a52a2a, #bc8f8f, #c71585, #d02090, #d8bfd8]
tags [bright, chocolate, close-up, cold, cream, creamy, cup, dairy product, delicious, design, dessert, electricity, epicure, flavors, fluorescent, food, food photography, goody, hand, ice cream, icecream, illuminated, indulgence, light pink background, neon, neon lights, neon sign, pastry, pink background, pink wallpaper, scoop, sweet, sweets, tasty]
adult very_unlikely
aperture 1.8
camera iPhone X
focal_length 4.0
google_place_id ChIJkUjxJ7it1y0R4qOVTbWHlR4
iso 40.0
latitude -7.746914
longitude 113.226906
manufacturer Apple
medical very_unlikely
orientation 0.0
racy unlikely
software 13.1.3
spoof very_unlikely
violence very_unlikely
location Kecamatan Mayangan, Jawa Timur, Indonesia

I created two versions of the image dataset: one with a minimum dimension of 512 and the other with a minimum dimension of 768. In both versions, the maximum dimensions are multiples of 32.
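As an illustration of that sizing scheme, here is a minimal sketch. The exact resizing procedure is my assumption, not necessarily what the dataset uses: scale so the short side reaches the target minimum, then snap each side down to the nearest multiple of 32.

```python
# Hypothetical sketch of the resizing scheme: scale so the short side
# reaches the target minimum (e.g., 512), then round each dimension
# down to the nearest multiple of 32.

def target_size(width: int, height: int, min_dim: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the short side is min_dim, then round
    each side down to a multiple of 32."""
    scale = min_dim / min(width, height)
    new_w = int(width * scale) // 32 * 32
    new_h = int(height * scale) // 32 * 32
    return new_w, new_h

print(target_size(4032, 3024))  # -> (672, 512) for a 4:3 iPhone photo
```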

I already uploaded the Pandas DataFrame containing the image attributes to a GitHub Repository. I also included notebooks to explore the dataset and download source files from Pexels. Keep in mind that Pexels limits each IP address to about 500 downloads. You can use a VPN to get around this if needed. Also, the source files are rather large.
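For anyone who wants to poke at the attributes before downloading anything, a quick sketch of filtering the DataFrame with pandas. The column names follow the sample entry above; the inline records are stand-ins for the real file in the repo:

```python
import pandas as pd

# Stand-in records mirroring the attribute columns from the sample
# entry; the actual DataFrame in the GitHub repo is much larger.
df = pd.DataFrame([
    {"img_id": 3186010,
     "title": "Pink and White Ice Cream Neon Signage",
     "aspect_ratio": 0.749809,
     "tags": ["neon", "ice cream", "dessert"]},
    {"img_id": 1234567,
     "title": "Example Landscape",
     "aspect_ratio": 1.5,
     "tags": ["mountain", "sky"]},
])

# Keep portrait images whose tags mention "neon".
portrait_neon = df[(df["aspect_ratio"] < 1.0)
                   & df["tags"].apply(lambda t: "neon" in t)]
print(portrait_neon["title"].tolist())  # ['Pink and White Ice Cream Neon Signage']
```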

I plan to upload the two versions of the dataset to Kaggle as the Pexels license is quite generous regarding usage. However, I was hoping someone here would double-check the license’s wording to verify it’s allowed.


The 512p version is now available on Kaggle. The 768p version is about a third of the way uploaded.


The 768p version finally finished processing on Kaggle.


I just finished my first attempt at fine-tuning the depth2img v2.0 model on a thousand images from the Pexels dataset. I am currently uploading the depth images to Kaggle. Initial results seem OK, so I'll test using DreamBooth on the fine-tuned model to add a new style.


Here are the depth images for the 512p dataset.

Ok, this shows some promise.


The current depth maps often lose far backgrounds, so I'll need to investigate ways to address that.

My goal for this project is to generate datasets of stylized images that I can then use to train a real-time style transfer model as an update to my old in-game style transfer tutorial.


Thanks for sharing this

I concluded that the initial depth maps I generated suck, so I am currently making higher-quality ones. Hopefully, it will take less than a day to get through the Pexels dataset.

The new depth maps seem to work better.

Sample image with new depth map

Result with old depth map

Result with new depth map


Thank you for sharing these datasets. Are the new depth maps updated in the links? Is the difference due to MiDaS vs. BMD?

I am starting to fine-tune SD models as well! How far along are you with your project?

They are newer, but I'm still not 100% satisfied with them.

There is another depth estimation model I came across that performs way better with most images but falls apart when there is too much blur. I have some ideas to remedy that, but have not had time to explore them yet.

I plan to upload my fine-tuning notebooks to GitHub in the next few days.


I want to fine-tune depth2img too. Since I couldn't find any existing notebook, I was planning to adapt this DreamBooth notebook to kohya-trainer (for training images with non-square aspect ratios).

Looking forward to your notebooks. Happy to be an early tester and to help contribute.

I still need to make a tutorial post, but you can check out some of the notebooks I use at the links below.

I have not had a chance to try newer fine-tuning methods like LoRA (Low-Rank Adaptation).

I have only used a batch size of 1 due to memory constraints, so I have not needed to do anything special for non-uniform aspect ratios.
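For anyone who does need batch sizes above 1, trainers like kohya-trainer handle non-uniform images with aspect-ratio bucketing: images are grouped into buckets that share a resolution so each batch is uniform. A rough sketch of the grouping step (the bucket list here is made up for illustration, not taken from any trainer's defaults):

```python
from collections import defaultdict

# Hypothetical bucket resolutions for illustration; real trainers
# generate these from a target pixel budget.
BUCKETS = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]

def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

def group_by_bucket(sizes):
    """Map each (width, height) pair to its nearest bucket."""
    groups = defaultdict(list)
    for w, h in sizes:
        groups[nearest_bucket(w, h)].append((w, h))
    return dict(groups)

print(nearest_bucket(3024, 4032))  # portrait 3:4 -> (448, 576)
```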

Here is a screenshot of the gradio annotation interface.

I forgot to answer your question about how I made the depth images. The current versions on Kaggle still use MiDaS, just at higher resolutions. I have also tried LeReS and Boosting Monocular Depth (BMD). BMD is neat, but I need to find an image size that provides superior results while not taking an age to get through the dataset. LeReS is the one that works great for many images but falls apart with too much blur.
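For reference, MiDaS-style models output relative inverse depth (larger values mean closer), so saving a depth map as an image involves normalizing it to an 8-bit range. This sketch shows the common post-processing step; the exact steps used for the Kaggle depth images are my assumption:

```python
import numpy as np

# Normalize a raw relative-depth array to 0-255 so it can be saved as
# an 8-bit grayscale image. Guard against a flat map to avoid dividing
# by zero.
def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min < 1e-8:
        return np.zeros_like(depth, dtype=np.uint8)
    norm = (depth - d_min) / (d_max - d_min)
    return (norm * 255.0).round().astype(np.uint8)

raw = np.array([[0.1, 0.5], [0.9, 0.5]])
print(depth_to_uint8(raw))  # [[0, 128], [255, 128]]
```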

Also, use the latest CUDA version included with the newest PyTorch release. Inference with the Stable Diffusion U-Net is twice as fast for me, and training gets a smaller (but still significant) speedup.


Thank you for sharing! I will try them out 🙂

BTW, did you see this? Very interesting addition: GitHub - lllyasviel/ControlNet: Let us control diffusion models


I saw that posted on the Stable Diffusion subreddit, but I have not had a chance to check it out. It does look promising, though.

I am currently cleaning up my existing Jupyter notebooks to turn some of them into blog posts. I accumulated over 70 notebooks during the course and whittled them back down to about 30. I want to finish that before starting new experiments.


After that, I would like to see if I could use perceptual loss (i.e., activations from a VGG or other model) with ControlNet. That should be less hit-or-miss than relying on a depth estimation model.

I’ve been using ControlNet for the past few days and absolutely love it. I think it’s such a clever idea for conditioning SD model generations! I’ve been applying it to some of my custom-trained SD models and got excellent results with the Canny edges, HED, and depth options.

BTW, no training is required to use it! The base trained ControlNet model is sufficient to guide an already-trained SD model.