Tips for building large image datasets

Oh, is there anything in the train/valid/test directories? I have a sneaking suspicion there’s a bug that means the folders are named after the search phrases rather than the classes. I’ll check later today.

No, they are empty; only downloaded_from_google has the images, distributed into folders on the basis of classes.

I used to use a library for crawling, icrawler. It supports crawling the Google, Bing and Baidu search engines, and it can be extended to download from your own custom webpages. A sample notebook on using it: https://github.com/nareshr8/Image-Localisation/blob/master/crawler.ipynb
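
A minimal sketch of typical icrawler usage (the search term and output folder here are placeholders; see the notebook above for a fuller example):

from icrawler.builtin import GoogleImageCrawler, BingImageCrawler

# Download up to 100 images per search engine into a per-class folder.
for crawler_cls in (GoogleImageCrawler, BingImageCrawler):
    crawler = crawler_cls(storage={'root_dir': 'downloaded/grizzly_bear'})
    crawler.crawl(keyword='grizzly bear', max_num=100)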


I found the reason it didn’t work: the part that sanity-checks and organises the images uses a glob pattern to find the files, which assumes that the file names start with the class name. Since the search term you used didn’t have the class as the first word, it didn’t match anything. I’ve changed it to match anything containing the search term for now. That’s slightly brittle: if a file name contains another class’s search term it might be assigned to several classes, so I’ll change it to use a sanitized version of the search terms later on.
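
To illustrate the difference (a sketch of the pattern change only, not the actual duckgoose code; class_name and search_term are placeholders):

from pathlib import Path

class_name = 'bear'            # hypothetical class
search_term = 'grizzly bear'   # hypothetical search phrase used for that class
downloads = Path('downloaded_from_google')

# Old behaviour: only files whose names start with the class name match,
# so files named after a search phrase like 'grizzly bear ...' are missed.
old_matches = list(downloads.glob(f'{class_name}*'))

# New behaviour: match the search term anywhere in the file name.
# Works, but brittle if one search term appears inside another file's name.
new_matches = list(downloads.glob(f'*{search_term}*'))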

So the new version of duckgoose (0.1.7) will work, or you can rearrange the search terms to have the class name first.

Thanks for letting me know it didn’t work for you.


Does anyone know how to open images in jupyter notebook while waiting for input? I’m writing a data checking function so you can go through your images by class after downloading and delete the ones that don’t belong.

No luck with

  • show_image(open_image(img_path))
  • or img = open_image(img_path); img.show()
  • or plt.imshow(np.rollaxis((np.array(open_image(img_path).data) * 255).astype(np.int32), 0, 3))

All 3 ways display after input is received; same behavior on terminal. So far only PIL.Image works:

import PIL.Image
...
img = PIL.Image.open(class_folder_path/f)
...
img.show()

Unfortunately this opens the image in your system’s default viewer, and running img.close() will not close the window; you have to do it manually. That’s an issue for datasets with hundreds of images.

There is a way to do this, at least from the terminal, with OpenCV, but I’m hesitant to use it since fastai isn’t using OpenCV. It’s similar to something I did in an old project a while back (tuning bounding boxes in that case; I may blog about that).

edit: I put together an OpenCV script for the data cleaner; here’s a video of how it works. Not sure if that works on a cloud instance with no GUI.
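
This isn’t the script itself, but the core OpenCV loop looks roughly like this (a sketch; the folder and key bindings are made up):

import cv2
from pathlib import Path

folder = Path('data/teddy_bear')    # placeholder class folder

for img_path in sorted(folder.glob('*.jpg')):
    img = cv2.imread(str(img_path))
    if img is None:                 # skip unreadable files
        continue
    cv2.imshow('datacleaner', img)
    key = cv2.waitKey(0) & 0xFF     # blocks until a key is pressed
    if key == ord('d'):             # 'd' deletes the image
        img_path.unlink()
    elif key == ord('q'):           # 'q' quits
        break

cv2.destroyAllWindows()

This needs a display, which is why it may not work on a headless cloud instance.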


On a separate note: you can also get image data from video. Using OpenCV and MSS, you can build a dataset by playing a video and grabbing screenshots of the part of the screen it’s playing in, with labels mapped to the keys you press. Here’s how I did that in that same project.
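
A minimal sketch of that capture loop (not the original project code; the capture region, classes and key bindings are placeholders):

import os
import cv2
import numpy as np
from mss import mss

# Hypothetical region of the screen where the video is playing.
region = {'top': 100, 'left': 100, 'width': 640, 'height': 360}
labels = {ord('1'): 'class_a', ord('2'): 'class_b'}   # hypothetical classes
for lbl in labels.values():
    os.makedirs(f'data/{lbl}', exist_ok=True)

counter = 0
with mss() as sct:
    while True:
        # Grab the region and convert the BGRA screenshot to BGR for OpenCV.
        frame = cv2.cvtColor(np.array(sct.grab(region)), cv2.COLOR_BGRA2BGR)
        cv2.imshow('capture', frame)
        key = cv2.waitKey(1) & 0xFF
        if key in labels:                              # press 1/2 to save a labelled frame
            cv2.imwrite(f"data/{labels[key]}/{counter:05d}.jpg", frame)
            counter += 1
        elif key == ord('q'):                          # q to quit
            break

cv2.destroyAllWindows()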

You can build pretty big datasets quickly that way too; your bigger problem will be making sure the data itself is varied enough – since 20 shots of Matt Damon smiling in a 5-second cut are all going to contain basically the same information.


I built a dataset curator to help find and remove both duplicate images and images from outside of the data distribution. It uses the intermediate representations from a pretrained vgg network (similar to content loss when doing style transfer).
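
Not the curator itself, but the underlying idea looks roughly like this (a sketch using torchvision’s VGG16; the helper names and paths are made up for illustration):

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Use the convolutional part of a pretrained VGG16 as a feature extractor.
vgg = models.vgg16(pretrained=True).features.eval()
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def embed(path):
    x = prep(Image.open(path).convert('RGB')).unsqueeze(0)
    with torch.no_grad():
        return vgg(x).flatten(1)            # [1, 512*7*7] feature vector

def similarity(path_a, path_b):
    # Cosine similarity near 1.0 suggests near-duplicates; unusually low
    # similarity to everything else suggests an out-of-distribution image.
    return F.cosine_similarity(embed(path_a), embed(path_b)).item()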


Any recommendations around image resizing? I’ve built my own dataset, but the images I grabbed are on the larger side. What’s everyone doing in this context? I know the library can resize, but I’m guessing that’s a costly operation and would be better done once?

Updated my thing using the new stuff I learned tonight. Now the interface doesn’t look like it was made by a child :rofl:


Kind of a double post (see also Small tool to build image dataset: fastclass).

I wrote a small python package fastclass that tackles two problems I had when building a dataset:

  1. easily download images for multiple classes from the big search engines without using their (paid) APIs
  2. quickly filter the results and mark images for deletion (or grade them; more on that below)

For my example I defined 25 search terms (guitars, it’s also in the GitHub repo under examples)…

The first script, fcd, pulls from Google, Bing or Baidu (or all three) and resizes the images, too (it uses icrawler). Simply write a CSV file where each row contains the search terms you want to push to the search engines.
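
For illustration, a hypothetical guitars.csv along those lines, one search phrase per row (check the repo’s examples folder for the exact column layout fcd expects):

Gibson Les Paul
Gibson SG
Fender Stratocaster
Fender Telecaster
...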

Then the second script, fcc, launches a Tkinter GUI so you can quickly flick through the produced folders and mark any file for deletion or, optionally, assign it one of several “grades”.

In my case I used 4 grades (and deleted a bunch):
Grade 1: good
Grade 2: only the body of a guitar (still super useful to distinguish between models)
Grade 3: headstock only (not used in first model)
Grade 4: really hard (back of guitar, not used in first model)

I ended up with roughly 9000 images for 11 classes. Quality check takes some time - but it’s worth it!

You simply press a number to assign a grade, d to mark for deletion, and you can always flick back and forth using the arrow keys. Once you are done, press x to terminate and write the report file…

I wrote about it here:

Repo is here:

Notebook with the classifier is here (97% on 11 Gibson and Fender models; I only used grade 1+2 images for the classifier for the moment and will experiment with the others later):

Let me know with an issue or via these forums if you find any issues with it. Hope it’s useful to you…


@jeremy This thread offers better methods than the javascript code in the lesson2-download notebook. The javascript approach is problematic because it doesn’t work in all browsers, fails with blockers, and isn’t a solution for those who don’t have a way to access a browser UI (Colab et al.).

AFAICT none of the methods presented here are allowed under Google’s Terms of Service. I’m fine with them being discussed here, but I don’t think we should be teaching them in the course.


I’m experimenting with Crestle for uploading and syncing. Since they offer a terminal session, I’ve been able to use GoodSync effectively. The benefits of GoodSync are support for all platforms, powerful features, a good UI (Windows, Mac), and compatibility with cloud-backed storage services (Dropbox, Google Cloud, Microsoft OneDrive, etc.), and it works well. I’m only able to get 700 Kbps uploads/sync; I haven’t identified the bottleneck and haven’t benchmarked against similar services, so it might be as fast as it gets.

https://www.goodsync.com/for-linux

FWIW, I’ve cobbled together a python program that copies the contents of a Google Drive. I’m experimenting with it on Crestle. It may be useful for people who prefer working with Google Drive. I’d prefer to be able to mount a Google Drive, as can be done on Colab, instead of just a file copy method. GoodSync is a better choice for most.

import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

def main():
	gauth = GoogleAuth()
	# Try to load saved client credentials
	gauth.LoadCredentialsFile("mycreds.txt")
	if gauth.credentials is None:
		# Authenticate if they're not there
		#gauth.LocalWebserverAuth()
		gauth.CommandLineAuth()
	elif gauth.access_token_expired:
		# Refresh them if expired
		gauth.Refresh()
	else:
		# Initialize the saved creds
		gauth.Authorize()

	# Save the current credentials to a file
	gauth.SaveCredentialsFile("mycreds.txt")

	drive = GoogleDrive(gauth)
	local_expanded_path = os.path.expanduser('~/data')
	copy_directory(drive, 'root', local_expanded_path)

def copy_directory(drive, source_id, local_path):
	# Recursively mirror the Drive folder `source_id` into `local_path`.
	print(f'source_id:{source_id} local_path:{local_path}')
	try:
		os.makedirs(local_path, exist_ok=True)
	except OSError as e:
		print(f'makedirs failed: {local_path} errno:{e.errno}')

	# List everything whose parent is this folder.
	file_list = drive.ListFile({'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
	for f in file_list:
		print(f["title"], f["id"], f["mimeType"])
		if f["title"].startswith("."):
			# Skip hidden files/folders.
			continue
		fname = os.path.join(local_path, f['title'])
		if f['mimeType'] == 'application/vnd.google-apps.folder':
			# Recurse into subfolders.
			copy_directory(drive, f['id'], fname)
		else:
			# Download regular files as-is. (Google-native docs/sheets would
			# need to be exported to a concrete format instead.)
			item = drive.CreateFile({'id': f['id']})
			item.GetContentFile(fname)

if __name__ == "__main__":
	main()

Curious about your Olsen twin project… We tried a “Chrisifier” (Chris Pine/Evans/Pratt/Hemsworth) and were able to get the error down to around 25% using the standard pipeline from the bears notebook from class. So decent accuracy, but far from perfect. How accurate were you able to get your Olsen twins model? Any tricks you’d be willing to share? Apart from the obvious (gather more data; we only have about 200 images of each Chris), we were thinking it might be possible to pretrain on a large facial recognition dataset.

Hey thanks for putting this together.

I’m currently having an issue getting the images for 2 of my classes to download. The output suggests the script is running correctly, and the dirs are created, but the output_path dir is empty on inspection after running.

My problem is with the ‘laver’ and ‘badderlocks’ classes. All the others have downloaded successfully. Can you point me in the right direction?

NB here

Even,

Not sure if you found one, but I use https://www.bricelam.net/ImageResizer/.
Easy to use and works well.


Try ImageMagick:
https://imagemagick.org/script/mogrify.php

For example:

magick mogrify -resize 256x256 *.jpg

This resizes every JPEG in the current folder to fit within 256x256, preserving aspect ratio (use 256x256! to force exact dimensions). Note that mogrify overwrites the files in place, so keep a copy of the originals if you need them.


If you want to download more than 100 images from Google Images, you’re going to have to install Selenium, a webdriver, and Chrome. This series of steps worked for me: https://gist.github.com/ziadoz/3e8ab7e944d02fe872c3454d17af31a5
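
The reason a real browser is needed is that Google Images loads more results only as the page scrolls. A rough sketch of the scrolling part (not the gist above; the query and scroll count are placeholders, and URL extraction is omitted):

import time
from selenium import webdriver

# Launch headless Chrome and scroll the results page so more thumbnails load.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

driver.get('https://www.google.com/search?q=grizzly+bear&tbm=isch')  # placeholder query
for _ in range(10):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(1)   # give the page time to load more results

# ... parse driver.page_source for image URLs here ...
driver.quit()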

I am having trouble with google_images_download. It mostly works, but my dataset keeps including corrupted images that need to be removed manually. This is the command I am using:

googleimagesdownload -k "sports car" -s medium -f png -l 500 -o ~/storage/cars -i sports --chromedriver /home/paperspace/anaconda3/envs/fastai/bin/chromedriver

In jupyter I run:

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(),
                                  valid_pct=0.25, size=224, bs=bs).normalize(imagenet_stats)

The error message I get looks like this:

/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/basic_data.py:226: UserWarning: There seems to be something wrong with your dataset, can’t access these elements in self.train_ds: 1010,934
warn(warn_msg)

I can go through the images one by one and a few will not open, which I can then remove. Once I have gone through the whole dataset, everything works fine.

Has anyone seen this / has any idea how to automatically remove these corrupted images?

for c in classes:
    print(c)
    # deletes images that can't be opened and resizes anything larger than 500px
    verify_images(path/c, delete=True, max_size=500)

as per lesson2-download.ipynb :smile:
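
If you’re not using fastai, a rough equivalent with plain PIL looks something like this (a sketch, not what verify_images actually does internally; the path is just the one from your command above):

from pathlib import Path
from PIL import Image

def remove_corrupt_images(folder):
    # Try to fully decode each image; delete the files that fail.
    for p in Path(folder).expanduser().glob('*'):
        if not p.is_file():
            continue
        try:
            with Image.open(p) as img:
                img.load()          # force a full decode, not just the header read
        except Exception:
            print(f'removing {p}')
            p.unlink()

remove_corrupt_images('~/storage/cars/sports')   # placeholder path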

HTH
