Image Dataset on professional photographs

I want to classify an image as a headshot, a 3/4 shot, and similar categories. I don’t know what other categories exist and would like to learn about them. The model would tell us whether our pictures are close to a professional style and which category they fall into. I couldn’t find any datasets like that. Can you help?
Thank you.

Check out a fairly popular model called flickr-style. It’s a pre-trained Caffe model that classifies photos by visual style, which is close to what you’re looking for.
Here’s a link: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

And here is the paper it’s based on: http://sergeykarayev.com/files/1311.3715v3.pdf

But you might run into copyright issues, because this dataset comes from Flickr.

Thank you. So datasets from sites like Flickr will have copyright issues. Are there any open-source ones?

Even if we are only using the images to train a model, is there still a license issue?

I have talked to lawyers about this. Usually, the legal recommendation is that as long as an image is not going to be reproduced elsewhere, using it for training purposes is okay. Similarly, search engines can index and cache your site’s contents, but for images they cannot show more than a low-resolution thumbnail. A copyright owner would understandably not want their image shown on another website. But using signals from several images in aggregate is fine.

It’s always safer to pick Creative Commons images (the MS COCO images are Creative Commons licensed).

For gathering datasets, there are a few APIs, such as the Bing Image Search API with its license filter set to Creative Commons, that can find several thousand images on a topic quickly.
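
To make the license filter concrete, here is a rough sketch of how such a query could be put together. It only builds the request and parses a response; it does not send anything. The endpoint URL, the `license` values, and the `Ocp-Apim-Subscription-Key` header are what I remember from the Bing Image Search v7 documentation, so treat them as assumptions and check that the service is still offered before relying on it. The function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Bing Image Search v7 endpoint; verify it against current docs.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"


def build_cc_image_request(query, api_key, count=50, offset=0,
                           license_filter="Share"):
    """Build (but do not send) a search request restricted by license.

    license_filter="Share" is assumed to mean "free to share and use";
    other documented values include "Public" and "ShareCommercially".
    """
    params = {
        "q": query,
        "license": license_filter,
        "count": count,    # results per page
        "offset": offset,  # skip this many results, for paging
    }
    return Request(
        ENDPOINT + "?" + urlencode(params),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


def extract_image_urls(response_json):
    """Collect full-size image URLs from a decoded JSON response."""
    return [item["contentUrl"] for item in response_json.get("value", [])]


# Build a request for one page of results (placeholder key, nothing is sent).
request = build_cc_image_request("studio headshot", "YOUR_KEY")
```

To collect a few thousand images you would step `offset` forward page by page and keep the returned URLs together with the license filter you used.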

Thanks for your help.
Could I interpret your words as follows?

1: It is safe to use images to train your model, even if they are not Creative Commons images, as long as you do not plan to distribute them.
2: If you want to distribute the images you collected, make sure they are Creative Commons; otherwise you may run into trouble.

What does this mean?

"But using signals from several images in aggregate is fine."
In short, CNN models learn from many photos. They aggregate signal from a large number of photos, without any individual photo contributing much to what is learned.

“1: It is safe to use images to train your model, even if they are not Creative Commons images, as long as you do not plan to distribute them.” Overall, this should be fine.
“2: If you want to distribute the images you collected, make sure they are Creative Commons; otherwise you may run into trouble.” For distribution, while this should be fine, it is better to share links to the URLs where the images were originally hosted. There are several types of Creative Commons licenses, such as ones that require attribution. Follow COCO as an example of how best to share a dataset of CC photos.
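
As a rough illustration of sharing links plus license information instead of the image files themselves, here is a minimal sketch of a manifest. It is loosely modeled on the way COCO annotation files keep a top-level `licenses` list and give each image a `license` id. The `source_url` and `author` fields, the example URL, and the function name are my own assumptions, not part of COCO.

```python
import json

# Hypothetical license table, similar in shape to the "licenses" list
# found in COCO-style annotation files (id, name, url).
LICENSES = [
    {"id": 1, "name": "Attribution 4.0 International (CC BY 4.0)",
     "url": "https://creativecommons.org/licenses/by/4.0/"},
    {"id": 2, "name": "Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)",
     "url": "https://creativecommons.org/licenses/by-nc/4.0/"},
]


def build_manifest(records, licenses=LICENSES):
    """Return a dict listing where each image lives and how it is licensed.

    `records` is a list of dicts with the hypothetical keys
    'source_url', 'license_id', and 'author'.
    """
    known_ids = {lic["id"] for lic in licenses}
    images = []
    for image_id, rec in enumerate(records, start=1):
        # Refuse to list an image whose license is not in the table.
        if rec["license_id"] not in known_ids:
            raise ValueError(f"unknown license id: {rec['license_id']}")
        images.append({
            "id": image_id,
            "source_url": rec["source_url"],  # link to the original host
            "license": rec["license_id"],
            "author": rec["author"],          # kept for attribution
        })
    return {"licenses": licenses, "images": images}


manifest = build_manifest([
    {"source_url": "https://example.com/photos/headshot_001.jpg",
     "license_id": 1, "author": "Example Author"},
])
manifest_json = json.dumps(manifest, indent=2)
```

People who download the manifest fetch each image from its original host, and the author and license fields give them what they need for attribution.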
