I’m still not familiar with how to import data using fastai. I have a folder containing all of my images, and every image has an id that is linked to a class in a CSV file. Actually, there are multiple classes per image in the CSV, separated by spaces. How can I import that data into my model? Could someone post an example to help me understand?
Also, has anyone written a Medium post or other article about this? I think seeing some examples would make it easier for me to understand.
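In case a sketch helps: fastai v1’s data block API handles this kind of space-separated multi-label CSV via `label_delim=' '`. The snippet below uses pandas to show the CSV layout and how the labels get split; the fastai chain at the end is only a rough, version-dependent illustration (the file names, folder, and suffix are made up), so it’s left as a comment.

```python
import pandas as pd

# Made-up labels.csv: one row per image id, multiple classes in a
# single column, separated by spaces.
df = pd.DataFrame({
    "id": ["img_001", "img_002", "img_003"],
    "labels": ["cat outdoor", "dog", "cat indoor sleeping"],
})

# fastai splits the label column on the delimiter, so each image ends
# up with a list of classes (this is what label_delim=' ' does):
df["label_list"] = df["labels"].str.split(" ")
print(df["label_list"].tolist())

# The fastai v1 import itself would look roughly like this (not
# executed here; names and arguments may vary by version):
#
# from fastai.vision import *
# data = (ImageList.from_csv(path, "labels.csv", folder="images", suffix=".jpg")
#         .split_by_rand_pct(0.2)
#         .label_from_df(cols="labels", label_delim=" ")
#         .transform(get_transforms(), size=224)
#         .databunch())
```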
Has anyone used the command data.use_partial_data where data is an ItemList? If so, would you please share your example?
Is use_partial_data a way of only using a subset of the images you have collected while training?
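From what I can tell, yes - `use_partial_data` keeps only a random fraction of the items, which is handy for quick iteration. Here’s a minimal sketch of the idea in plain Python; the commented fastai v1 chain is my assumption about where the call sits, so check the signature in your version.

```python
import random

# Pretend item list; in fastai this would be an ItemList of image paths.
items = [f"img_{i:03d}.jpg" for i in range(100)]

# use_partial_data keeps a random fraction of the items - conceptually
# the same as sampling the list yourself:
random.seed(42)
subset = random.sample(items, int(len(items) * 0.1))
print(len(subset))  # 10 - i.e. 10% of the data

# In fastai v1 the call sits in the data block chain, e.g.
# (illustrative only - not executed here):
#
# data = (ImageList.from_folder(path)
#         .use_partial_data(sample_pct=0.1, seed=42)
#         .split_by_rand_pct()
#         .label_from_folder()
#         .databunch())
```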
I got the same when trying to access the Zeit documentation. @rachel can you please confirm if the site is going to be down for quite a while or is this just a temporary issue? Thanks!
Edit: course-v3.fast.ai seems to be back up now. Sorry for the trouble
During the preparation of a databunch, is there a reason that the fastai library splits data into the train & valid sets before connecting images with class labels?
Wouldn’t it make more sense to put a similar percentage of each class of image into the train and valid datasets?
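As far as I know fastai’s default random split doesn’t stratify by class, but you can build stratified indices yourself and hand them to the data block API. A sketch assuming scikit-learn is available (the `split_by_idx` usage in the comment is illustrative, not tested):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy labels: class "a" is much rarer than class "b".
labels = np.array(["a"] * 10 + ["b"] * 90)

# A stratified split keeps roughly the same class proportions in the
# train and valid sets.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, valid_idx = next(splitter.split(np.zeros(len(labels)), labels))
print((labels[valid_idx] == "a").sum(), "of", len(valid_idx), "valid items are class a")

# fastai v1 can then use these indices directly, e.g.:
#
# data = (ImageList.from_csv(path, "labels.csv")
#         .split_by_idx(list(valid_idx))
#         .label_from_df()
#         .databunch())
```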
Has anyone posted a blog where they deal with the problem that could arise if you have one class with many fewer images than other classes? I think Jeremy mentioned that you could make more copies of the images that had low representation. I understand that if each copy is transformed differently that would reduce overfitting.
I wonder what the limits are? E.g. if you have 100 images for most classes and only 10 for another, can neural nets do a good job of identifying new images that match the lowly populated class? This seems like a fun experiment! Please share if you have blogged on this topic.
Making additional copies and then using augmentation works, but is pretty kludgy.
@radek has a way to do this just by manipulating train.csv in an intelligent manner in his Humpback Whale Kaggle GitHub repo (the difficult part here, for me at least, was constructing the validation set without including copied-but-transformed images).
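For anyone who wants the shape of the idea without digging through the repo, here’s a generic pandas sketch - my reading of the approach, not radek’s actual code, and the column names (`fname`/`label`) and 20% validation fraction are invented: carve out the validation set first, then oversample only the remaining training rows.

```python
import pandas as pd

# Toy train.csv: class "rare" has far fewer images than "common".
df = pd.DataFrame({
    "fname": [f"img_{i:02d}.jpg" for i in range(20)],
    "label": ["rare"] * 5 + ["common"] * 15,
})

# 1. Carve out the validation set FIRST, so no duplicated row can leak
#    into it (sampled per class so some rare images land in valid).
valid = df.groupby("label").sample(frac=0.2, random_state=0)
train = df.drop(valid.index)

# 2. Oversample each training class up to the size of the largest one
#    by repeating rows - fastai's random transforms then make each
#    copy look a bit different, which limits overfitting.
target = train["label"].value_counts().max()
balanced = train.groupby("label").sample(n=target, replace=True, random_state=0)
print(balanced["label"].value_counts().to_dict())
```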
Has anyone looked at solutions for image classification where the submitted image is totally off-domain? A classifier will always classify - with some level of confidence (probably not the correct statistical term to apply to the accuracy) - but the image might not even be in the right domain. Would a binary classifier of all your classes vs. general images be a good first step? The following is an example where I’d like to respond with “This doesn’t look like a sports picture - but…”
Have you tried grabbing a sample of non-sports images from, say, imagenet and adding them as another class (not_sports or whatever) to your classifier?
“Other” bucketing like this usually works relatively well with more traditional ML techniques…
Thanks @larcat, I was considering that - but given the nature of the pictures, which don’t always have a specific subject, I wondered whether a binary all-sports vs. other classifier might be a better way. I’ll try both approaches.
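One cheap variant of the confidence idea, for what it’s worth: threshold the top softmax probability and fall back to a “doesn’t look like sports” response below it. Be warned that softmax confidence is poorly calibrated on truly off-domain inputs, so this complements rather than replaces the not_sports-class approach. The class names, logits, and threshold below are all invented:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

classes = ["soccer", "tennis", "golf", "rugby", "swimming"]
THRESHOLD = 0.6  # tune on a held-out set that includes off-domain images

def predict_or_reject(probs, classes, threshold=THRESHOLD):
    i = int(probs.argmax())
    if probs[i] < threshold:
        return ("This doesn't look like a sports picture - but my best "
                "guess is " + classes[i])
    return classes[i]

# Made-up logits: one confidently in-domain, one near-uniform (the sort
# of output an off-domain image often produces).
confident = softmax(np.array([8.0, 0.1, 0.2, 0.1, 0.3]))
unsure = softmax(np.array([1.0, 0.9, 1.1, 1.0, 0.8]))

print(predict_or_reject(confident, classes))  # prints "soccer"
print(predict_or_reject(unsure, classes))     # prints the rejection message
```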