General course chat

Hello,

I’m getting a 404 error when I access https://course-v3.fast.ai/start_gcp.html. I am migrating from AWS, and want to try GCP.

When I tried the URL https://course-v3.fast.ai/ I got the message “The 3rd edition of course.fast.ai - coming in 2019.”

How can I access the instructions to set up and maintain the GCP environment?

thanks.

Has anyone used the command data.use_partial_data where data is an ItemList? If so, would you please share your example?
Is use_partial_data a way of only using a subset of the images you have collected while training?

I got the same error when trying to access the Zeit documentation. @rachel, can you please confirm whether the site is going to be down for a while or if this is just a temporary issue? Thanks!

Edit: course-v3.fast.ai seems to be back up now. Sorry for the trouble.

During the preparation of a databunch, is there a reason that the fastai library splits data into the train & valid sets before connecting images with class labels?

Wouldn’t it make more sense to put a similar percentage of each class of image into the train and valid datasets?
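
To make the point concrete, here is a minimal sketch of how the fastai v1 data block API is usually chained (the folder path and parameter values are just placeholders; older releases call the list class ImageItemList). The split step comes before labelling, which is why the split ends up random over files rather than stratified by class:

from fastai.vision import *

path = Path('data/sports')                  # placeholder: one sub-folder per class
data = (ImageList.from_folder(path)         # collect the image files
        .split_by_rand_pct(valid_pct=0.2)   # random split, not stratified by class
        .label_from_folder()                # labels are attached after the split
        .transform(get_transforms(), size=224)
        .databunch(bs=64)
        .normalize(imagenet_stats))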

Has anyone posted a blog where they deal with the problem that arises when one class has many fewer images than the other classes? I think Jeremy mentioned that you could make more copies of the images in the under-represented class. I understand that if each copy is transformed differently, that would reduce overfitting.

I wonder what the limits are. E.g. if you have 100 images for most classes and only 10 for another, can neural nets do a good job of identifying new images that belong to the sparsely populated class? This seems like a fun experiment! :smile: Please share if you have blogged on this topic.


AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 arch = models.densenet169

AttributeError: module 'fastai.vision.models' has no attribute 'densenet169'

How can I use densenet with fastai for transfer learning?
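
In case it helps, here is a rough sketch of how I would try it, assuming a recent fastai v1 install and an ImageDataBunch called data that you have already built. I believe recent fastai v1 releases re-export the torchvision densenets, so the AttributeError above may just mean an older install; either way you can fall back to torchvision directly:

from fastai.vision import *
import torchvision.models as tvm

# data is assumed to be an ImageDataBunch you have already built.
# In recent fastai v1 releases models.densenet169 should exist; if yours
# doesn't, the torchvision version works as a drop-in.
arch = getattr(models, 'densenet169', tvm.densenet169)
learn = cnn_learner(data, arch, metrics=accuracy)   # create_cnn in older fastai v1 releases
learn.fit_one_cycle(4)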

Making additional copies and then using augmentation works, but is pretty kludgy.

@radek has a way to do this just by manipulating train.csv in an intelligent manner in his Humpback Whale Kaggle GitHub repo (the ‘difficult’ thing, for me at least, was constructing the validation set without including ‘copied but transformed’ images).
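
For anyone who wants the gist without digging through the repo, here is a generic sketch of the idea, not @radek's actual code; the file name, column names and 'rare_class' label are placeholders. Hold out the validation rows first, then duplicate only the remaining training rows of the rare class so no copy can leak into the validation set:

import pandas as pd

df = pd.read_csv('train.csv')                    # placeholder: columns fname, label
valid = df.sample(frac=0.2, random_state=42)     # hold these rows out untouched
train = df.drop(valid.index)

rare = train[train['label'] == 'rare_class']     # the under-represented class
train = pd.concat([train] + [rare] * 9, ignore_index=True)   # roughly 10x the rare rows

train.to_csv('train_oversampled.csv', index=False)
valid.to_csv('valid.csv', index=False)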

Has anyone looked at solutions to image classification where the submitted image is totally off-domain? A classifier will always classify - with some level of confidence (probably not the correct statistical term) - but the image might not even be in the right domain. Would a binary classifier of all your classes vs. general images be a good first step? The following is an example where I’d like to respond with “This doesn’t look like a sports picture - but…”

Have you tried grabbing a sample of non-sports images from, say, ImageNet and adding them as another class (not_sports or whatever) to your classifier?

“Other” bucketing like this usually works relatively well with more traditional ML techniques…
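
If it helps, this is roughly how I would build the extra class; all the paths and the 500-image sample size are just assumptions. Copy a random sample of generic images into a not_sports folder next to the existing sport class folders, then retrain:

import random, shutil
from pathlib import Path

src = Path('data/imagenet_sample')     # placeholder: any folder of generic images
dst = Path('data/sports/not_sports')   # new class folder alongside the sport classes
dst.mkdir(parents=True, exist_ok=True)

files = list(src.rglob('*.jpg'))
for f in random.sample(files, min(500, len(files))):
    shutil.copy(f, dst / f.name)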

Thanks @larcat, I was considering that - but given the nature of the pictures, which don’t always have a specific subject, I’d wondered whether a binary all-sports vs. others classifier might be a better way. I’ll try both approaches.

Please post results! I’m super curious about this problem generally as it applies to image classification, and I don’t think there’s a pat answer.

Will do - I already have the title for my Medium article: “If it looks like a duck, and quacks like a duck - it’s Rugby!”

Hi,

Yes, I have used .use_partial_data and it works perfectly.

Nope! It allows you to use a subset of the data you have in a folder before training. It’s extremely useful when you want to check whether your code works.

In addition, by default it grabs a random percentage of the data, though I don’t know if that can be customised.
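
For the record, this is roughly how I call it (the folder path is a placeholder, and I believe it also accepts an optional seed argument for reproducibility, but check the docs for your version):

from fastai.vision import *

path = Path('data/sports')        # placeholder: one sub-folder per class
small = (ImageList.from_folder(path)
         .use_partial_data(0.1)   # keep roughly 10% of the items for a quick smoke test
         .split_by_rand_pct(0.2)
         .label_from_folder()
         .transform(get_transforms(), size=128)
         .databunch(bs=16))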

I tried your approach last night @larcat and am seeing good results. I used a subset of the Caltech 101 images - from all classes - and added them as my ‘not a sport’ class: https://sportsidentifier.azurewebsites.net/ Generally it is much better, but some odd pictures are still classified as ‘Cricket’. It got my hummingbirds right, but I liked the 2nd choice on this one - truly a field of dreams! I’ll try the other option when I get a chance - all sports vs. Caltech 101 as a binary - so using two stages for the prediction.

Awesome, thanks for the update. I see how that could look like a baseball field if I squint.

What’s your strategy for the two-model method? Set the decision point right at 0.5, or…?

I’d probably need to see how it worked - and move the decision point around. One thing I’ve noticed using different models as I train is that even when accuracy is improving, the model can be even more sure of its incorrect classifications. Most of my ‘not a sport’ pictures scored in the 80s and 90s. The one above stuck out as being quite low - but that was with the 10-class classifier. With the binary model it will be interesting to see how that image is treated. The Caltech images are quite broad - but probably lacking landscapes - and I might need to leave out the soccer ball class.
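
For what it’s worth, the decision-point experiment could look something like this sketch (learn_binary, the image path, the ‘sport’ class name and the 0.8 threshold are all just assumptions for illustration):

from fastai.vision import *

img = open_image('test_image.jpg')                        # placeholder image
pred_class, pred_idx, probs = learn_binary.predict(img)   # learn_binary: the sports-vs-other model

threshold = 0.8                                           # move this around and watch the real-world errors
p_sport = probs[learn_binary.data.classes.index('sport')].item()
if p_sport < threshold:
    print("This doesn't look like a sports picture - but...")
else:
    print(f'Looks like sport ({p_sport:.0%} confident)')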

You should import fastai first.
But as @seb0 said, if you installed within the last month or so, you shouldn’t confuse it with the previous 0.4~ version, so you only need: import fastai; fastai.__version__

Just wanted to draw people’s attention to the lines of code below, which in notebooks will give you a graphical view of the objects involved.

# ! pip install objgraph xdot
# ! sudo apt-get install graphviz
import objgraph
# learn = language_model_learner(data_lm, bptt=70, drop_mult=0.3, pretrained_model=URLs.WT103)  # example learner
# objgraph.show_refs([learn])  # draws a graph of the objects `learn` references

I got that from @stas in this thread: IPyExperiments: Getting the most out of your GPU RAM in jupyter notebook

but thought these lines are of general value for just understanding what’s going on in notebooks :wink:

I think the Caltech set contains too many groups of very similar images to be a good training set for my binary split detector. The training went almost immediately to 100% - and I was sure I’d done something wrong - but it was working perfectly (on the training data). Once I gave it some real-world non-sports pictures it was around 85% correct (but mostly at 99-100% certainty in the prediction). It looks like ‘overfitting’ to my training set of non-sports pictures. I’ll see if I can create my own training set that gives better results, and also repeat the previous tests. And my tulip picture was still classified as a sport.

What was the sports/non-sports split in the training data out of curiosity?

-L

This is for last year’s version of Part 2.