Lesson 1 In-Class Discussion ✅

@lesscomfortable - Hi Francisco. Jeremy mentioned on Mon. that you’re going to put together a guide on downloading data from Google Images. I haven’t been able to find that - did I miss it, or have you not finished it yet? Don’t mean to pressure you! Thanks


Hey @ricknta! Haven’t finished yet, will be ready today or tomorrow before midday. Will post it in this thread’s header! :sunglasses:


Look at the post at the very top, there’s a link at the bottom with some description of how to download data off Google. I personally used this repository https://github.com/hardikvasa/google-images-download.
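If the scraper route doesn’t work out, here’s a minimal hand-rolled sketch (hypothetical helper names, standard library only) that downloads every image URL listed one-per-line in a text file:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def filename_for(url, dest_dir):
    """Derive a local file name from an image URL."""
    name = os.path.basename(urlparse(url).path) or "image.jpg"
    return os.path.join(dest_dir, name)

def download_all(url_file, dest_dir):
    """Fetch every URL listed (one per line) in url_file into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(url_file) as f:
        for url in (line.strip() for line in f):
            if not url:
                continue  # skip blank lines
            try:
                urlretrieve(url, filename_for(url, dest_dir))
            except OSError as e:
                print(f"skipped {url}: {e}")
```

You can get such a URL list from the Google Images search page itself (the repository above automates that part).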


Thanks @dreambeats I did see that but wanted to make sure I wasn’t overlooking Francisco’s guide.


Hi folks, I’ve created a small (approx. 50 images per class) dataset of galaxies labelled by their high-level morphology (spiral, barred, elliptical, irregular). However, the best I can do with them using the approach we learned in lesson 1 is about a 35% error rate with resnet50.

Doing the same for bears (grizzlies vs polars) gets me 0% error with resnet34 after 3 cycles!

Is the difference down to the existing learned behaviour in the model? Would a larger dataset improve matters?

FWIW, I know there’s much prior art for classifying galaxies with ML that I’m yet to understand, including a Kaggle competition and a great writeup from the winner, and I’m looking forward to revisiting the problem properly once we learn multi-label classification later in the course.

For now I’m just keen to understand why transfer learning from a resnet with a simple training set gives such different results for one category of object than for another.

Edit: Here’s my worked notebook.


Can anyone help me use custom data for classification? Jeremy used a URL constant to load the image data; what if we want to download some other dataset from the web and use it for classification? How could we do that?
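One common recipe: download your images however you like, then arrange them one-subfolder-per-class so a `from_folder`-style loader can infer the labels from the directory names. A pure-Python sketch (hypothetical naming scheme: files prefixed with their label, e.g. `spiral_001.jpg`):

```python
import os, shutil, tempfile

def organize_by_prefix(src_dir, dest_dir):
    """Move files named '<label>_<rest>' into per-class subfolders:
    dest_dir/<label>/<label>_<rest>."""
    os.makedirs(dest_dir, exist_ok=True)
    for fname in os.listdir(src_dir):
        if "_" not in fname:
            continue  # skip files that don't follow the naming scheme
        label = fname.split("_", 1)[0]
        class_dir = os.path.join(dest_dir, label)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(src_dir, fname), os.path.join(class_dir, fname))

# Demo with throwaway files
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for name in ["spiral_001.jpg", "spiral_002.jpg", "elliptical_001.jpg"]:
    open(os.path.join(src, name), "w").close()
organize_by_prefix(src, dst)
print(sorted(os.listdir(dst)))  # ['elliptical', 'spiral']
```

After that, pointing an image loader at `dst` should pick up `elliptical` and `spiral` as the classes.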

I found @svenski’s duckgoose utils for fast.ai fantastic for downloading images. Really well written and, assuming you can install chromedriver and use pip, easy to use. Thanks Sergiusz!


Please share your notebooks so we can help you resolve this. The ‘gist it’ extension is perhaps the easiest way.


When I use data = ImageDataBunch.from_folder(path), does it create validation and test sets, or should they be created before I call the method?
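If it helps: I believe (worth checking against the fastai docs) that `from_folder` can carve out a validation set for you via a `valid_pct` argument; conceptually that’s just a seeded random split. A pure-Python sketch of the idea (hypothetical helper name, not fastai’s actual code):

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and split off `valid_pct` of them for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n_valid = int(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]  # (train, valid)

train, valid = random_split([f"img_{i}.jpg" for i in range(100)], valid_pct=0.2)
print(len(train), len(valid))  # 80 20
```

A separate test set is a different story; you typically hold that out yourself before any of this.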

I would be interested as well.

Heh, having trouble installing gist-it, but will tackle that in the GCP install forum.

In the meantime, here’s my notebook for classifying galaxies vs bears.

Repository is up! Please give feedback on problems you face.




I’m using GCP, and the batch size in lesson 1’s resnet-50 code caused an “out of memory” error.
No problem: I changed the “bs” value, restarted, and reran the necessary cells, but then got a new error:

ValueError: Expected more than 1 value per channel when training, got input size [1, 4096]

Actually, I’ve also had this problem with v0.7. My guess is that the total number of data samples modulo the batch size ends up with 1 remaining item to process at the end. OK, I’ll change my batch size again, but why can’t the library code be smarter about not ending up with 1 at the end?
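For what it’s worth, that “Expected more than 1 value per channel” error is usually BatchNorm choking on a final batch of size 1, which you can predict from `len(dataset) % bs`. A small sketch of the arithmetic (hypothetical helper names; the usual library-level fix is a `drop_last`-style option on the DataLoader that discards the incomplete final batch):

```python
def last_batch_size(n_items, bs):
    """Size of the final batch when n_items are drawn in batches of bs."""
    rem = n_items % bs
    return rem if rem else bs  # an exact division means a full final batch

def safe_bs(n_items, bs):
    """Naively nudge bs down until no stray batch of one is left."""
    while last_batch_size(n_items, bs) == 1 and bs > 1:
        bs -= 1
    return bs

print(last_batch_size(4097, 64))  # 1 -> would trigger the BatchNorm error
print(safe_bs(4097, 64))          # 63 -> last batch has 2 items instead
```

So rather than the library being “smarter”, the common remedies are dropping the last incomplete batch or picking a batch size that doesn’t leave a remainder of one.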


I faced the same issue when I tried to rerun the cell after changing num_workers=0 in ImageDataBunch. I restarted the kernel and then it worked fine.

Thanks for the question and answer; I am getting the same error. At first I suspected it was due to the batch size. I guess it should not stop with a red error, though; maybe a warning would do.

I think we can use fastai v1 in Google Colab; see this tutorial for details. I think the only problem with the Google Colab platform is the allocated resources, which is mentioned in @stas’s comments.


I had an issue with pytorch not using the GPU due to an older driver version. I think I updated to 410 (not sure about this, but it is the latest from Nvidia), and it started using the GPU.

While running my notebook, I run into the following error:

RuntimeError Traceback (most recent call last)
----> 1 learn.fit_one_cycle(5)

RuntimeError: CUDA error: out of memory

I am using n1-highmem-8 in GCP.
I have tried restarting the notebook kernel and also restarted my GCP instance, but the problem still exists.

Hey, I have fastai version '1.0.12' and PyTorch version '1.0.0.dev20181024'.

Running torch.backends.cudnn.enabled gives

nvcc -V gives
release 9.0, V9.0.176

torch.version.cuda gives

But torch.cuda.is_available() gives False.

Is there any way to solve this?
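Not sure about the root cause in your case, but here’s a small hedged sketch (hypothetical helper, standard library only) that bundles those checks into one report and degrades gracefully when torch won’t import:

```python
import subprocess

def gpu_report():
    """Collect the usual CUDA sanity checks into one string."""
    lines = []
    try:
        import torch
        lines.append(f"torch {torch.__version__}")
        lines.append(f"torch.version.cuda = {torch.version.cuda}")
        lines.append(f"cudnn enabled = {torch.backends.cudnn.enabled}")
        lines.append(f"cuda available = {torch.cuda.is_available()}")
    except ImportError:
        lines.append("torch not importable")
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        lines.append(out.stdout.splitlines()[0] if out.stdout else "nvidia-smi: no output")
    except FileNotFoundError:
        lines.append("nvidia-smi not found (driver problem?)")
    return "\n".join(lines)

print(gpu_report())
```

If `torch.cuda.is_available()` is False while `nvcc` reports CUDA 9.0, a mismatch between the installed NVIDIA driver and the CUDA build your PyTorch expects is a common culprit; updating the driver (as mentioned earlier in the thread) often fixes it.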