Lesson 1: How do I get started using my own images for classification?

I just finished watching the Lesson 1 video. At the end, @jeremy says to work with our own images, but I have no idea how to get started with this.

He encourages students to work with their own datasets.

He also mentions that someone named Francisco is working on a guide showing how to download images from Google Images to form one’s own dataset. I cannot find that guide.

Could someone please point me to that guide, and/or any other guide for working with one’s own images using the fastai library in the provided notebook?

I am using Google Colab, if that is relevant.

Continue with the course. He shows how to make a custom dataset in course 2, IIRC. You can also check out Adrian Rosebrock at PyImageSearch; Jeremy’s method was inspired by him.


Lesson 2, you mean?

Okay. I thought he encouraged doing a project right after finishing the first video.

Thanks a lot.

As mentioned, if you proceed with lesson 2, you should come across a way to do it.
You can also see the sample snippet below:

classes = ['benign', 'malignant']  # your labels
# 'weights' refers to the path holding your saved weights (relative to the current dir)
data = ImageDataBunch.single_from_classes('weights', classes,
        ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
# use the same architecture you trained with, e.g. resnet34
learn = cnn_learner(data, models.resnet34)
learn.load('stage-1')  # load your saved weights ('stage-1' is just an example name)
# print(learn.summary())
defaults.device = torch.device('cpu')  # run inference on CPU
img = open_image('path/to/image.jpg')  # the image you want to classify
pred_class, pred_idx, outputs = learn.predict(img)

You can use this if you want

You can just hit the open in colab button to fire it up.


You can also refer to this notebook

They’re asking how to create their own dataset from images on Google search. That notebook doesn’t help them.

Also, for the benefit of anyone coming across this anytime soon: the JavaScript from the lesson notebook doesn’t work at the moment, since Google changed their underlying page structure. I’m sure that’s something which will be fixed in v4 when that comes out in a week or so.

You can use my notebook for now or if you search around you’ll find other people have fixed the JavaScript in various ways.


Hi, I worked with your notebook. It works fine.

Two issues:

  1. Even though I call the duckduckgo search function, I get URLs from Bing.
  2. I get an error when running a cell in the lesson 2 download notebook.
# If you already cleaned your data, run this cell instead of the one before
data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
        ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

The error message:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-39-e261b78269a1> in <module>()
      2 np.random.seed(42)
      3 data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='dances.csv',
----> 4         ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

10 frames
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
   2808     if filename:
-> 2809         fp = builtins.open(filename, "rb")
   2810         exclusive_fp = True

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/Datasets/./https://tse2.mm.bing.net/th?id=OIP.mpwt_AM6zmbNf9j7BRVlQgHaE7&pid=Api'

Any ideas why? And how do I get around this?

Never mind on issue #2.

The notebook is not written in a linear manner, and that was giving me trouble. Figured it out.

Would still like to hear about issue #1.


I think DuckDuckGo actually uses Bing for its image crawling, so that’s expected.

ImageDataBunch.from_csv wants filenames and labels, and the files need to already be on disk. You’ve given it URLs and labels. I don’t think there’s anything in fastai which will deal with that.

You’d have to write a function which downloads the files from dances.csv and then creates a CSV for fastai with the filenames and labels. That’s unnecessary work for you. The scraper notebook has the option of creating those CSVs for when you want to create massive datasets with thousands of images. In that case distributing the URLs might be preferable, and you could also provide a function that people can cut and paste into a notebook and run.
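For what it’s worth, such a function doesn’t need anything beyond the standard library. Here’s a minimal sketch, assuming the CSV has one (url, label) pair per row; the function name, the numbered filenames, and the column order are all my own choices, not anything from the lesson notebooks:

```python
import csv
import os
from urllib.request import urlretrieve

def download_from_csv(src_csv, dest_dir, out_csv):
    """Download images listed as (url, label) rows in src_csv into
    dest_dir/<label>/, and write (filename, label) rows to out_csv
    in the shape ImageDataBunch.from_csv expects."""
    rows = []
    with open(src_csv, newline="") as f:
        for i, (url, label) in enumerate(csv.reader(f)):
            os.makedirs(os.path.join(dest_dir, label), exist_ok=True)
            fname = os.path.join(label, "%05d.jpg" % i)
            try:
                urlretrieve(url, os.path.join(dest_dir, fname))
                rows.append((fname, label))
            except OSError:
                pass  # skip dead links
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

You’d then point ImageDataBunch.from_csv at dest_dir with csv_labels set to the CSV this writes. But as I said, zipping the images is less work.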

I suggest that you:

  • ignore the bottom of the scraper notebook, don’t use a csv
  • zip the images up inside the scraper
  • bounce the zip onto Google Drive and then into your lesson 2 notebook
  • !unzip dances.zip

data = ImageDataBunch.from_folder("images", train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
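The zipping step inside the scraper can be sketched like this — assuming the scraper saved everything under an images/ folder with one subfolder per label (the folder and file names here are just placeholders for the demo):

```python
import os
import shutil

# demo setup: pretend the scraper saved images under images/<label>/
os.makedirs("images/kathak", exist_ok=True)
open("images/kathak/00000.jpg", "wb").close()

# bundle everything under images/ into dances.zip, preserving the
# label subfolders so from_folder can infer the classes later
shutil.make_archive("dances", "zip", root_dir="images")
```

From there you can copy dances.zip to your mounted Drive, !unzip it in the lesson 2 notebook, and run from_folder as above.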


That totally works. Thanks.

I haven’t used the path object at all, though using it shouldn’t lead to any additional problems.


Currently, this handy Firefox extension works on Google and DuckDuckGo.

It also shows the number of image links you download and saves them in a CSV file.