I just finished watching the Lesson 1 video. At the end, @jeremy says to work with our own images, but I have no idea how to get started with this.
He encourages students to work with their own datasets.
He also mentions that someone named Francisco is working on a guide showing how to download images from Google Images to build one’s own dataset. I cannot find that guide.
Could someone please point me to this guide and/or any other guide for working with one’s own images using the fastai library in the provided notebook?
Continue with the course. He shows how to make a custom dataset in lesson 2, iirc. You can also check out Adrian Rosebrock @ pyimagesearch; Jeremy’s method was inspired by him.
As mentioned, if you proceed with lesson 2, you should come across a way to do it.
You can also see a sample snippet below:
from fastai.vision import *   # brings in ImageDataBunch, cnn_learner, open_image, etc.

defaults.device = torch.device('cpu')   # run inference on the CPU

classes = ['benign', 'malignant']   # replace with your own labels

# 'weights' is the path to your saved weights (relative to the current dir)
data = ImageDataBunch.single_from_classes('weights', classes, ds_tfms=get_transforms(),
                                          size=224).normalize(imagenet_stats)

# pass the architecture you trained with, e.g. resnet34
learn = cnn_learner(data, models.resnet34)
learn.load('your-saved-weights')   # the name you used with learn.save(), without the .pth
# print(learn.summary())

img = open_image('path/to/some_image.jpg')   # the image you want to classify
pred_class, pred_idx, outputs = learn.predict(img)
They’re asking how to create their own dataset from images on Google search. That notebook doesn’t help them.
Also, for the benefit of anyone coming across this any time soon: the JavaScript from the lesson notebook doesn’t work at the moment, since Google changed their underlying page structure. I’m sure that’s something which will be fixed in v4 when that comes out in a week or so.
You can use my notebook for now or if you search around you’ll find other people have fixed the JavaScript in various ways.
Even though I call the duckduckgo search function, I get URLs from Bing.
I get an error when running a cell in the lesson 2 download notebook.
# If you already cleaned your data, run this cell instead of the one before
np.random.seed(42)
data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
The error message:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-39-e261b78269a1> in <module>()
2 np.random.seed(42)
3 data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='dances.csv',
----> 4 ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
10 frames
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
2807
2808 if filename:
-> 2809 fp = builtins.open(filename, "rb")
2810 exclusive_fp = True
2811
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/Datasets/./https://tse2.mm.bing.net/th?id=OIP.mpwt_AM6zmbNf9j7BRVlQgHaE7&pid=Api'
I think DuckDuckGo actually uses Bing for their crawling, so that’s expected.
ImageDataBunch.from_csv wants filenames and labels, and the files need to already be on disk. You’ve given it URLs and labels. I don’t think there’s anything in fastai which will deal with that.
You’d have to write a function which downloads the files listed in dances.csv and then creates a csv for fastai with the filenames and labels. It’s unnecessary work for you. The scraper notebook has the option of creating those csvs for when you want to create massive datasets with thousands of images; in that case distributing the URLs might be preferable, and you could also provide a function people can cut & paste into a notebook and run (a rough sketch of such a function is below).
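If you did want to go that route, here’s a rough, untested sketch. It assumes dances.csv has two columns (url, label) with no header row; the column layout, the images/ folder and the cleaned.csv output name are my assumptions, not anything from the scraper notebook:

import csv
import requests
from pathlib import Path

def download_from_csv(csv_in='dances.csv', dest='images', csv_out='cleaned.csv'):
    "Download every URL in csv_in and write a filename,label csv for fastai."
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    rows = []
    with open(csv_in) as f:
        for i, (url, label) in enumerate(csv.reader(f)):
            fname = f"{label}_{i}.jpg"
            try:
                r = requests.get(url, timeout=10)
                r.raise_for_status()
                (dest/fname).write_bytes(r.content)
                rows.append((fname, label))
            except Exception as e:
                print(f"skipped {url}: {e}")
    with open(csv_out, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'label'])   # header row that from_csv can infer
        writer.writerows(rows)

You’d then point ImageDataBunch.from_csv at the download location, with folder and csv_labels set to whatever names you used. But as I said, the folder approach below is simpler.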
I suggest that you:
ignore the bottom of the scraper notebook and don’t use a csv
zip the images up inside the scraper
bounce the zip onto Google Drive and then to your lesson 2 notebook (a rough sketch of these two steps is at the end of this post)
!unzip dances.zip
then:
data = ImageDataBunch.from_folder("images", train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
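For the zip-and-transfer steps in the list above, something along these lines should work in Colab; the folder and file names (images, dances.zip) and the Drive path are assumptions based on this thread, so adjust them to match your setup.

# in the scraper notebook, once the images have been downloaded into images/
!zip -r -q dances.zip images
!cp dances.zip "/content/drive/My Drive/"

# in the lesson 2 notebook, pull the archive back and unpack it
!cp "/content/drive/My Drive/dances.zip" .
!unzip -q dances.zip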