I just finished watching the Lesson 1 video. At the end, @jeremy says to work with our own images, but I have no idea how to get started with this.
He encourages students to work with their own datasets.
He also mentions that someone named Francisco is working on a guide showing how to download images from Google Images to build one’s own dataset. I cannot find that guide.
Could someone please point me to this guide and/or any other guide for working with one’s own images using the fastai library in the provided notebook?
Continue with the course. He shows how to make a custom dataset in lesson 2, iirc. You can also check out Adrian Rosebrock @ pyimagesearch; Jeremy’s method was inspired by him.
As mentioned, if you proceed with lesson 2, you should come across a way to do it.
You can also see a sample snippet below:
from fastai.vision import *   # brings in ImageDataBunch, cnn_learner, open_image, etc.

defaults.device = torch.device('cpu')   # run inference on the CPU

classes = ['benign', 'malignant']   # replace with your own labels

# 'weights' is the path to your saved weights (relative to the current dir)
data = ImageDataBunch.single_from_classes('weights', classes, ds_tfms=get_transforms(),
                                          size=224).normalize(imagenet_stats)

# pass the architecture you trained with, e.g. resnet34
learn = cnn_learner(data, models.resnet34)
learn.load('your-saved-weights')   # the name you used with learn.save(), without the .pth
# print(learn.summary())

img = open_image('path/to/some_image.jpg')   # the image you want to classify
pred_class, pred_idx, outputs = learn.predict(img)
They’re asking how to create their own dataset from images on Google search. That notebook doesn’t help them.
Also, for the benefit of anyone coming across this any time soon: the JavaScript from the lesson notebook doesn’t work at the moment, since Google changed their underlying page structure. I’m sure that’s something which will be fixed in v4 when that comes out in a week or so.
You can use my notebook for now or if you search around you’ll find other people have fixed the JavaScript in various ways.
Even though I call the duckduckgo search function, I get URLs from Bing.
I get an error when running a cell in the lesson 2 download notebook.
# If you already cleaned your data, run this cell instead of the one before
np.random.seed(42)
data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
The error message:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-39-e261b78269a1> in <module>()
2 np.random.seed(42)
3 data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='dances.csv',
----> 4 ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
10 frames
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
2807
2808 if filename:
-> 2809 fp = builtins.open(filename, "rb")
2810 exclusive_fp = True
2811
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/Datasets/./https://tse2.mm.bing.net/th?id=OIP.mpwt_AM6zmbNf9j7BRVlQgHaE7&pid=Api'
I think DuckDuckGo actually uses Bing for their crawling, so that’s expected.
ImageDataBunch.from_csv wants filenames and labels, and the files need to already be on disk. You’ve given it URLs and labels. I don’t think there’s anything in fastai which will deal with that.
You’d have to write a function which downloads the files listed in dances.csv and then creates a csv for fastai with the filenames and labels. It’s unnecessary work for you. The scraper notebook has the option of creating those csvs for when you want to create massive datasets with thousands of images; in that case distributing the URLs might be preferable, and you could also provide a function people can cut & paste into a notebook and run (a rough sketch of such a function is below).
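If you did want to go that route, here’s a rough, untested sketch. It assumes dances.csv has two columns (url, label) with no header row; the column layout, the images/ folder and the cleaned.csv output name are my assumptions, not anything from the scraper notebook:

import csv
import requests
from pathlib import Path

def download_from_csv(csv_in='dances.csv', dest='images', csv_out='cleaned.csv'):
    "Download every URL in csv_in and write a filename,label csv for fastai."
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    rows = []
    with open(csv_in) as f:
        for i, (url, label) in enumerate(csv.reader(f)):
            fname = f"{label}_{i}.jpg"
            try:
                r = requests.get(url, timeout=10)
                r.raise_for_status()
                (dest/fname).write_bytes(r.content)
                rows.append((fname, label))
            except Exception as e:
                print(f"skipped {url}: {e}")
    with open(csv_out, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'label'])   # header row that from_csv can infer
        writer.writerows(rows)

You’d then point ImageDataBunch.from_csv at the download location, with folder and csv_labels set to whatever names you used. But as I said, the folder approach below is simpler.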
I suggest that you:
ignore the bottom of the scraper notebook and don’t use a csv
zip the images up inside the scraper
bounce the zip onto Google Drive and then to your lesson 2 notebook (a rough sketch of these two steps is at the end of this post)
!unzip dances.zip
then:
data = ImageDataBunch.from_folder("images", train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
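For the zip-and-transfer steps in the list above, something along these lines should work in Colab; the folder and file names (images, dances.zip) and the Drive path are assumptions based on this thread, so adjust them to match your setup.

# in the scraper notebook, once the images have been downloaded into images/
!zip -r -q dances.zip images
!cp dances.zip "/content/drive/My Drive/"

# in the lesson 2 notebook, pull the archive back and unpack it
!cp "/content/drive/My Drive/dances.zip" .
!unzip -q dances.zip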