As mentioned, if you work through lesson 2 you'll come across a way to do it. A sample snippet is below:
classes = [...]  # your labels, e.g. ['benign', 'malignant']
defaults.device = torch.device('cpu')  # run inference on the CPU
# 'weights' is the path to your saved weights (relative to the current dir)
data = ImageDataBunch.single_from_classes('weights', classes, ds_tfms=None, size=224).normalize(imagenet_stats)
# arch_model is the architecture you trained with, e.g. models.resnet34
learn = cnn_learner(data, arch_model)
# print(learn.summary())
img = open_image('your_image.jpg')  # the image you want to classify
pred_class, pred_idx, outputs = learn.predict(img)
They’re asking how to create their own dataset from images found via a Google search. That notebook doesn’t help them.
Even though I call the DuckDuckGo search function, I get URLs from Bing.
I get an error when running a cell in the lesson 2 download notebook.
# If you already cleaned your data, run this cell instead of the one before
data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='cleaned.csv',
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
The error message:
FileNotFoundError Traceback (most recent call last)
<ipython-input-39-e261b78269a1> in <module>()
3 data = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, csv_labels='dances.csv',
----> 4 ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
/usr/local/lib/python3.6/dist-packages/PIL/Image.py in open(fp, mode)
2808 if filename:
-> 2809 fp = builtins.open(filename, "rb")
2810 exclusive_fp = True
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/Datasets/./https://tse2.mm.bing.net/th?id=OIP.mpwt_AM6zmbNf9j7BRVlQgHaE7&pid=Api'
I think DuckDuckGo actually uses Bing for its crawling, so that’s expected.
ImageDataBunch.from_csv wants filenames and labels, and the files need to already be on disk. You’ve given it URLs and labels; I don’t think there’s anything in fastai that will deal with that.
You’d have to write a function which downloads the files from dances.csv and then creates a csv for fastai with the filenames and labels. It’s unnecessary work for you. The scraper notebook has the option of creating those for when you want to create massive datasets with thousands of images. In that case distributing the URLs might be preferable, and you could also provide a function they can cut & paste into a notebook and run.
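For reference, a minimal sketch of such a downloader, assuming dances.csv has two columns (url, label) — the function name, column layout, and .jpg extension are all assumptions; it only needs the stdlib, no fastai:

```python
import csv
import os
from urllib.request import urlretrieve

def download_from_csv(url_csv, out_dir, out_csv):
    """Download each URL in url_csv and write a fastai-style
    (filename, label) csv pointing at the local copies."""
    os.makedirs(out_dir, exist_ok=True)
    rows = []
    with open(url_csv, newline="") as f:
        for i, (url, label) in enumerate(csv.reader(f)):
            fname = os.path.join(out_dir, f"{i:05d}.jpg")  # assumes jpegs
            try:
                urlretrieve(url, fname)        # fetch the image
                rows.append((fname, label))    # keep only successful downloads
            except OSError:
                pass                           # skip dead links
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

The resulting out_csv can then be fed to ImageDataBunch.from_csv as csv_labels, since it now points at files on disk.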
I suggest that you:
ignore the bottom of the scraper notebook, don’t use a csv
zip the images up inside the scraper
bounce the zip onto Google Drive and then to your lesson 2 notebook
data = ImageDataBunch.from_folder("images", train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
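The zip/unzip round trip can be sketched with the stdlib — the folder and archive names here are assumptions, and the copy in between (e.g. via a mounted Drive) is up to you:

```python
import shutil

def pack_images(folder="images"):
    # In the scraper notebook: bundle the image folder into <folder>.zip
    # and return the archive's path.
    return shutil.make_archive(folder, "zip", root_dir=folder)

def unpack_images(zip_path="images.zip", dest="images"):
    # In the lesson 2 notebook, after copying the zip over:
    # restore the folder structure that from_folder expects.
    shutil.unpack_archive(zip_path, extract_dir=dest)
```

After unpack_images, the "images" folder is ready for the ImageDataBunch.from_folder call above.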