Missing data for Planet dataset?

(Nate) #1

When running the lesson2-image_models.ipynb notebook, on this step

list_paths = [f"{PATH}train-jpg/train_0.jpg", f"{PATH}train-jpg/train_1.jpg"]
titles=["haze primary", "agriculture clear primary water"]
plots_from_files(list_paths, titles=titles, maintitle="Multi-label classification")

I get an error:

FileNotFoundError Traceback (most recent call last)
1 list_paths = [f"{PATH}train-jpg/train_0.jpg", f"{PATH}train-jpg/train_1.jpg"]
2 titles=["haze primary", "agriculture clear primary water"]
----> 3 plots_from_files(list_paths, titles=titles, maintitle="Multi-label classification")

~/fastai/courses/dl1/fastai/plots.py in plots_from_files(imspaths, figsize, rows, titles, maintitle)
36 sp.axis('Off')
37 if titles is not None: sp.set_title(titles[i], fontsize=16)
---> 38 img = plt.imread(imspaths[i])
39 plt.imshow(img)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/matplotlib/pyplot.py in imread(fname, format)
2147 @docstring.copy_dedent(matplotlib.image.imread)
2148 def imread(fname, format=None):
-> 2149 return matplotlib.image.imread(fname, format)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/matplotlib/image.py in imread(fname, format)
1357 'with Pillow installed matplotlib can handle '
1358 'more images' % list(handlers))
-> 1359 with Image.open(fname) as image:
1360 return pil_to_array(image)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/PIL/Image.py in open(fp, mode)
2608 if filename:
-> 2609 fp = builtins.open(filename, "rb")
2610 exclusive_fp = True

FileNotFoundError: [Errno 2] No such file or directory: 'data/planet/train-jpg/train_0.jpg'

I’m wondering if I missed a step where I need to get the data before starting, but I can’t find it in the video or notes. There’s an earlier step for setting up the data on Crestle, but I’m using Paperspace.


Yes you have to download it from Kaggle.

You have several options for that. One that Jeremy alluded to is the Chrome add-on CurlWget. Go to the Kaggle Planet competition page, accept the competition's terms and conditions, and start downloading the relevant files (you only need the train_v2.csv file and the jpg archive; don't download the tif files). You can interrupt the download right away, since you want the files on your Paperspace machine, not on your local computer. Then click the little yellow CurlWget icon and copy-paste the command it gives you into your Paperspace command line (in the data folder). The download should then start.

Make sure that the path to the Planet data on your Paperspace machine is the same as in the notebook and that you have decompressed the archive, and it should work :)
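If you want to confirm the layout before re-running the notebook cell, here is a minimal sketch of a check. It assumes PATH is "data/planet" (taken from the error message above) and that the archive unpacks into a train-jpg folder next to train_v2.csv; the helper name missing_planet_files is made up for illustration.

```python
from pathlib import Path

# Assumption: same location the notebook uses, as shown in the FileNotFoundError.
PATH = Path("data/planet")


def missing_planet_files(root):
    """Return the expected Planet files that are absent under `root`.

    Hypothetical helper: it only checks the label file and the first image
    used by the failing notebook cell, not the full dataset.
    """
    root = Path(root)
    expected = [
        root / "train_v2.csv",
        root / "train-jpg" / "train_0.jpg",
    ]
    return [str(p) for p in expected if not p.exists()]


# An empty list means the notebook cell should find its files.
print(missing_planet_files(PATH))
```

If train-jpg/train_0.jpg is still listed after the download finished, the archive has probably not been decompressed yet or was unpacked into a different folder.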

Another way to get the files is through the Kaggle API, but you need to have the API token on your Paperspace machine and it’s a bit more complicated. See here if you’re interested.
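For the API route, a rough sketch of what that could look like from Python, wrapping the official kaggle command-line client (pip install kaggle). It assumes your API token is already saved as ~/.kaggle/kaggle.json on the Paperspace machine and that you have accepted the competition rules on the website. The competition slug comes from the competition URL; the file names are assumptions, so check the competition's data page for the exact ones. The function names here are hypothetical.

```python
import shutil
import subprocess

# Slug as it appears in the competition URL.
COMPETITION = "planet-understanding-the-amazon-from-space"


def kaggle_download_cmd(file_name, dest="data/planet"):
    """Build the kaggle CLI command that downloads one competition file."""
    return [
        "kaggle", "competitions", "download",
        "-c", COMPETITION,   # which competition
        "-f", file_name,     # which file from its data page
        "-p", dest,          # folder to save into
    ]


def download(file_name, dest="data/planet"):
    """Run the download; fails early if the kaggle CLI is not installed."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found; try `pip install kaggle`")
    subprocess.run(kaggle_download_cmd(file_name, dest), check=True)


# Example calls (file names are assumptions, not run here):
# download("train-jpg.tar.7z")
# download("train_v2.csv")
```

The downloaded archive still has to be decompressed afterwards, same as with the CurlWget approach.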

(Nate) #3

Thanks, very helpful! It is working now.

For anyone else coming across this, something like “wget https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/download/train-jpg.tar.7z” won’t work. CurlWget constructs a much more elaborate command that is evidently necessary to download the file properly.
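If you did try the plain wget and ended up with a file that won't decompress, a quick way to tell whether it is a real archive is to look at its first bytes. This is only a sketch: the thread doesn't say what the plain wget actually saves, and my guess (an assumption) is that you get an HTML page instead of the archive because the request isn't authenticated. The 7z signature bytes are the standard ones for the format; the function name is made up.

```python
# Every 7z archive starts with these six bytes: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def looks_like_7z(path):
    """Return True if the file at `path` begins with the 7z signature."""
    with open(path, "rb") as f:
        return f.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC


# Example (path is an assumption based on the notebook's data folder):
# looks_like_7z("data/planet/train-jpg.tar.7z")
```

A False result on a file that is only a few kilobytes long is a strong hint that the download was not the dataset.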