Lesson 3: submitting planet to Kaggle

Since we got a really nice result on the planet dataset, I decided to submit it to Kaggle and see how it performs.
After some trial and error, I got it working.

Here’s the notebook.

I uploaded the submission to Kaggle, and it told me I hadn't uploaded predictions for the total number of images it requires.
I assume that's because of some image mismatch in the jpg version of the test set.

Still, I hope you'll find something interesting in this notebook.

And tell me if there’s a better way to do this.

5 Likes

You need to download the additional test set from Kaggle and merge it into the existing test folder.

path = …/planet

! kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg-additional.tar.7z -p {path}
! 7za -bd -y x {path}/test-jpg-additional.tar.7z -o{path}
! tar -xf {path}/test-jpg-additional.tar -C {path}
! rsync -aP {path}/test-jpg-additional/ {path}/test-jpg/
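To confirm the merge worked before predicting, it may help to count the images in the merged folder. This is only a rough sketch: `count_jpgs` and `check_test_folder` are hypothetical helper names, and the expected count is something you pass in yourself (61191 is the test set size quoted later in this thread).

```python
from pathlib import Path

def count_jpgs(folder):
    """Count .jpg files directly inside `folder` (non-recursive, case-insensitive)."""
    return sum(
        1 for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    )

def check_test_folder(folder, expected):
    """Return (found, ok) so a short count is caught before running predictions."""
    found = count_jpgs(folder)
    return found, found == expected
```

For example, `check_test_folder(f"{path}/test-jpg", 61191)` should return `(61191, True)` once both archives have been merged, assuming that figure is the full test set.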
4 Likes

Thanks a lot.
I’ll give it a try.

1 Like

I had my notebook working until the latest update of the fastai library. To be specific, I was able to predict on test_ds (61191 images) with
ImageFileList.from_folder(path)
predictions = learn.TTA(ds_type=DatasetType.Test)[0]

After the update, I started using
ImageItemList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
and no matter what I do, the predictions come out with shape torch.Size([8095, 17]) instead of one row for each of the 61191 test images.
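A mismatch like this is easy to catch before uploading if the submission step refuses to run when the counts differ. Here is a minimal sketch using only the standard library. `write_submission` is a hypothetical helper, and the `image_name,tags` header is my assumption about the competition's sample submission file, so check it against the sample before relying on it.

```python
import csv
import io

def write_submission(image_names, tag_strings, out):
    """Write one row per test image to the file-like object `out`.

    Raises ValueError if the number of predictions does not match the
    number of test images, which is the symptom described above.
    """
    if len(image_names) != len(tag_strings):
        raise ValueError(
            f"{len(tag_strings)} predictions for {len(image_names)} test images"
        )
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header; verify against the competition's sample submission.
    writer.writerow(["image_name", "tags"])
    for name, tags in zip(image_names, tag_strings):
        writer.writerow([name, tags])
```

With 8095 predictions and 61191 file names, this would raise right away rather than producing a file that Kaggle rejects.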

The previous and current notebooks are available on GitHub.

PS: I have taken parts of code from @arunoda’s notebook.

3 Likes

This is something new to me.
I got some issues with learn.get_preds().

There was a bug in the handling of the test dataset in fastai 1.0.24.
It’s fixed in the current master.

With that I could submit to kaggle and here’s the result.

For the validation set I got 0.931245.

I think this is great.

1 Like

While trying to run predict(0) I’m getting this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-290-a34a1ee83908> in <module>()
----> 1 predict(0)

<ipython-input-263-a89135b0e55b> in predict(idx)
      1 def predict(idx):
      2     pred_vals = predictions[idx]
----> 3     tags = find_tags(pred_vals, 0.2)
      4     print(tags)
      5     img = learn.data.test_ds[idx][0]

<ipython-input-289-941955276bc7> in find_tags(pred, thresh)
      2     classes = ""
      3     for idx, val in enumerate(pred):
----> 4         if val > thresh:
      5             classes = f'{classes} {learn.data.classes[idx]}'
      6     return classes.strip()

RuntimeError: bool value of Tensor with more than one value is ambiguous

Any idea why this happens?
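The message says that `val` holds more than one number when it is compared with `thresh`. One possible cause, which I can't confirm from the traceback alone, is that `predictions[idx]` is a 2-D block of scores rather than a single row of 17. Below is a sketch of a stricter `find_tags` that makes that case obvious. It takes `classes` as an argument (my change, so it runs without a learner) and works on plain Python numbers rather than tensors.

```python
from numbers import Real

def find_tags(pred, classes, thresh=0.2):
    """Return the space-separated class names whose score exceeds `thresh`.

    `pred` must be a flat sequence with exactly one score per class.
    """
    if len(pred) != len(classes):
        raise ValueError(f"expected {len(classes)} scores, got {len(pred)}")
    tags = []
    for name, val in zip(classes, pred):
        # A nested row here means `pred` was 2-D, not a single prediction.
        if not isinstance(val, Real):
            raise TypeError("each score must be a single number; is `pred` 2-D?")
        if val > thresh:
            tags.append(name)
    return " ".join(tags)
```

With a PyTorch tensor, passing `pred_vals.tolist()` gives plain floats for a 1-D row, so `find_tags(pred_vals.tolist(), learn.data.classes, 0.2)` would either return the tags or report that the input has the wrong shape.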

Are you using the latest version of fastai?
If not, try to update it.

$ pip show fastai
Name: fastai
Version: 1.0.26
Summary: fastai makes deep learning with PyTorch faster, more accurate, and easier
Home-page: https://github.com/fastai/fastai
Author: Jeremy Howard
Author-email: info@fast.ai
License: Apache Software License 2.0
Location: /home/nbuser/.anaconda3/lib/python3.7/site-packages
Requires: fastprogress, pandas, cymem, Pillow, regex, matplotlib, thinc, torchvision-nightly, bottleneck, pyyaml, typing, dataclasses, numexpr, spacy, requests, scipy, numpy

Try:
predictions = learn.TTA(ds_type=DatasetType.Test)

predictions = learn.TTA(ds_type=DatasetType.Test)[0] seems to work after upgrading to v1.0.27.

The fastai library had a bug in 1.0.24 that has since been fixed; it works fine in 1.0.27.
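If you want a notebook to fail early on an older install, a small version comparison is enough. This is a sketch under assumptions: `parse_version` and `has_test_set_fix` are hypothetical names, 1.0.27 is simply the version reported working in this thread (not a confirmed fix version), and only plain numeric versions such as "1.0.26" are handled.

```python
def parse_version(text):
    """Turn '1.0.27' into (1, 0, 27) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def has_test_set_fix(version, fixed_in="1.0.27"):
    """True if `version` is at least `fixed_in` (assumed threshold, see above)."""
    return parse_version(version) >= parse_version(fixed_in)
```

Comparing tuples of integers avoids the string-comparison trap where "1.0.100" would sort before "1.0.27".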

Before the update, learn.data.test_ds was showing:

LabelList
y: MultiCategoryList (8095 items)
[MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary]…
Path: .
x: ImageItemList (61191 items)
[Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256)]…
Path: /home/jupyter/.fastai/data/planet

Updating the library worked for me.

pip show fastai

sudo /opt/anaconda3/bin/conda install -c fastai fastai

pip show fastai

After the update, learn.data.test_ds was showing:

LabelList
y: MultiCategoryList (61191 items)
[MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary, MultiCategory haze;primary]…
Path: .
x: ImageItemList (61191 items)
[Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256), Image (3, 256, 256)]…
Path: /home/jupyter/.fastai/data/planet

You can find the notebook here.

Hi! After loading, test_ds always shows a y label of MultiCategory haze;primary. Can you explain why this happens? Shouldn't it be blank (no class prediction) initially, since we have no labels for the test data?