Lesson 2 planet amazon labelling

I am trying to load in the test data to try and generate predictions.

Do I use the ImageMultiDataset to do this?

Just a reminder to use the advanced category for stuff not yet covered in class. I’ve moved it for you.

np.random.seed(13)
split_data = (ImageFileList.from_folder(path)            
        .label_from_csv('train_v2.csv', sep=' ', folder='train-jpg', suffix='.jpg')  
        .random_split_by_pct(0.2)
        .datasets(ImageMultiDataset))
test_ds = ImageMultiDataset.from_single_folder(path / 'test-jpg-additional', [''])
split_data.test_ds = test_ds
data = (split_data
    .transform(tfms, size=128)
    .databunch()
    .normalize(imagenet_stats))

This is the only way I managed to get it working.

Awesome, was looking for this!

I can’t get the planet dataset - I posted in Paperspace topic:

https://forums.fast.ai/t/platform-paperspace-and-gradient/27338/105?u=ricknta

…and when I search Kaggle it doesn’t come up. Has anyone been able to find it?

I had a similar problem with a 403 Forbidden error. I solved it by joining the competition on Kaggle and then opening the Rules tab of the competition page. At the bottom of that tab you need to “accept” their terms for the dataset. Then it worked.

Hello guys. I’m also working on this dataset and would like to share a data preparation snippet:

Edited: this part is broken after version 1.0.21
data = ImageDataBunch.from_csv(path=path, folder='train-jpg', sep=' ', csv_labels='train_v2.csv', suffix='.jpg', test='test-jpg', ds_tfms=tfms, size=128)

data.normalize(imagenet_stats)

It may be obvious, but it is also worth noting that if you get the data from Kaggle and work with the JPEG images, you need to merge the test-jpg-additional directory into test-jpg. Otherwise you will not have all the required test files.
A nice way of doing this is:

rsync -aP test-jpg-additional/ test-jpg/
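
If rsync is not available (for example on Windows), the same merge can be sketched in plain Python. This is only a rough equivalent, not what rsync does internally: the helper name `merge_dir` is made up for illustration, and unlike `rsync -a` it skips any file that already exists in the destination instead of comparing and updating it.

```python
import shutil
from pathlib import Path

def merge_dir(src, dst):
    """Copy every file in src into dst, skipping names already present in dst.

    Hypothetical helper for illustration; returns the number of files copied.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for f in sorted(src.iterdir()):
        target = dst / f.name
        if f.is_file() and not target.exists():
            shutil.copy2(f, target)  # copy2 also preserves timestamps
            copied += 1
    return copied

# Usage, assuming `path` points at the folder holding both directories:
# merge_dir(path / 'test-jpg-additional', path / 'test-jpg')
```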

Have a great kaggling and fastaiing))

@whatrocks you were absolutely right - thanks! I had joined but failed to scroll to the bottom of the Rules tab and accept the terms.

Hello, @jeremy. Sorry to bother you. I would like to suggest a minor change to the planet notebook. While making a late submission recently, I found that the competition has two test archives. To get the correct number of files to predict and submit, you need to merge them.
In my case I needed to add the following code:

! kaggle competitions download -c planet-understanding-the-amazon-from-space -f test-jpg-additional.tar.7z -p {path}

! 7za -bd -y x {path}/test-jpg-additional.tar.7z -o{path}
! tar -xf {path}/test-jpg-additional.tar -C {path}

! rsync -aP {path}/test-jpg-additional/ {path}/test-jpg/
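
After merging, it may be worth checking that every image the submission expects is actually on disk. The sketch below is one way to do that. It assumes the competition's sample submission is a CSV with an `image_name` column (I believe the file is called sample_submission_v2.csv, but please check your download), and `missing_images` is a made-up helper name.

```python
import csv
from pathlib import Path

def missing_images(csv_path, image_dir, suffix='.jpg'):
    """Return image names listed in the CSV that have no file in image_dir.

    Assumes the CSV has an `image_name` column holding names without extension.
    """
    with open(csv_path, newline='') as f:
        expected = [row['image_name'] for row in csv.DictReader(f)]
    present = {p.stem for p in Path(image_dir).glob(f'*{suffix}')}
    return [name for name in expected if name not in present]

# Usage (file and folder names are assumptions, adjust to your setup):
# missing = missing_images(path / 'sample_submission_v2.csv', path / 'test-jpg')
# print(len(missing), 'test images are missing')
```

An empty list means the merged test-jpg folder covers every row of the submission file.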

Thanks for the reminder.

my pleasure)