Planet Classification Challenge

Creating a thread similar to the Dog Breed Identification Challenge thread to discuss the Planet Classification Challenge and any insights, roadblocks, or scores. Since this competition has expired, I think we should be able to share code here as well?

Kaggle link:


Yup can share! :slight_smile:


Almost done with the planet kaggle competition… but kinda stuck at submission file generation.

One question: we agreed that we were going to use the sigmoid function for generating output values at the very end. Now these values are not going to be normalized. So, for one sample image, the output of learn.TTA(is_test=True) might look something like:

[ 0.29374, 0.00393, 0.00965, 0.00235, 0.00266, 0.01863, 0.05588, 0.00071, 0.04019, 0.01701,
0.00452, 0.88296, 0.93706, 0.15011, 0.00272, 0.00258, 0.09983],

How do we know what value we should use to determine if one of the 17 categories is predicted for this image using the numbers above?

I ask because I looked at the sample_submission.csv file, and it looks as follows:

file_9973,primary clear agriculture road water
file_9974,primary clear agriculture road water
file_9975,primary clear agriculture road water
file_9976,primary clear agriculture road water
file_9977,primary clear agriculture road water
file_9978,primary clear agriculture road water
file_9979,primary clear agriculture road water

Sorry… haven’t had the time to do more research. It’s 11 PM and my sleep-deprived (surprise, surprise!) brain is ready to call it a day.


sample_submission.csv just uses the same value for every row, as a template of what the submission file should look like. For yours, I'm guessing you will take the ones that are greater than 0.5, so in this case:

[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

And then you match those ones up to the labels they represent, so in this case it could be clear and primary, or hazy agriculture.
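In code, the thresholding step described above could be sketched like this (the probabilities below are made up to roughly match the earlier example):

```python
import numpy as np

# Hypothetical sigmoid outputs for one image (one score per label, 17 labels)
probs = np.array([0.29, 0.004, 0.01, 0.002, 0.003, 0.019, 0.056, 0.001,
                  0.04, 0.017, 0.005, 0.883, 0.937, 0.15, 0.003, 0.003, 0.1])

# Binarize with a fixed 0.5 threshold
binary = (probs > 0.5).astype(int)
print(binary)  # only indices 11 and 12 cross the threshold
```

The 1s then get matched up against the label list to produce the tag names for the submission line.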


For this competition, and for most multi-label problems like this, you'll need to choose an optimal "threshold," so anything above it is considered a predicted label and anything below it is not. Luckily the fastai library has a function called opt_th that actually finds the threshold for us! You can find it in the file
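I don't remember opt_th's exact signature off-hand, but the idea is a sweep over candidate thresholds, keeping the one that gives the best F2 score on the validation set. A hand-rolled sketch of that idea (not the fastai implementation):

```python
import numpy as np

def f2_score(preds, targs, th):
    # F-beta with beta=2 weights recall higher than precision,
    # matching this competition's metric
    pred_pos = preds > th
    tp = (pred_pos & (targs == 1)).sum()
    precision = tp / max(pred_pos.sum(), 1)
    recall = tp / max((targs == 1).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return 5 * precision * recall / (4 * precision + recall)

def find_best_threshold(preds, targs, start=0.10, end=0.50, step=0.01):
    # Sweep candidate thresholds; keep the one with the best F2
    ths = np.arange(start, end, step)
    scores = [f2_score(preds, targs, th) for th in ths]
    return ths[int(np.argmax(scores))]

# Toy example: the best threshold should separate the one negative (0.1)
# from the three positives (0.3, 0.8, 0.9)
preds = np.array([0.1, 0.3, 0.8, 0.9])
targs = np.array([0, 1, 1, 1])
best = find_best_threshold(preds, targs)
```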

Like Kevin mentioned you’ll need to make sure your predictions are matched up with the right labels since the submission file actually is looking for the label names not the prediction values.


Wow, that is awesome. I was planning on using 0.5, but something tells me this might (100% definitely) be a better method.

Yeah, a lot of people used 0.2 as a good fixed threshold for all labels in this comp; 0.5 would be too high, I think. Using opt_th should of course be even better, since the optimal threshold shouldn't be the same value for all labels.
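Taking that a step further, one could search for a separate threshold per label. A rough sketch, assuming `preds` and `targs` are (n_samples, 17) arrays of validation predictions and true labels:

```python
import numpy as np

def per_label_thresholds(preds, targs, candidates=np.arange(0.05, 0.55, 0.05)):
    # For each label column, pick the candidate threshold that maximizes
    # accuracy on the validation set. (A real version would optimize F2,
    # like the competition metric; accuracy keeps the sketch short.)
    ths = np.zeros(preds.shape[1])
    for j in range(preds.shape[1]):
        accs = [((preds[:, j] > th) == targs[:, j]).mean() for th in candidates]
        ths[j] = candidates[int(np.argmax(accs))]
    return ths

# Toy validation predictions: 3 images x 2 labels
val_preds = np.array([[0.1, 0.9], [0.3, 0.7], [0.6, 0.2]])
val_targs = np.array([[0, 1], [1, 1], [1, 0]])
ths = per_label_thresholds(val_preds, val_targs)
```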



(Preamble: there’s an issue with using the planet dataset that’s being discussed here. There’s also a way to side-step the exception trace as long as you’re not producing the submission file!)

Got my first submission in. At 0.8898 on the leaderboard, I think it’s fairly okay. I barely fit the model once, and focused more on getting the submission file done. Here’s my lousy script that produces the file. Nothing fancy, and very Java-ish. I couldn’t, for the life of me, get one of those fancy Python generators working.

mapp = {}
for i in range(17):
    mapp[i] = data.classes[i]

so, that mapp is:

{0: 'agriculture',
 1: 'artisinal_mine',
 2: 'bare_ground',
 3: 'blooming',
 4: 'blow_down',
 5: 'clear',
 6: 'cloudy',
 7: 'conventional_mine',
 8: 'cultivation',
 9: 'habitation',
 10: 'haze',
 11: 'partly_cloudy',
 12: 'primary',
 13: 'road',
 14: 'selective_logging',
 15: 'slash_burn',
 16: 'water'}
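As an aside, the same mapping can be built in one line with `enumerate`; in the notebook this would be `mapp = dict(enumerate(data.classes))`, with the sorted label list from above standing in here to keep the snippet self-contained:

```python
# Stand-in for data.classes (sorted label names from the dataset)
classes = ['agriculture', 'artisinal_mine', 'bare_ground', 'blooming',
           'blow_down', 'clear', 'cloudy', 'conventional_mine', 'cultivation',
           'habitation', 'haze', 'partly_cloudy', 'primary', 'road',
           'selective_logging', 'slash_burn', 'water']

# index -> label name, equivalent to the loop above
mapp = dict(enumerate(classes))
```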

The script for producing the test-set submission thus would be:

import re
from tqdm import tqdm

tta_test = learn.TTA(is_test=True)
predictions = tta_test[0]
files = list(data.test_ds.fnames)
pattern = r'test-jpg\/(.*)\.jpg'

with open("planet_submission.csv.apil", "w") as f:
    f.write("image_name,tags\n")
    for i in tqdm(range(61191)):
        # files[i] is of this form: test-jpg/test_xyze.jpg
        # Only want to extract the 'test_xyze' part for the
        # submission
        match = re.search(pattern, files[i])
        predLine = match.group(1) + ','
        prediction = predictions[i]

        for j in range(17):
            # only use the label if its
            # score is greater than 0.2
            if prediction[j] > 0.2:
                predLine += mapp[j] + ' '

        f.write(predLine.rstrip() + '\n')

Now, all I need to do is figure out how to leverage the opt_th function! Hmmm… Wife’s really mad that I’m doing nothing but coding since I got home. Gotta go to bed!!


You can do that after the course finishes :wink:

Thanks for sharing that code. But gosh, yes, it is Java-ish - we’ll try to help fix it up a little…


Dear Python/PyTorch gurus out there,

Is this just me or is this something most people can reproduce? I ran it on an AWS instance created by a script from version 1 of this course and also on a new one that’s created via the new part 1 v2 AMI and saw the same result. Also, I downloaded data from Kaggle on each instance rather than copying them over.

I did “git pull” and “conda update --all”. But I still get “No module named ‘fast_gen’” in planet_cv.ipynb.

@Moody, might be an issue of where you are running the notebook from, as often this error happens when the fastai directory is not where the notebooks expects to find it, based on your current working directory

Yeah we haven’t covered planet_cv yet, so it’s not in working condition. You should change the imports to be the same as our lesson1 and lesson2 notebooks. Probably some other changes needed too.

So it looks like opt_th is for the validation set. Once you move from validation to test, how do you determine what threshold you should use?

I haven’t tested it extensively yet, but I think we just have to go with the thresholds based on how they did on the validation set, since we don’t really know in advance how they will perform on the test set. However, if you were to retrain with all of the data, have you checked to see if opt_th still works? That would probably be an even better indication of which thresholds to use. Looking at the code now… for the targs parameter, couldn’t you just feed it the y values from the whole dataset (training with all images, no validation set)?

Yeah, you could figure out the optimal threshold on all of the known values. You would just do the actuals vs. predicted for all the training images, which would give you the threshold; then you’d put that into your test-time threshold and hope it holds. I think for my first submission I’m just going to use 0.2. opt_th can be v2 for me.

This may help for the predictions:

prob_preds, y = learn.TTA(is_test=True)
classes = np.array(data.classes, dtype=str)
res = [" ".join(classes[np.where(pp > 0.5)]) for pp in prob_preds]
test_fnames = [os.path.basename(f).split(".")[0] for f in data.test_ds.fnames]
test_df = pd.DataFrame(res, index=test_fnames, columns=['tags'])
test_df.to_csv('planet-amazon-from-the-space_Deb.csv', index_label='image_name')

That is such an elegant way to handle that. I have been hacking together a solution for the past hour, and this just got me the last piece I needed. I especially liked this line:

res = [" ".join(classes[np.where(pp > 0.5)]) for pp in prob_preds] 

Short, but there is a lot going on there and it is really powerful.
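To unpack what that line does: `pp > 0.5` builds a boolean mask, `np.where` turns the mask into the indices of the labels above threshold, `classes[...]` fancy-indexes those names out, and `" ".join` glues them into the space-separated tag string Kaggle expects. A toy version with made-up numbers:

```python
import numpy as np

classes = np.array(['agriculture', 'clear', 'primary', 'water'], dtype=str)
pp = np.array([0.1, 0.83, 0.97, 0.4])  # fake sigmoid outputs for one image

mask = pp > 0.5            # array([False,  True,  True, False])
indices = np.where(mask)   # (array([1, 2]),)
tags = " ".join(classes[indices])
print(tags)                # prints: clear primary
```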

You will probably want to lower that 0.5 to somewhere around 0.2; that’s the only change I would recommend, based on other conversations.

I was able to get a 0.93045 score on my first submission. Pretty happy with that as a starting point. That would put me around 47th when the competition was going on. My next step is going to be recreating without looking at other code snippets (or at least keeping that to a minimum).


Umm, why are you trying to be so modest, Kevin? That’s a crazy good score for this competition!! LOL