Problem creating a Kaggle submission


(Nadine) #1

Hey guys,

I’m experiencing problems when I try to create my submission for a Kaggle competition.

I copied the Lesson 1 notebook and trained a learner which performed nicely on the validation set (.95 accuracy).

Then I do the following to create a submission for a competition where you have to predict the class of each picture (e.g. Plant Seedlings Classification):

probs = learn.predict(is_test=True)
preds = np.argmax(probs, axis=1)
sub = [data.classes[pred] for pred in preds]
sample_sub = pd.read_csv(PATH + 'sample_submission.csv')
sample_sub['species'] = sub
sample_sub.to_csv(PATH + 'subm.csv', index=False)

Can anyone see what I’m doing wrong? I end up with what looks like random noise: less than 0.1 accuracy on the leaderboard :(

Thanks a lot,
Nadine


#2

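It looks like the filenames are not being added to the submission. The predictions come out in the order of data.test_ds.fnames, but the file column keeps the order from sample_submission.csv, so labels can end up next to the wrong images.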

Perhaps try changing the submission code to:

fnames_nopath = [fname[5:] for fname in data.test_ds.fnames]  # drop the 5-character 'test/' prefix from each path
sample_sub = pd.read_csv(PATH + 'sample_submission.csv')
sample_sub['file'] = fnames_nopath  # list the files in the same order the predictions were made
sample_sub['species'] = sub
sample_sub.to_csv(PATH + 'subm.csv', index=False)
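A variation that does not depend on the row order of sample_submission.csv at all is to pair each prediction with its filename first and then reorder the result to match the sample file. The sketch below is self-contained: the class names, filenames and probabilities are made-up stand-ins for data.classes, data.test_ds.fnames and the model output, and it assumes test paths look like test/<name>.png, using os.path.basename instead of slicing off the first five characters.

```python
import os

import numpy as np
import pandas as pd

# Hypothetical stand-ins for data.classes, data.test_ds.fnames and the model's log-predictions
classes = ["Black-grass", "Charlock", "Cleavers"]
test_fnames = ["test/c.png", "test/a.png", "test/b.png"]  # order the model saw the images in
log_preds = np.log(np.array([
    [0.1, 0.2, 0.7],  # test/c.png -> Cleavers
    [0.8, 0.1, 0.1],  # test/a.png -> Black-grass
    [0.2, 0.6, 0.2],  # test/b.png -> Charlock
]))

# Hypothetical sample submission listing the same files in a different order
sample_sub = pd.DataFrame({"file": ["a.png", "b.png", "c.png"], "species": ["", "", ""]})

# Pair each predicted class with the file it was predicted for
preds = pd.DataFrame({
    "file": [os.path.basename(f) for f in test_fnames],
    "species": [classes[i] for i in np.argmax(log_preds, axis=1)],
})

# A left merge keeps sample_sub's row order and attaches the matching prediction
subm = sample_sub[["file"]].merge(preds, on="file", how="left")
print(subm)
# In the notebook you would then write it out, e.g. subm.to_csv(f'{PATH}subm.csv', index=False)
```

Because merge(..., how="left") keeps the row order of sample_sub, the output follows the order of Kaggle's sample file, and a file with no matching prediction shows up as NaN instead of silently receiving another image's label.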

My own method was somewhat more verbose:

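log_preds, y = learn.TTA(is_test=True)  # log-predictions for each augmented pass over the test set
mean_logpreds = np.mean(log_preds, 0)  # average over the augmentation axis
max_preds = np.argmax(mean_logpreds, 1)  # most likely class index for each image
class_preds = [data.classes[index_pred] for index_pred in max_preds]
fnames_nopath = [fname[5:] for fname in data.test_ds.fnames]  # drop the 5-character 'test/' prefix
fname_preds = list(zip(fnames_nopath, class_preds))  # keep each filename next to its prediction
df = pd.DataFrame(fname_preds)
df.columns = ["file", "species"]
df.to_csv(f'{PATH}subm.csv', index=False)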

This worked for me.
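To make the array shapes in the TTA version concrete, here is a small NumPy sketch on random data. It assumes the TTA output is laid out as (augmentations, images, classes) and holds log-probabilities, as the name log_preds suggests; the sizes are arbitrary. It also shows averaging the probabilities instead of the log-probabilities, which is a swapped-in alternative, not what the code above does.

```python
import numpy as np

# Hypothetical TTA output: 5 augmented passes over 4 test images, 3 classes,
# stored as log-probabilities (assumed layout: augmentations x images x classes)
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4, 3))
log_preds = logits - np.log(np.exp(logits).sum(axis=2, keepdims=True))  # log-softmax over classes

# Average over the augmentation axis, as in the reply above
mean_logpreds = np.mean(log_preds, 0)    # shape (4, 3)
max_preds = np.argmax(mean_logpreds, 1)  # one class index per image

# Alternative: average the probabilities rather than the log-probabilities
mean_probs = np.mean(np.exp(log_preds), 0)  # each row still sums to 1
prob_preds = np.argmax(mean_probs, 1)

print(mean_logpreds.shape, max_preds, prob_preds)
```

Averaging log-probabilities amounts to a geometric mean across augmentations, while averaging probabilities is an arithmetic mean; the two argmaxes usually agree but are not guaranteed to.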

Andrew Smedley


(Nadine) #3

Thanks a lot for your reply! That did the trick and made me happy :-))