Thanks for the encouragement and I hope you are right
I have seen situations like this before in other Kaggle comps, and it usually results in a big private LB shakeup, since a lot of people may be overfitting to the public LB. Trusting local CV is important in this case, but I also think I need to run a lot more experiments to be sure.
What kind of losses is everyone else getting on local CV? Are you all seeing the same discrepancy between validation and test-set loss?
I was looking through some of the notebooks shared for the iceberg competition, and one thing I noticed is that the ids are not always aligned correctly with the test predictions. This is super important, and for this particular competition it's a little trickier since the submission ids aren't the same as the test image ids, which are usually named automatically by their index.
Here is what I used to align the test ids with the test predictions for the iceberg competition. Of course, anyone can correct me if I'm wrong about this!
import numpy as np
import pandas as pd

# the test ids live in the json file, not in the image filenames
test = pd.read_json(f'{PATH}test.json')

# TTA returns log-probabilities; exponentiate and take the iceberg class column
test_preds = np.exp(learn.TTA(is_test=True)[0])[:, 0]

# test images are named by index, so recover '{index}' from each filename
test_idxs = [i.split('.jpg')[0].split('/')[-1] for i in data.test_dl.dataset.fnames]

# map each recovered index back to its id from the json
test_ids = [test['id'][int(i)] for i in test_idxs]

test_set = pd.read_csv('data/iceberg/sample_submission.csv')
test_set['id'] = test_ids
test_set['is_iceberg'] = test_preds
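To turn the aligned frame into an actual submission, a `to_csv` call finishes the job. A minimal sketch with made-up stand-in values (the ids, probabilities, and `submission.csv` path are just examples, not from the competition):

```python
import pandas as pd

# stand-in for the aligned test_set frame built above
test_set = pd.DataFrame({'id': ['dasd', 'kjdks'], 'is_iceberg': [0.6, 0.7]})

# index=False keeps Kaggle's expected two-column format (id, is_iceberg)
test_set.to_csv('submission.csv', index=False)
```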
Great, thanks! I was actually providing this code example for the iceberg competition, as it is slightly different (and a bit trickier) than the dog breed comp: here the test ids are provided in the json and need to be paired up with the correct test image indices.
That's because you are submitting without ordering. Kaggle expects you to submit with the same order of ids, and there is a format given on the competition page; you can check it out.
For example, the format should be something like:
id, is_iceberg
dasd, 0.6
kjdks, 0.7
…
In order to align preds and the test submission, you need to know either the index of the predicted file in the test data or something else to align it correctly. What I do is save the test data as '{index}.jpg', so that later I can extract the index and put the prediction back in the desired order.
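A minimal sketch of that extract-and-reorder step, assuming files were saved as '{index}.jpg' (the filenames and prediction values below are made up for illustration):

```python
import numpy as np

# example: predictions come back in whatever order the loader read the files
fnames = ['test/2.jpg', 'test/0.jpg', 'test/1.jpg']
preds = np.array([0.9, 0.1, 0.5])

# pull the numeric index out of each filename
idxs = [int(f.split('/')[-1].split('.jpg')[0]) for f in fnames]

# scatter each prediction back to its original index position
ordered = np.empty_like(preds)
ordered[idxs] = preds
# ordered is now [0.1, 0.5, 0.9]
```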
Hope this helps
btw: you can access fnames of predicted images from:
I haven't tried predicting with 64. I downloaded the planet data myself; maybe there is an issue with the folder arrangement, because I also had to add an additional test-jpg in order to submit.
Like Kerem mentioned, it looks like your test preds aren't properly aligned with the test ids. I provided some sample code below which will take care of all of that for you, specifically for the iceberg competition. Hope it helps!