Intel Scene classification

The problem is to classify some 25,000 images into 6 classes. The training set is around 17,000 images and the test set is the rest. For now, the leaderboard score is calculated against 30% of the test set; the final scores will be calculated against 70% of the test set.

I got an accuracy of 0.9557 and was placed 16th on the leaderboard. However hard I tried, I could not reach a higher score. First place is at 0.9666. The strategies I tried are below:
Approach 1: Clean the images by removing some obviously mislabelled ones. Got a score of 0.9543.
Approach 2: Use all original images as they are. Got a score of 0.9511. The cleaned images did not give a better score because the test set also contained images similar to those I had removed while cleaning. Therefore, cleaning only made it worse.
Approach 3: I compared the predictions from Approaches 1 and 2 and merged them. For the entries where they differed, I kept whichever prediction had the higher probability. For example, if Approach 1 labelled an image as ‘1’ with probability 0.7 and Approach 2 labelled it as ‘2’ with probability 0.9, I chose ‘2’ as the merged prediction. This gave me a score of 0.9557 (the highest).
Approach 4: I labelled the test set images using the predictions from Approach 3 and added them to the training set. I thought this would give better results, but the score actually dropped to 0.805.
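
The merge in Approach 3 amounts to something like the sketch below. It is illustrative only: the function name `merge_predictions`, the list-of-lists input format, and the tie-breaking rule (ties go to the first model) are assumptions, not details taken from the actual notebook.

```python
def merge_predictions(probs_a, probs_b):
    """For each image, keep the label from whichever model is more confident.

    probs_a, probs_b: lists of per-class probability lists, one row per image,
    in the same image order. Ties go to the first model (an assumption).
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same images")
    merged = []
    for row_a, row_b in zip(probs_a, probs_b):
        # Predicted label = index of the largest probability in the row.
        label_a = max(range(len(row_a)), key=row_a.__getitem__)
        label_b = max(range(len(row_b)), key=row_b.__getitem__)
        # If the labels agree there is nothing to resolve; otherwise take
        # the label backed by the larger top probability.
        if label_a == label_b or row_a[label_a] >= row_b[label_b]:
            merged.append(label_a)
        else:
            merged.append(label_b)
    return merged


# Toy example with 3 classes: the models disagree on the first image only.
approach_1 = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
approach_2 = [[0.05, 0.05, 0.9], [0.6, 0.3, 0.1]]
print(merge_predictions(approach_1, approach_2))  # [2, 0]
```

Note that this rule trusts raw probabilities from two separately trained models, so it only helps when their confidence values are roughly comparable.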

For each approach I used resnet34 and resnet50. I also varied the transformations to check whether it would yield better results.

Whatever I tried, I could not beat the score of Approach 3.

I am new to this and could not think of any other strategy. Any pointers would be very helpful.


That’s great! What approach did you use?


That guy seems to just be showing off his scores. :stuck_out_tongue: Anyway @joresh, it is quite clear from the top losses why the model is getting confused:

There is clearly label noise in there.
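
For anyone unfamiliar with the idea, a top-losses listing is roughly the per-image cross-entropy loss sorted in descending order; images with a confident prediction that disagrees with the given label float to the top, which is where mislabelled examples tend to show up. A minimal sketch in plain Python, assuming you already have per-class probabilities and integer labels (the function name `top_losses` and the toy data are made up for illustration):

```python
import math


def top_losses(probs, labels, k=3):
    """Return (index, loss) pairs for the k images with the largest loss.

    probs: list of per-class probability lists, one row per image.
    labels: list of integer class labels as given in the dataset.
    """
    eps = 1e-12  # avoid log(0) when a class gets zero probability
    # Cross-entropy for one image is -log(probability of its given label).
    losses = [-math.log(max(row[y], eps)) for row, y in zip(probs, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(i, losses[i]) for i in order[:k]]


# Toy data: image 1 is confidently predicted as class 1 but labelled 2,
# so it is the most suspicious entry.
probs = [[0.9, 0.05, 0.05], [0.02, 0.97, 0.01], [0.6, 0.3, 0.1]]
labels = [0, 2, 0]
for idx, loss in top_losses(probs, labels, k=2):
    print(idx, round(loss, 3))
```

A high loss does not prove a label is wrong (the image may simply be hard), so the flagged images still need a manual look.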

Yes, even I was shocked to see this. Is there any way to solve this issue? In other words, is there an automated tool, or will I have to do it manually? :((

Here’s my Kaggle kernel for this challenge.

