Intel Scene classification

I attempted a competition on the Analytics Vidhya site. Details of the competition and my notebook are at
https://github.com/suresh-subramaniam/real-life-projects/blob/master/intel_scene_classification.pdf

The problem is to classify some 25,000 images into 6 classes. The training set is around 17,000 images and the test set is the rest. For now the leaderboard score is calculated against 30% of the test set; the final scores will be calculated against the other 70%.

I got an accuracy of 0.9557 and was placed 16th on the leaderboard. But however hard I tried, I could not reach a higher score. First place is at 0.9666. The strategies I tried are below:
Approach 1: Clean the training images by removing some obviously mislabelled ones. Got a score of 0.9543.
Approach 2: Use all the original images as they are. Got a score of 0.9511. The cleaned images did not give a better score because the test set also contained images similar to those I had removed while cleaning. Therefore, cleaning only made it worse.
Approach 3: I compared the predictions from Approaches 1 and 2 and, for the entries where they differed, created a third set of predictions by taking whichever prediction had the higher probability. For example, if Approach 1 labelled an image as ‘1’ with probability 0.7 and Approach 2 labelled it as ‘2’ with probability 0.9, I chose ‘2’ in the merged predictions. This gave me a score of 0.9557 (the highest).
Approach 4: I labelled the test set images using the predictions from Approach 3 and added them to the training set. I thought this would give me better results, but the score actually dropped to 0.805.
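
To make the merge in Approach 3 concrete, here is a minimal sketch in plain Python. It assumes each model's output is available as a list of per-class probabilities per image; the function name `merge_by_confidence` and the sample numbers are made up for illustration and are not taken from my notebook.

```python
def merge_by_confidence(probs_a, probs_b):
    """For each image, keep the label from whichever model is more confident.

    probs_a, probs_b: lists of per-class probability lists, one entry per image,
    in the same image order for both models (an assumption of this sketch).
    """
    merged = []
    for pa, pb in zip(probs_a, probs_b):
        # Predicted label = index of the highest probability.
        label_a = max(range(len(pa)), key=pa.__getitem__)
        label_b = max(range(len(pb)), key=pb.__getitem__)
        if label_a == label_b or pa[label_a] >= pb[label_b]:
            merged.append(label_a)  # models agree, or model A is at least as confident
        else:
            merged.append(label_b)  # models disagree and model B is more confident
    return merged


# Hypothetical probabilities for two images and three classes.
probs_clean = [[0.10, 0.70, 0.20], [0.80, 0.10, 0.10]]  # model trained on cleaned images
probs_orig = [[0.05, 0.05, 0.90], [0.60, 0.30, 0.10]]   # model trained on original images

merged = merge_by_confidence(probs_clean, probs_orig)
print(merged)
```

The first image mirrors the example above: one model says class 1 with 0.7, the other says class 2 with 0.9, so the merged prediction is class 2. Ties go to the first model, which is an arbitrary choice.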

For each approach I used resnet34 and resnet50. I also varied the transformations to check whether it would yield better results.

Whatever I tried, I could not beat the score of Approach 3.

I am new at this and could not think of any other strategy. Any pointers would be very helpful.


Bro, I’m in 3rd position on that leaderboard
Username :- rishi34
Accuracy :- 96.52968037

That’s great! What approach did you use?


That guy seems to just be showing off his scores. :stuck_out_tongue: Anyway @joresh, it is quite clear from the top losses why the model is getting confused:
image

There is clearly label noise in there.
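
In case it helps, here is a rough sketch of how a top-losses list can be computed by hand. It assumes you already have the model's per-class probabilities for the labelled images; `top_losses` and the sample numbers are hypothetical, and this is not the code behind the image above.

```python
import math


def top_losses(probs, labels, k=3):
    """Return the indices of the k images with the highest cross-entropy loss.

    probs: list of per-class probability lists, one per image.
    labels: list of integer class labels as given in the dataset.
    """
    losses = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Cross-entropy for one sample: -log(probability assigned to the given label).
        # The small floor avoids log(0).
        loss = -math.log(max(p[y], 1e-12))
        losses.append((loss, i))
    losses.sort(reverse=True)  # highest loss first
    return [i for _, i in losses[:k]]


# Hypothetical example: image 1 is labelled class 0, but the model is 98% sure it is class 1.
probs = [[0.90, 0.10], [0.02, 0.98], [0.60, 0.40]]
labels = [0, 0, 1]

suspects = top_losses(probs, labels, k=2)
print(suspects)
```

Images where the model is very confident about a class other than the given label end up at the top of the list, so those are the ones worth checking for wrong labels. A high loss can also just mean the model is wrong, so this only narrows down what to look at.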

Yes. Even I was shocked to see this. Is there any way to solve this issue? In other words, is there an automated tool, or will I have to do it manually? :((

Here’s my Kaggle kernel for this challenge.

https://www.kaggle.com/umangjpatel/intel-image-classification-using-fast-ai/notebook
