I built a dataset with photos of four different Airbus commercial airplane models and I’m struggling to get the error rate below 20%. The best setup I found is
learn.fit_one_cycle(8, max_lr=slice(3e-2))
With more epochs, the error rate starts climbing again towards the end of training, which looks like overfitting. A max learning rate higher than 3e-2 makes the error rate go through the roof, while with a lower one the final error rate gets stuck well above 30%.
I had a look at the top losses, but they aren’t really conclusive. I also had a look at the learning rate recorder plot, where, strangely enough, the loss hovers at around 0.3 and then shoots up at around 1e-3. I’m not sure what to make of this.
At this point I’m not quite sure if:
- the dataset isn’t big enough (but I’m already at >150 photos per airplane model)
- the quality of the photos isn’t good enough (but all are at least 224x224)
- there’s something wrong with my Google Cloud instance (I followed the setup guide: GPUs: 1 x NVIDIA Tesla P4, zone: us-west2-b)
- the problem is too difficult (photos are from different airlines and taken from different angles)
- I’m missing something else
Any help would be much appreciated. My notebook is here.
Can you show how you’re creating your learner?
Also, are you unfreezing the learner (as in the lesson 1 pets example notebook)?
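For reference, the lesson-1 pets notebook does it roughly like this. This is a sketch assuming fastai v1 and an already-built `data` ImageDataBunch — the variable names mirror the pets example, not necessarily your notebook:

```python
# Sketch of the lesson-1 style setup (fastai v1); `data` is assumed to be
# an ImageDataBunch you have already created from your plane photos.
from fastai.vision import cnn_learner, models
from fastai.metrics import error_rate

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)      # stage 1: train only the new head

learn.unfreeze()            # stage 2: fine-tune the whole network
learn.lr_find()
learn.recorder.plot()       # pick max_lr from this plot
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```

If you skip the unfreeze step you only ever train the head, which often caps the accuracy you can reach on a harder dataset.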
I think most of what you have said is correct. It is probably both a difficult problem and not enough training data. I know next to nothing about commercial airplanes and by eye I can’t tell the difference between them. The combination of different angles and different airlines does make the problem much harder.
As a general rule, if humans can’t tell the difference, computers won’t be able to either. This of course isn’t always true, and I’m sure there are people who can discern the differences between these 4 models of Airbus, but would they get under a 24% error rate?
I would try adding more augmentation first (brightness, contrast, cutout, warp, jitter, etc.) and training over more epochs. If this still doesn’t help, I would gather more data and come back with a bigger dataset.
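In fastai v1 terms, dialing up augmentation could look something like the sketch below. Treat the exact values as starting points to experiment with, not recommendations:

```python
# One way to strengthen augmentation in fastai v1; all numbers here are
# guesses to tune, and `path` is wherever your images live.
from fastai.vision import get_transforms, cutout, jitter

tfms = get_transforms(
    max_lighting=0.4,   # stronger brightness/contrast changes
    max_warp=0.3,       # stronger perspective warp
    xtra_tfms=[cutout(n_holes=(1, 4), length=(10, 40), p=0.5),
               jitter(magnitude=0.01, p=0.3)],
)
# then pass the transforms when building the data, e.g.
# data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)
```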
@KayneJ I can tell them apart based on the window pane of the cockpit. But I guess the inconsistent coloring and the more complex shapes make things more difficult than with pet breeds and bears.
After playing around with resnet50 some more, I got the error rate down to 15%.
@Tchotchke Basically I duplicated the pet example and repeated each step with my data. See above
Sorry I missed that - I thought they were all Dropbox links. I’ll try to take a look tomorrow and see if I notice anything
@naoki Another thing you have to think about is what error rate you are aiming for. A 15% error rate on a four-class problem is roughly equivalent to a 5% error rate on a binary problem.
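One rough way to make that comparison concrete: a four-way choice is log2(4) = 2 binary decisions, and both have to be right for the four-class answer to be right. Under that simplistic independence assumption the equivalent binary error rate comes out in the single digits too (around 8% rather than exactly 5%, so take the exact figure with a grain of salt):

```python
import math

# A 4-class decision ~ 2 independent binary decisions; both must be
# correct, so (1 - e)^2 = four-class accuracy. Solve for the binary
# error rate e that "corresponds to" a 15% four-class error rate.
four_class_acc = 1 - 0.15
binary_err = 1 - math.sqrt(four_class_acc)
print(f"{binary_err:.1%}")  # → 7.8%
```

Either way, the point stands: a multi-class error rate understates how good the model is per decision.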
The reason the Oxford Pets dataset can reach such high accuracy is more likely the highly visible differences between classes than the dataset size (~200 photos per class). So on a harder problem I would expect lower accuracy. Continue to play around with hyperparameters and augmentation to see if any gains can be made.
If you’re still looking for extra gains, try to find a large dataset on plane classification and transfer-learn on that image set first. Otherwise I think you are best off finding more images.
I don’t think it helps that some images have multiple objects in them while others do not. With more data, external factors outside the target object have less influence on the network, as there is a higher chance they will be cancelled out during the discrimination process. With small data, however, the network is more susceptible to being influenced by imbalances in the data (which are surfaced in the images). An imbalance can manifest itself in many ways; one example is that some classes have a higher representation of people in them while others do not. The key is to have a diverse AND balanced set of images to train on.
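A quick way to check for this kind of imbalance is to tag each image with its class plus the suspected confounder and count the combinations. The tags below are made-up illustrative data — in practice you would derive them from your own filenames or a manual pass over the dataset:

```python
from collections import Counter

# Hypothetical (class, has_people) tags for a handful of images.
images = [
    ("A320", True), ("A320", True), ("A320", False),
    ("A380", False), ("A380", False), ("A380", False),
]
counts = Counter(images)
for (cls, has_people), n in sorted(counts.items()):
    print(cls, "with people" if has_people else "no people", n)
# If one class is mostly "with people" and another mostly without, the
# network can learn "people => A320" instead of actual airframe features.
```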
One approach to clean things up is to localise the object of interest using image segmentation, then put a common background across all images and try to train the classifier again.
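The masking step itself is simple once you have a segmentation mask. Here is a toy sketch using plain nested lists so it stays self-contained — real images would of course come in via PIL or numpy, and the mask from a segmentation model:

```python
# Keep pixels where the mask says "plane" (1) and flatten everything
# else to one neutral colour, so background content can't leak into
# the classifier's decision.
GRAY = (128, 128, 128)

def apply_common_background(pixels, mask, background=GRAY):
    """pixels: rows of RGB tuples; mask: rows of 0/1 flags (1 = object)."""
    return [
        [px if keep else background for px, keep in zip(prow, mrow)]
        for prow, mrow in zip(pixels, mask)
    ]

img = [[(200, 10, 10), (0, 0, 255)],
       [(5, 5, 5), (90, 90, 90)]]
mask = [[1, 0],
        [0, 1]]
print(apply_common_background(img, mask))
```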
Another approach is to represent the other objects in the images as classes as well. So instead of just discriminating between different types of planes, you might discriminate between images with different types of planes and images with different types of planes and people in them (the “with people” classes). However, again, balance is key, so you might need more data now that you have more classes to discriminate between.