Model learns about differences between pickleball and tennis

To test the model from lesson 1, I chose images of pickleball and tennis. If you’re not familiar with either sport, it can be difficult even for a human to tell images of the two apart. At first I had only 5 images of each sport in both train and valid. After 5 epochs with a learning rate of 0.01 (running more epochs did not improve the results), the accuracy was 90%-100%; it varied between runs with the same parameters. The predicted probability for tennis images was correctly close to 1 in all cases. The results for pickleball images were much more erratic, though: one image was incorrectly classified as tennis, and the others were correctly classified as pickleball with probabilities between 0.03 and 0.3.

To improve the model's accuracy on pickleball pictures, I added 10 more pickleball pictures to the training data. Somewhat disappointingly, this did not improve the overall probabilities of the validation images being correctly classified as pickleball. After 5 epochs with the same parameters, the model produced accuracies between 0.7 and 1, and the pickleball probabilities for the validation images just varied somewhat between runs, with results similar to before.
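
To put the run-to-run variation in perspective, here is a minimal sketch of the arithmetic. It assumes the validation set holds 10 images (5 per sport), that the reported probabilities are P(tennis), and that the cut-off is 0.5; the helper names are made up for illustration and are not part of the course library.

```python
# Assumption: 10 validation images in total (5 tennis + 5 pickleball).
N_VALID = 10


def accuracy(n_wrong, n_total=N_VALID):
    """Fraction of validation images classified correctly."""
    return (n_total - n_wrong) / n_total


def predicted_label(p_tennis, threshold=0.5):
    """Assumption: the model outputs P(tennis) and 0.5 is the cut-off."""
    return "tennis" if p_tennis >= threshold else "pickleball"


# With 10 images, every single mistake moves the accuracy by 0.1.
for n_wrong in range(4):
    print(f"{n_wrong} wrong -> accuracy {accuracy(n_wrong):.1f}")

# A pickleball image scored at 0.3 is still on the pickleball side of 0.5.
print(predicted_label(0.3))
```

Under these assumptions, the whole range from 0.7 to 1 corresponds to only 0 to 3 misclassified images, so a swing of one or two images between runs is enough to explain it.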

I then tried a learning rate of 0.001 and ran the model several times for 10 epochs, but that only resulted in generally worse probabilities for the correct class.

The only major difference I noticed between all of these runs was which of the 5 pickleball validation images was incorrectly classified as tennis. In all runs it was one or the other of the same two pickleball images, but never both. I can easily understand why these images were difficult to classify. In one image there were only 2 pickleball players on the court (a singles match), and the tennis images were all singles matches with only 2 players on the court. In these runs the model seemed to have learned that 2 players on the court implies that tennis is being played.

The other image, which was incorrectly classified as tennis on the other runs, was an animated image with a view from slightly above the court. All of my tennis images were TV images recorded from a camera looking down on the court, so it seems to me that in these runs the model learned that a horizontal viewing angle meant pickleball and a slightly elevated one meant tennis.

Are you using data augmentation, including TTA?

I gave this another go and tried the tricks covered in lessons 1-2 with a new set of pickleball and tennis images (54 training images and 26 validation images). Please find the images and my notebook with comments in

To sum up my findings:

  • It was important to select images which a human could clearly classify as tennis or pickleball. I got very random results with pictures which showed e.g. a single player. But I guess this was to be expected with my small training set of only 54 images.
  • I got pretty good results with images selected this way. Out of 26 validation images the model got 1 or 2 wrong.
  • I did not manage to improve the accuracy with data augmentation, with unfreezing and fine-tuning the earlier layers, or with TTA.
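
For readers who have not met TTA (test-time augmentation) yet, the general idea can be sketched in plain Python: predict on the original image and on augmented copies, then average the predictions. This is only an illustration under stated assumptions, not the code from my notebook and not the course library's implementation. `toy_predict` is a hypothetical stand-in for a trained model, and a horizontal flip is the only augmentation used.

```python
from statistics import mean


def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def toy_predict(image):
    """Hypothetical stand-in for a trained model: returns a fake P(tennis).

    It only looks at the left half of the image, so its output changes
    when the image is mirrored, like a model that is sensitive to framing.
    """
    half = len(image[0]) // 2
    left_pixels = [px for row in image for px in row[:half]]
    return mean(left_pixels)


def predict_with_tta(image, predict=toy_predict, augmentations=(hflip,)):
    """Average the predictions for the original and each augmented copy."""
    views = [image] + [augment(image) for augment in augmentations]
    return mean(predict(view) for view in views)


# Tiny 2x2 "image" with made-up brightness values.
image = [[0.9, 0.1],
         [0.7, 0.3]]

print(toy_predict(image))       # prediction on the original only
print(predict_with_tta(image))  # averaged over original and mirrored copy
```

Averaging like this smooths out predictions that depend on framing, but it cannot help much when the model has learned the wrong cue altogether, which may be part of why it made no difference on such a small data set.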

BR, Kimmo
