Actually, thinking about this a step further after reading the docs, I'm not sure it will help. In theory, a car noise played (seen) backwards is still going to sound (look) different from a plane noise played backwards. Flips are more about adding extra varied examples to the dataset to encourage better generalisation. But that said, none of the images in the validation set - or real life - will ever be transformed; it's not like a photo where you're going to get a slightly different angle of a bear. The input data is always going to be in a certain orientation.
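To make that concrete, here's a toy sketch (plain Python, with a made-up 3x3 "spectrogram", time on the x-axis) showing why a horizontal flip isn't a label-preserving transform for spectrograms - the flipped image represents the sound played backwards, not another valid view of the same sound:

```python
# Toy "spectrogram": rows = frequency bins, columns = time steps.
# The values are invented purely for illustration.
spec = [
    [1, 2, 3],   # low-frequency band over time steps t0, t1, t2
    [4, 5, 6],   # mid band
    [7, 8, 9],   # high band
]

# A horizontal flip (what do_flip=True would allow) reverses each row,
# i.e. reverses the time axis of the underlying sound.
flipped = [row[::-1] for row in spec]

print(flipped)  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]

# Unlike mirroring a photo of a bear, this is a genuinely different
# signal: the same audio played backwards.
assert flipped != spec
```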
I don’t know. I’ll try, and see.
Update - Thanks @MicPie, that suggestion did improve things! I changed the ImageDataBunch parameters to include `ds_tfms=get_transforms(do_flip=False, max_rotate=0.)` and `resize_method=ResizeMethod.SQUISH`. Training resnet50 for 8 epochs with a chosen learning rate gave a final error rate of 0.169173, better than the previous ~0.21. That's around 83% accuracy, even better than the state-of-the-art sound classification result from @astronomy88.
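For anyone wanting to reproduce this, a minimal sketch of the setup in fastai v1 - the data path, `valid_pct`, image `size`, and `max_lr` below are illustrative assumptions, not the exact values I used:

```python
# Sketch of the data and training setup described above (fastai v1).
# 'data/spectrograms', valid_pct, size, and max_lr are placeholders.
from fastai.vision import (ImageDataBunch, get_transforms, ResizeMethod,
                           cnn_learner, models, error_rate)

data = ImageDataBunch.from_folder(
    'data/spectrograms',                   # hypothetical folder of spectrogram images
    valid_pct=0.2,                         # assumed validation split
    ds_tfms=get_transforms(do_flip=False,  # no horizontal flips (time axis matters)
                           max_rotate=0.), # no rotation either
    size=224,
    resize_method=ResizeMethod.SQUISH,     # squish to square rather than crop
)

learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(8, max_lr=1e-3)        # 8 epochs; learning rate is a placeholder
```

This is a config sketch rather than a runnable script - you'd still need the spectrogram images on disk and a learning rate picked for your own data.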
I’d love to know why this made a difference. Hopefully it will come up in the remaining weeks. Now I’ve watched week 2 - time to serve this model up in a web app…