Hey, @MicPie is right: data augmentations are not helpful for spectrograms, and neither is pretraining on ImageNet.
Try this and see if you can improve even more.
- Set pretrained=False when creating your cnn_learner (this turns off transfer learning from ImageNet, which isn't helpful since ImageNet doesn't have spectrograms):
learn = cnn_learner(data, base_arch=models.resnet34,
                    metrics=[accuracy], pretrained=False)
- Turn off transforms. You do this in the databunch constructor by setting tfms = None (the argument is called dl_tfms in the databunch() call shown below).
- Also make sure you are normalizing with your own dataset's stats, not the imagenet stats. If you have the line .normalize(imagenet_stats), change it to .normalize():
databunch(dl_tfms=None).normalize()
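To see why the normalization point matters, here is a rough sketch in plain Python (no fastai involved). The spectrogram values and the helper names channel_stats and normalize are made up for illustration; the ImageNet numbers are the usual per-channel mean and std. As far as I know, calling .normalize() with no arguments makes fastai compute the stats from a batch of your own data, which is roughly what the "own" branch below imitates.

```python
from statistics import fmean, pstdev

# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def channel_stats(pixels):
    """Return (mean, population std) of a flat list of pixel values."""
    return fmean(pixels), pstdev(pixels)


def normalize(pixels, mean, std):
    """Shift and scale each pixel value: (x - mean) / std."""
    return [(x - mean) / std for x in pixels]


# Hypothetical spectrogram channel scaled to [0, 1]:
# mostly dark, with a couple of bright frequency bins.
spectrogram = [0.02, 0.05, 0.03, 0.10, 0.80, 0.04, 0.06, 0.90, 0.01, 0.07]

# Normalize with stats computed from the data itself.
own_mean, own_std = channel_stats(spectrogram)
with_own = normalize(spectrogram, own_mean, own_std)

# Normalize with the ImageNet stats for the first channel.
with_imagenet = normalize(spectrogram, IMAGENET_MEAN[0], IMAGENET_STD[0])

own_result = channel_stats(with_own)
imagenet_result = channel_stats(with_imagenet)

print("dataset stats  -> mean %.3f, std %.3f" % own_result)
print("imagenet stats -> mean %.3f, std %.3f" % imagenet_result)
```

With the dataset's own stats the result is centered at zero with unit spread. With the ImageNet stats the same values end up shifted well below zero, because this toy spectrogram is much darker than a typical photo.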
Hope this helps, and if you're especially interested in audio, come join us in the Deep Learning With Audio Thread.