No need to apologise at all, thanks for the suggestions, it's always good to get input and try things. I don't have a lot of experience in this field, but I have enough to know that tweaking things can make a big difference in unexpected ways, so pretty much anything is worth a try.
I've just finessed the notebook and re-run it. With the limited transforms & resize method, skipping normalisation with the imagenet stats, and using transfer learning from resnet50, the model is up to ~86% accuracy. A big jump from 79%, which I already thought was pretty good.
Thanks to all for the suggestions on what to tweak!
Edit to add: I'm just watching the week 3 video and Jeremy addresses this question directly in the lecture, around the 1:46 mark. He explains that if you're using a model pretrained on imagenet, you should always use the imagenet stats to normalise, as otherwise you'll be normalising out the interesting features of your images which imagenet has learned. I see how this makes sense for the real-world things imagenet was trained on, but I'm not sure whether it holds for 'synthetic' images like spectrograms or other image encodings of non-image data. I can imagine what he would suggest though: try it out.
I'll add an extra section to the notebook to compare normalising with imagenet stats vs. not, and report back…
Edit to the edit: I added another training phase to that notebook, normalising the data by imagenet's stats. The result was still pretty good (0.8458 accuracy) but not as good as the self-normed version (0.8646). So, tentatively (I wouldn't call this definitive), it looks like it's better to normalise to the dataset itself when using transfer learning from resnet (trained on real-world images) to classify synthetic images.
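For anyone wanting to see the difference between the two options, here's a minimal numpy sketch (not the notebook's actual code; the batch here is random made-up data standing in for spectrogram images):

```python
import numpy as np

# Hypothetical batch of spectrogram "images": values in [0, 1], shape (N, H, W, C)
rng = np.random.default_rng(0)
batch = rng.uniform(0.0, 1.0, size=(8, 64, 64, 3)).astype(np.float32)

# The standard ImageNet per-channel stats (what fastai's imagenet_stats holds)
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Option 1: normalise with the ImageNet stats (Jeremy's recommendation
# when transfer-learning from an ImageNet-pretrained model)
norm_imagenet = (batch - imagenet_mean) / imagenet_std

# Option 2: normalise with the dataset's own per-channel stats
# (the "self-normed" version that scored 0.8646 above)
own_mean = batch.mean(axis=(0, 1, 2))
own_std = batch.std(axis=(0, 1, 2))
norm_self = (batch - own_mean) / own_std

# Self-normalised data is exactly zero-mean, unit-variance per channel;
# synthetic data normalised with ImageNet stats generally isn't, because
# its channel statistics don't match natural images.
print(norm_self.mean(axis=(0, 1, 2)))
print(norm_imagenet.mean(axis=(0, 1, 2)))
```

The only difference is which mean/std pair you subtract and divide by, which is why it's such a cheap experiment to run both ways.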
Edit 3: I didn't actually answer your questions! For the spectrograms I'm just using whatever SoX spits out; I don't know if that's a mel spectrogram or a 'raw' one. The y axis seems to be on a linear scale. I've seen some things on here that you & others have posted about creating spectrograms, and I think I'd like to re-engineer this "system" to do it all in python without the SoX step. This notebook about speech and this one about composers look good for that. Anyway, I think I'll jump over to the audio-specific thread!
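For reference on the mel vs. 'raw' question: a plain STFT gives linearly spaced frequency bins (which matches the linear y axis I'm seeing from SoX), and a mel spectrogram would additionally pass those bins through a mel filterbank. A toy numpy sketch of the linear version, assuming mono float samples (the linked notebooks do the real, fastai-ready thing):

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Linear-frequency magnitude spectrogram via a simple windowed STFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # rfft gives n_fft//2 + 1 linearly spaced frequency bins, 0 .. sr/2
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft//2 + 1, n_frames)

# Toy input: one second of a 440 Hz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, 122)
```

The energy concentrates in the bin nearest 440 Hz (bin 14 here, since each bin spans sr/n_fft = 31.25 Hz); converting to mel would be one extra matrix multiply by a filterbank.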