Share your work here ✅

@at98 Oh, thank you for letting me know! That’s awkward, I didn’t even remember how that happened but you’re right, the most recent commit in the master contains an empty notebook. Here is a link to the commit with the notebook before it was (accidentally, I believe) deleted:

And, here is the repository:

Sure, I would be glad to get any contribution to this little project :slight_smile: It is mostly an educational thing, of course, and super-simple. I am going to continue work on it during Part 2, especially because now we’re going to dig deeper and start work on “low-level” details.

@devforfu I would be happy to contribute in whatever way I can.I am not an expert in Pytorch but I will work to construct some of the stuff from scratch.Did you find any difference in speed while training CIFAR10 using the Pytorch Scheduler that u mentioned vs the fastai method.One thing that i think lacks in fastai is accepting more than 3 channel Input.IN many Competitions I have seen That Input could be upto 20 Channel(DSTL Satellite Image Segmentation).Maybe we can come up with some method to add that.

1 Like

@ThomM, I believe that I have had success in the past using “ds_tfms=None” when working with spectrograms. Though, it looks like you may have been approaching that with so many kwargs!

1 Like

Good work! I have two questions:

  1. Is there a reason you are using stereo and not mono files?
  2. Can we even use transforms on spectrograms since they distort time/frequency connection that is a core feature of a spectrogram?

Thanks! I’m just using the files as presented from the data source, I honestly hadn’t thought about stereo vs. mono. It would be interesting to see if converting the stereo samples to mono would make a difference — my hunch would be that would reveal more about the recording equipment than the actual sound.

As for transforms - I’ve done some experimentation based on what @MicPie and @MadeUpMasters suggested, and it seems that using limited transforms is better than all or none. See this response: Share your work here ✅

Cheers! Per Share your work here ✅ it seems that df_tfms=get_transforms(do_flip=False, max_rotate=0.), resize_method=ResizeMethod.SQUISH produced the best results for me. Better than just using None.

Thanks for this input! I tried out your suggestions, here’s the updated notebook. It looks like using the pre-trained resnet weights definitely does improve the predictions, pretty substantially! And it also seems that using limited transforms is better than none. I haven’t yet done a side by side comparison of normalising with itself vs. against imagenet; I don’t quite understand yet what that step is actually doing or how it’s used, so I have no intuition of why it would or wouldn’t work.

Let me know if you think I did something wrong there! I’ve just taken the most naive approach, and I know nothing about audio processing, so I’m probably making some terrible assumptions :slight_smile: Thanks for the help!

Can you reproduce the same results on consecutive model runs? I am struggling because I am getting different results every time I run the model. I was told to use random seed, but I am wondering does it actually affect how model generalizes, since we want it to generalize well for the new data.

Not exactly! :slight_smile: I probably shouldn’t be making so many assumptions without running the model several times and taking averages.

I don’t know much about that yet & would also love to know more… I know the results can change depending on which data is in your validation set, i.e. if you’re trying to isolate the effects of certain parameters, your validation set should stay the same between runs so you’re testing yourself on the same data. But I also wonder whether that would produce a “good” model if you’re basically overfitting to a specific validation set in that case.

I also know the outputs can vary depending on the initial weights and learning steps, but I don’t have an intuition of how much variance is typical, i.e. whether it’s normal for the same model trained on the same data to vary 1% or 10% between runs.

Hey, I checked out your notebook and you’re completely right. I’m pretty new to audio and it looks like I must have generalized my findings way too broadly, sorry about that. I am doing speech data on spectrograms that look very different from yours, and I’ve consistently found improvements when turning transfer learning off, but your post has encouraged me to go back and play around some more.

About transformations I haven’t experimented enough to really say what I said. I should have done more experimentation on the few types and ranges of transformations that might make sense for spectrograms, so I’ll be messing with that more in the future as well.

Are you using melspectrograms or raw spectrograms? Are your y-axis in log scale? Those are things that are essential in speech but I’m unsure how they affect general sound/noise. Let me know if you know what they are and how to implement them and if not I can point you in the right direction. Cheers.

[ 25/03/2019 - EDIT with link to part 2 ]

Hi. I just published my medium post + jupyter notebook about the MURA competition (see my post here).

I used the standard fastai way of images classification (cf lesson 1 and 2).

Feedbacks welcome :slight_smile:

Part 2 of our journey in Deep Learning for medical images with the fastai framework on the MURA dataset. We got a better kappa score but we need radiologists to go even further. Thanks to jeremy for his encouragement to persevere in our research :slight_smile:


No need to apologise at all, thanks for the suggestions, it’s always good to get input and try. I don’t have a lot of experience in this field but I have enough to know that tweaking things can make a big difference in unexpected ways, so it seems pretty much anything is worth a try :slight_smile:

I’ve just finessed the notebook and re-run it. With the limited transforms & resize method, not normalising to the resnet weights, and using transfer learning from resnet50, the model is up to ~86% accuracy. A big jump from 79% which I already thought was pretty good.

Thanks to all for the suggestions into what to tweak!

Edit to add, I’m just watching the week 3 video and Jeremy addresses this question directly in the lecture, around the 1:46 mark. He explains that if you’re using pretrained imagenet, you should always use the imagenet stats to normalise, as otherwise you’ll be normalising out the interesting features of your images which imagenet has learned. I see how this makes sense in the example of real-world things in imagenet, but I’m not sure whether this makes sense for ‘synthetic’ images like spectrographs or other image encodings of non-image data. I can imagine what he would suggest though - try it out :slight_smile: I’ll add some extra to the notebook & compare normalising with imagenet stats vs. not, and report back…

Edit to the edit: I added another training phase to that notebook, normalising the data by imagenet’s stats. The result was still pretty good (0.8458 accuracy) but not as good as the self-normed version (0.8646). So it looks like, without really knowing whether this is definitive, it is in fact better to normalise to the dataset itself in the instance of using transfer learning from resnet (trained on real-world images) when trying to classify synthetic images.

Edit 3: I didn’t actually answer your questions! For the spectrograms I’m just using whatever SoX spits out, I don’t know if that’s a melspectrogram or a ‘raw’ one. The y axis seems to be on a linear scale. I’ve seen some things on here that you & others have posted about creating spectrograms and I think I would like to reengineer this “system” to do it all in python without the SoX step. This notebook about speech and this one about composers look good for that. Anyway. I think I’ll jump over to the audio-specific thread! :loudspeaker:

1 Like

Just finished Kaggle’s Microsoft Malware Prediction and posted it on linkedin. This was a very humbling competition for me and I am happy for all the lessons I learned.

Or medium if you prefer


I took Rachel and Jeremy’s advice and started blogging. Here is my first guide on how to best make use of fastai’s built in parallel function to speed up your code. I hope this helps someone. Feedback, especially critical, is appreciated!



This is Sarvesh Dubey. Actually, I and my team had been working on a project which had to classify the severity of Alzheimer in Brain Scan. So we had first made a custom Dataset on the classes of Alzheimer severity. Initially, we were using Pytorch for the coding part and had applied Transfer Learning through Pytorch and could get a maximum accuracy of 75.6 %. But after that, we started seeing about FastAI and built a very much accurate model of 90% using Densenet161 and probably this classifier has not been built yet and with this accuracy
( Don’t know exactly). We are thinking of writing a research paper regarding this. As it’s a custom Dataset.
Now thinking of going for production for this model


@at98 I am not an expert as well :smile: Actually, PyTorch scheduler is nothing more than a kind of callback with some specific methods. Therefore, I don’t think there should be any different from the implementation of fastai. (I believe that the later is even more robust and flexible).

Also, I think that having > 3 channels in the input is a problem of architecture, and not the library itself. Like, most of the networks are pre-trained on the ImageNet where you have 3 channels only. Therefore, to support a higher number of channels, one should replace the input layer in the created model with something that accepts a required number of channels.

Of course, I would be glad if you would like to share any generic snippets or code you use for your projects. I guess that could be a good idea to gather interesting scripts and solutions from DL practitioners.

Hello everyone,

I was having a very difficult time organizing my butterfly images in Desktop. So, I trained a Fastai butterfly image classifier which reached a high accuracy of 95.2% with Resnet50. After training, I classified the images in the unclassified folder and used a few python file operations to move the correctly classified images to their respective class directories. This small task has saved me immense time in organizing my butterfly images.

I am very happy to share my workflow via Medium article as given below.

Classifying and organizing butterfly images in the desktop using FastAI and Python

Thank you @jeremy and @rachel for making my photo organizing work pretty easy with your fastai library and lucid lectures.

Cool to see someone do the equivalent of “whip up a shell script” but “whip up a deep neural net plus a shell script” to solve a problem like this, nice!

Hi, I tried playing around with the notebook from lesson 1 as Jeremy said in course so I created mine.
I had some struggle but I want some feedback on what was right and what was wrong,… ? and Thank you
Github repo link

Couldn’t resist following up one more time, as I made one other minor tweak to the sound effect classifier model - changing the weight decay to 0.1 - and got the accuracy up to 0.906015! So, there you go - iterate, iterate, iterate…

1 Like