Share your work here ✅

Actually, thinking about this a step further after reading the docs, I’m not sure it will help. In theory, a car noise played (seen) backwards is still going to sound (look) different from a plane noise played backwards. It’s more about adding some extra pixel values to the dataset to encourage better generalisation. But that said, none of the images in the validation set (or real life) will ever be transformed; it’s not like a photo where you’re going to get a slightly different angle of a bear. The input data is always going to be in a certain orientation.

I don’t know. I’ll try, and see. :slight_smile:

Update - Thanks @MicPie, that suggestion did improve things! I changed the ImageDataBunch parameters to include
ds_tfms=get_transforms(do_flip=False, max_rotate=0.), resize_method=ResizeMethod.SQUISH. Training on resnet50 with 8 epochs and a chosen learning rate resulted in a final error rate of 0.169173, better than the previous ~0.21. So that’s around 83% accuracy, even better than the other SoA sound classification result from @astronomy88.
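For anyone who wants to reproduce it, the setup was roughly as below (a sketch only; the path, image size, batch size, and learning rate are placeholders rather than my exact values):

from fastai.vision import *

path = Path('data/sounds')  # placeholder: folder of spectrogram images, one subfolder per class

data = ImageDataBunch.from_folder(
    path, valid_pct=0.2, size=224, bs=32,
    ds_tfms=get_transforms(do_flip=False, max_rotate=0.),  # keep a few mild transforms, but no flips or rotation
    resize_method=ResizeMethod.SQUISH                      # squish to size instead of cropping
).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(8, max_lr=1e-3)  # learning rate here is illustrative; pick yours from lr_find()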

I’d love to know why this made a difference. Hopefully it will come up in the remaining weeks. Now I’ve watched week 2 - time to serve this model up in a web app…


Hi,
I am having trouble creating my .pkl file for my model. Here is the code; my .pth file is in /content/drive/My Drive/bus_5routes/models/

import pickle

filename = "saved_model_fastai"
outfile = open(filename, 'wb')
pickle.dump("/content/drive/My Drive/bus_5routes/models/final.pth", outfile)
outfile.close()

infile = open(filename, 'rb')
model_from_fastai = pickle.load(infile)

learn.load('model_from_fastai')
gives the error -
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/bus_5routes/models/model_from_fastai.pth'
losses = model_from_fastai.predict(img)
prediction = losses[0];
prediction


AttributeError Traceback (most recent call last)

<ipython-input-6-66a3f3b93af5> in <module>()
----> 1 losses = model_from_fastai.predict(img)
      2 prediction = losses[0];
      3 prediction

AttributeError: 'str' object has no attribute 'predict'

Thanks.
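(For reference, the usual fastai v1 route is learn.export() plus load_learner() rather than pickling the .pth path by hand; the sketch below assumes the default export.pkl file name, and the image path is a placeholder.)

from fastai.vision import *

# After training, serialize the whole Learner (model + data config) in one go:
learn.export()  # learn is the trained Learner; writes export.pkl into learn.path

# Later (e.g. in the web app), load it back and predict:
learn2 = load_learner('/content/drive/My Drive/bus_5routes')  # folder containing export.pkl
img = open_image('some_image.png')                            # placeholder image path
pred_class, pred_idx, probs = learn2.predict(img)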

Interesting!!
:sweat_smile::pray: Glad you liked it! I’ll soon share the function that I used; it might be useful to you as well.
https://colab.research.google.com/drive/1zINYrLPEq1yhawVgmLTS6l2yWNlFfgv0

And here’s where I got it from:

https://stackoverflow.com/questions/44787437/how-to-convert-a-wav-file-to-a-spectrogram-in-python3 (why bother making it from scratch, right? :smile:)
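A sketch of one common recipe (not necessarily the exact function I used), with scipy and matplotlib; the file names are placeholders:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

sample_rate, samples = wavfile.read('whale.wav')
if samples.ndim > 1:                      # collapse stereo to mono if needed
    samples = samples.mean(axis=1)

frequencies, times, Sxx = signal.spectrogram(samples, fs=sample_rate)

plt.pcolormesh(times, frequencies, 10 * np.log10(Sxx + 1e-10))  # dB scale
plt.axis('off')
plt.savefig('whale_spectrogram.png', bbox_inches='tight', pad_inches=0)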

As for the dataset, it was about 12 spectrograms per whale, so as you can see it was not that big. The reason is that on the internet you don’t find whale sounds that are 3 minutes long or so, and even when you do, it’s just the same first 10 seconds repeating. So it was quite difficult, and that’s why I’ve only been able to make spectrograms for 8 types of whales and not more (I have to admit I’m a bit frustrated by that).
By the way, I’ll publish the notebook soon. If you have any questions, feel free to ask; I’ll be delighted to help.


I just started with deep learning using your FastAI course and am two lessons down.
I found an interesting dataset on Kaggle of different art forms.
This is my first project here.
It predicts the type of art (whether it is a drawing, a painting, or something else) when fed an image. I achieved an accuracy of 94% using FastAI.

Thanks Jeremy. Faced some difficulties while using Kaggle and FastAI together, but managed all those and am proud of this notebook, with many more to come :slight_smile:


Thank you for writing this out. It has been really helpful.


Hey, @MicPie is right: data augmentations are not helpful for spectrograms, and neither is pretraining with imagenet.

Try this and see if you can improve even more.

  1. Set pretrained=False when creating your cnn_learner (this will turn off transfer learning from imagenet, which isn’t helpful since imagenet doesn’t have spectrograms):
learn = cnn_learner(data, base_arch=models.resnet34,
                    metrics=[accuracy], pretrained=False)
  2. Turn off transforms. You do this in the databunch constructor: set tfms = None.
  3. Also make sure you are normalizing for your dataset, not imagenet stats. If you have the line .normalize(imagenet_stats), change it to .normalize():
databunch(dl_tfms=None).normalize()
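Putting the three changes together, a rough fastai v1 sketch (the path, image size, batch size, and epoch count are placeholder values, not from a real run):

from fastai.vision import *

path = Path('data/spectrograms')  # placeholder folder of spectrogram images

data = (ImageList.from_folder(path)
        .split_by_rand_pct(0.2)
        .label_from_folder()
        .transform(None, size=224)      # no augmentation transforms
        .databunch(bs=32, dl_tfms=None)
        .normalize())                   # stats from your own data, not imagenet_stats

learn = cnn_learner(data, base_arch=models.resnet34,
                    metrics=[accuracy], pretrained=False)  # no imagenet weights
learn.fit_one_cycle(5)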

Hope this helps, and if you’re especially interested in audio, come join us in the Deep Learning With Audio Thread.


I spent my weekend working on improving Stack Roboflow. I updated the display of the generated code so it looks more natural (removed a bunch of whitespace noise that was a result of the tokenizer).

I also linked it up with Elasticsearch (search engine is live here for data exploration!) so I could start understanding what it’s outputting. One of the most interesting things I found was that certain terms from the training data are over-represented and others are under-represented in the language model’s output.

There’s a slight bias towards under-sampling a term vs. oversampling it.

After digging in a little bit it seems that terms which are common in both the wikitext dataset and my own training set tend to be over-sampled. And ones that are primarily present in my dataset are under-sampled. My hypothesis is that this has to do with transfer-learning.

For example, the most oversampled (weighted by frequency of occurrence) are:

file-get-contents
do-loops
dos
do-while
2-way-object-databinding
windows-server-2008-r2
windows-server-2008
get-request
http-get
get
gets
get-childitem
apt-get
windows-server-* (6x)
post-redirect-get

And the most under-sampled are:

jquery-animate
jquery-pagination
jquery-cookie
jquery-plugins
jquery-jtable
struts2-jquery
jquery-tooltip
jquery-traversing
jquery-autocomplete
jquery-datatables

I noted some more details about my findings in this twitter thread.
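(The over/under-sampling measure is basically a frequency ratio between the generated output and the training corpus; a hypothetical sketch, not my actual code:)

from collections import Counter

def frequency_ratios(generated_tokens, training_tokens):
    # ratio > 1: term is over-sampled in the model's output;
    # ratio < 1: term is under-sampled relative to the training data.
    gen, train = Counter(generated_tokens), Counter(training_tokens)
    gen_total, train_total = sum(gen.values()), sum(train.values())
    return {term: (gen[term] / gen_total) / (count / train_total)
            for term, count in train.items()}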

I am glad that the writing was helpful for you!

I’m 4 and a half weeks into the course and here are articles on projects I have made so far. Feedback would be highly appreciated.

How do pretrained models work?

Image segmentation

Learning rate and golf

Multi-label classification

@devforfu I checked out the link to the notebook for the complete implementation of the training loop and it seems empty. Could you please look into that? And once again, thank you for helping me understand the fastai library better. I am also interested in implementing a lot of fastai stuff from scratch in PyTorch to understand it better, and would like to contribute in case you are interested.

@at98 Oh, thank you for letting me know! That’s awkward; I don’t even remember how that happened, but you’re right, the most recent commit on master contains an empty notebook. Here is a link to the commit with the notebook before it was (accidentally, I believe) deleted:


And, here is the repository:

Sure, I would be glad to get any contribution to this little project :slight_smile: It is mostly an educational thing, of course, and super simple. I am going to continue working on it during Part 2, especially because now we’re going to dig deeper and start working on “low-level” details.

@devforfu I would be happy to contribute in whatever way I can. I am not an expert in PyTorch, but I will work on constructing some of the stuff from scratch. Did you find any difference in speed while training CIFAR10 using the PyTorch scheduler that you mentioned vs. the fastai method? One thing that I think fastai lacks is accepting input with more than 3 channels. In many competitions I have seen that the input could be up to 20 channels (DSTL Satellite Image Segmentation). Maybe we can come up with some method to add that.
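A common PyTorch workaround (a sketch; the 20-channel figure is just illustrative) is to swap the first conv layer of the backbone for one with more input channels and reuse the pretrained RGB weights for the first three:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)
old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.conv1 = nn.Conv2d(20, old_conv.out_channels,
                        kernel_size=old_conv.kernel_size,
                        stride=old_conv.stride,
                        padding=old_conv.padding,
                        bias=False)
with torch.no_grad():
    # Copy the pretrained RGB filters into the first 3 of the 20 input channels.
    model.conv1.weight[:, :3] = old_conv.weight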


@ThomM, I believe that I have had success in the past using “ds_tfms=None” when working with spectrograms. Though it looks like you may have been approaching that with so many kwargs!


Good work! I have two questions:

  1. Is there a reason you are using stereo and not mono files?
  2. Can we even use transforms on spectrograms, since they distort the time/frequency relationship that is a core feature of a spectrogram?

Thanks! I’m just using the files as presented from the data source; I honestly hadn’t thought about stereo vs. mono. It would be interesting to see if converting the stereo samples to mono would make a difference; my hunch is that it would reveal more about the recording equipment than the actual sound.
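A quick way to test that (librosa here is an assumption on my part, not what the current pipeline uses) would be to collapse the channels before making the spectrogram:

import librosa

# mono=True averages the stereo channels into a single waveform.
y, sr = librosa.load('sample.wav', sr=None, mono=True)

# Equivalent by hand, if you already have a (2, n_samples) stereo array:
# y_mono = stereo.mean(axis=0)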

As for transforms - I’ve done some experimentation based on what @MicPie and @MadeUpMasters suggested, and it seems that using limited transforms is better than all or none. See this response: Share your work here ✅

Cheers! Per Share your work here ✅ it seems that ds_tfms=get_transforms(do_flip=False, max_rotate=0.), resize_method=ResizeMethod.SQUISH produced the best results for me. Better than just using None.

Thanks for this input! I tried out your suggestions, here’s the updated notebook. It looks like using the pre-trained resnet weights definitely does improve the predictions, pretty substantially! And it also seems that using limited transforms is better than none. I haven’t yet done a side by side comparison of normalising with itself vs. against imagenet; I don’t quite understand yet what that step is actually doing or how it’s used, so I have no intuition of why it would or wouldn’t work.

Let me know if you think I did something wrong there! I’ve just taken the most naive approach, and I know nothing about audio processing, so I’m probably making some terrible assumptions :slight_smile: Thanks for the help!

Can you reproduce the same results on consecutive model runs? I am struggling because I am getting different results every time I run the model. I was told to use a random seed, but I am wondering whether it actually affects how the model generalizes, since we want it to generalize well to new data.

Not exactly! :slight_smile: I probably shouldn’t be making so many assumptions without running the model several times and taking averages.

I don’t know much about that yet & would also love to know more… I know the results can change depending on which data is in your validation set, i.e. if you’re trying to isolate the effects of certain parameters, your validation set should stay the same between runs so you’re testing yourself on the same data. But I also wonder whether that would produce a “good” model if you’re basically overfitting to a specific validation set in that case.

I also know the outputs can vary depending on the initial weights and learning steps, but I don’t have an intuition of how much variance is typical, i.e. whether it’s normal for the same model trained on the same data to vary 1% or 10% between runs.
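(For completeness, the commonly suggested seed-fixing recipe looks roughly like the sketch below; even with all of these set, GPU training can remain slightly non-deterministic.)

import random
import numpy as np
import torch

def seed_everything(seed=42):
    # Fix the seeds that affect data splits, shuffling, and weight initialisation.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN can still pick non-deterministic kernels unless told otherwise.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)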

Hey, I checked out your notebook and you’re completely right. I’m pretty new to audio and it looks like I must have generalized my findings way too broadly, sorry about that. I am doing speech data on spectrograms that look very different from yours, and I’ve consistently found improvements when turning transfer learning off, but your post has encouraged me to go back and play around some more.

As for transformations, I haven’t experimented enough to really say what I said. I should have done more experimentation on the few types and ranges of transformations that might make sense for spectrograms, so I’ll be messing with that more in the future as well.

Are you using mel spectrograms or raw spectrograms? Is your y-axis in log scale? Those are things that are essential in speech, but I’m unsure how they affect general sound/noise. Let me know if you know what they are and how to implement them, and if not, I can point you in the right direction. Cheers.
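If it helps, a rough librosa sketch of a log-scaled mel spectrogram (the file name is a placeholder):

import librosa
import numpy as np

y, sr = librosa.load('sample.wav', sr=None)

# Mel spectrogram: the frequency axis is warped onto the mel scale.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)

# Convert power to decibels, i.e. put the magnitude axis on a log scale.
S_db = librosa.power_to_db(S, ref=np.max)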