Share your work here ✅

Hi, everyone

I continued working on emotion recognition from speech with the IEMOCAP database, but now using four classes (sadness, anger, happiness and neutral), which are the classes with the most data. The accuracy dropped to about 64.5%, which is lower than a paper from one year ago that achieved up to 68.8% using convolutional layers and LSTMs.

The confusion matrix shows difficulty in classifying the “happiness” category:


I believe there’s room for improvement using data augmentation, which should be performed directly on the waveform, before spectrogram calculation.
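The kind of waveform-level augmentation I have in mind can be sketched with plain NumPy (the noise level and shift range here are illustrative, not tuned for IEMOCAP):

```python
import numpy as np

def augment_waveform(y, sr, noise_level=0.005, max_shift_s=0.1, seed=0):
    """Return a noisy, time-shifted copy of waveform y (applied before the spectrogram)."""
    rng = np.random.default_rng(seed)
    # additive Gaussian noise on the raw samples
    y_aug = y + noise_level * rng.standard_normal(len(y))
    # random circular time shift of up to max_shift_s seconds
    shift = int(rng.integers(-int(max_shift_s * sr), int(max_shift_s * sr) + 1))
    return np.roll(y_aug, shift)

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s dummy tone standing in for a real clip
y_aug = augment_waveform(y, sr)
```

Each augmented copy then goes through the usual spectrogram pipeline as if it were a new recording.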

Cheers

3 Likes

The problem with synthetic data is that your training and test datasets will come from different domains. This usually requires a domain adaptation approach.

I have made an API-based application to serve models in a somewhat production-like setting.

Repo: https://github.com/gurvindersingh/mlapp

Provides the following features:

Versioned APIs
Metrics (Prometheus)
Ability to run different versions of models without unnecessary code changes
User-friendly API documentation
Input data validation
Authentication
Inference device flexibility (CPU or GPU)
Scaling instances up or down based on incoming traffic
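The "different versions without code changes" idea can be boiled down to a registry keyed by version — a toy sketch only, the repo's actual implementation may well differ:

```python
# hypothetical version registry: new model versions are registered,
# existing routing code never changes
MODELS = {}

def register(version):
    def deco(fn):
        MODELS[version] = fn
        return fn
    return deco

@register("v1")
def predict_v1(x):
    return x * 2  # stand-in for a real model's inference

@register("v2")
def predict_v2(x):
    return x * 3  # a newer model, registered without touching predict()

def predict(version, x):
    return MODELS[version](x)
```

A versioned API route (e.g. `/v1/predict`) would then just dispatch into this registry.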

Feedback is welcome. Hope someone will find it useful.

18 Likes

Thanks a lot @MicPie . Now I will continue with my old approach first and move to autoencoder if it does not work out :smiley: . I always want to contribute to fast.ai so if I decide to work with autoencoder, I will definitely join you.

Regards,
Hoa

Interesting! Thanks for sharing :slight_smile:

Will be interested to hear feedback from anyone who tries this out.

6 Likes

First we convert the MP3 files to *.wav:

from pydub import AudioSegment

print("Converting...")
arr = [2, 3, 4, 5]  # file names: 2.mp3, 3.mp3, ...
for x in arr:
    sound = AudioSegment.from_mp3("{0}.mp3".format(x))
    sound = sound.set_channels(1)  # downmix to mono
    sound.export("{0}[WAV].wav".format(x), format="wav")
print("Converted.")

Or even this:

from pydub import AudioSegment
sound = AudioSegment.from_mp3("KY.mp3")
sound.export("KY.wav", format="wav")

And then from *.wav to spectrograms:

for file in *.wav;do
    outfile="${file%.*}.png"
    title_in_pic="${file%.*}"
    sox "$file" -n spectrogram -t "$title_in_pic" -o "$outfile" -x 2000
done

My friend used this bash tool, and this is a small script from his partially completed project… (There’s also the librosa module, you can have a look…)

PS: These aren’t mine (they belong to my friend); I have very little knowledge with respect to music.
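If you'd rather stay in Python for the wav → spectrogram step, librosa's mel spectrogram is the usual choice; the basic idea can also be shown with SciPy alone (a dummy sine wave stands in here for a loaded .wav):

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)  # stand-in for audio loaded from a .wav file

# f: frequency bins, times: frame centres, S: power spectrogram
f, times, S = spectrogram(y, fs=sr, nperseg=512, noverlap=256)
S_db = 10 * np.log10(S + 1e-10)  # log scale, similar to what sox displays
```

`S_db` can then be saved as an image (e.g. with `matplotlib.pyplot.imsave`) to get the same kind of picture the sox command produces.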

Edit: Added a working demonstration -

  • Download this wav file and run the above sox command to get this spectrogram…

3 Likes

I created an Age Predictor App :smiley:

Basically used a pretrained Resnet-50 and trained it on IMDB-Wiki dataset (details in Github repo).


Maybe I am just being dumb, but I am still not quite sure how to interpret the validation loss (MSELoss). It seems high, but my predictions are not that far off…
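One way to sanity-check an MSE value for age regression: take the square root to get it back into years (the loss value below is purely illustrative):

```python
import math

valid_mse = 64.0  # hypothetical validation MSELoss, in years^2
rmse_years = math.sqrt(valid_mse)
# an MSE of 64 means predictions are off by roughly 8 years on average,
# so a "high-looking" MSE can still correspond to reasonable predictions
```

Since MSE is in squared units (years²), it will always look inflated compared to the typical error you see in individual predictions.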

Github: https://github.com/btahir/age-detector

Also gave a facelift to the Zeit front-end, which you can see in the app. Feel free to use the index.html/style.css files for your projects.

You can check out the app here: https://age-predictor.now.sh/

14 Likes

Great work.
One application of this could be on restaurant review websites like Zomato, which categorize user-uploaded pictures as indoor ambience, outdoor ambience, or food.
Most probably they are already using a similar ML technique.

1 Like

Can you share your notebook?

You’ve just made enemies with half of the world’s population.

6 Likes

Hi All,

Inspired by the fast.ai Deploying web app to Zeit production guide, I have created updated starter packs that also support AWS Beanstalk and Google App Engine. Plus, I have updated the static files with additional CSS & JS to handle big camera file uploads, and there is a starter pack for Keras image models as well.

Then I also wrote a detailed guide to building & deploying these starter packs as a web app on 4 cloud services: AWS Beanstalk, Google App Engine, Azure Websites and now.sh.

I hope to keep updating this guide with other Docker-hosted cloud web app services like DigitalOcean, Heroku, etc., plus more starter packs for NLP, text or collaborative filtering.

Oh, and this article was picked up by the Towards Data Science publication, so please let me know your feedback, comments and thoughts here.

Download your starter pack app repository for Fast.ai here:

git clone https://github.com/pankymathur/fastai-vision-app

Download your starter pack app repository for Keras here:

git clone https://github.com/pankymathur/fastai-vision-app

Please do let me know your questions & feedback.

Thanks,
Pankaj

30 Likes

Wonderful work… Pankaj :grinning:

1 Like

Thank you, Vishal. Let me know if you face any issues during deployment.

1 Like

Yeah sure…!

Thanks for this. I will try this soon…

1 Like

After some effort, I got it partially working. It reached 92.1% accuracy (0.3% below my previous model). Some remarks:

  • I think there is a bug in my code: when I run fit_one_cycle after unfreezing and lr_find, I get a “can’t optimize a non-leaf Tensor” error (it only happens when unfreezing and running lr_find before training). I wasn’t able to find out what is causing it.
  • I only created two layer groups: the first containing the tabular and NLP models and the second containing the last linear layers. So it is only partially leveraging discriminative learning rates, using the same LR for all the layers inside the tabular and NLP models. I think this can be especially harmful for the NLP model, updating the weights aggressively in the early layers and destroying part of the pre-trained weights.
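For reference, the effect of having more layer groups can be sketched independently of fastai: discriminative learning rates are just geometrically spaced LRs, one per group, with the smallest going to the earliest (pre-trained) layers. In fastai v1 this is what `fit_one_cycle(n, max_lr=slice(lo, hi))` spreads across `layer_groups`:

```python
def discriminative_lrs(n_groups, lr_min, lr_max):
    """Geometrically spaced learning rates, one per layer group
    (smallest LR for the earliest, pre-trained layers)."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

lrs = discriminative_lrs(3, 1e-5, 1e-3)
# with only 2 groups, the pre-trained NLP body would share one LR;
# with 3+, its early layers can move much more slowly
```

With just two groups, everything inside the tabular and NLP models shares a single LR, which is exactly the problem described above.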

Edited: I managed to set the layer_groups properly. Along with minor tweaks (increased wd and dropout), I reached 92.3%.

I’d love to get some feedback and possible improvements.

4 Likes

Hi All,
Yet another update on my satellite project. Yesterday’s lecture about PCA got me thinking about latent-space representations of the urban characteristics of cities as seen from space.

I tried doing PCA on the later layers’ vector representations of cities, but the results were a little disappointing, so in the end I used UMAP for my dimensionality reduction. It’s much faster than t-SNE and I thought the results were pretty cool.
I also flattened the UMAP representation to a grid using the lapjv Python package, which finds the grid representation of distance maps using some fancy algorithm.
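That flattening step is a linear assignment problem: match each embedded point to a unique grid cell while minimising total distance. lapjv solves it fast at scale; for small n, SciPy's `linear_sum_assignment` does the same job, as in this sketch (random points stand in for the city embeddings):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
emb = rng.random((16, 2))  # stand-in for a 2-D embedding of 16 city tiles

# build a 4x4 target grid covering the unit square
side = 4
gx, gy = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)

# assign each point to a unique grid cell, minimising total squared distance
cost = cdist(emb, grid, "sqeuclidean")
rows, cols = linear_sum_assignment(cost)
positions = grid[cols]  # grid coordinate chosen for each embedded point
```

Each tile image is then drawn at its assigned `positions` entry, giving the neat mosaic below.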

Here is the result:

The actual image is 80MB so here are some interesting higher res areas:

Ochre roofs and twisty roads:

Big backyards:

Dense and arid:

Grids:

You can check the notebook out here

30 Likes

Very interesting!
Thanks for sharing the notebook, and the data plus the scraper as well.
PS: a 37 MB Jupyter notebook! The biggest I have seen to date.

nbviewer link to the same notebook: https://nbviewer.jupyter.org/github/henripal/maps/blob/master/nbs/big_resnet50-pca.ipynb

Using this, maybe we can now detect rooftop swimming pools…
Just saying!

1 Like

Sure Amit, let me know if you face any issue during deployment.

@flavioavila I’m particularly interested in how you did the preprocessing/padding etc.