I continued working on emotion recognition from speech with the IEMOCAP database, but now using four classes (sadness, anger, happiness and neutral), which are the classes with the most data. The accuracy dropped to about 64.5%, which is lower than this paper from one year ago, which achieved up to 68.8% using convolutional layers and LSTMs.
The confusion matrix shows difficulty in classifying the “happiness” category:
I believe there’s room for improvement using data augmentation, which should be performed directly on the waveform, before the spectrogram is computed.
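A minimal sketch of what such waveform-level augmentation could look like (the function and parameter choices are my own illustration, not the project’s actual code), combining a random circular time shift with additive Gaussian noise:

```python
import numpy as np

def augment_waveform(wave, noise_std=0.005, max_shift=1600, rng=None):
    """Simple waveform-level augmentation: random time shift + Gaussian noise.

    wave: 1-D numpy array of audio samples.
    noise_std: std of the added noise, relative to unit-scaled audio.
    max_shift: maximum shift in samples (e.g. 0.1 s at 16 kHz).
    """
    if rng is None:
        rng = np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(wave, shift)                      # circular time shift
    noisy = shifted + rng.normal(0.0, noise_std, size=wave.shape)
    return noisy.astype(np.float32)
```

Because the augmentation happens before the spectrogram is computed, every epoch sees a slightly different spectrogram for the same utterance.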
- Ability to run different versions of models without unnecessary code changes
- User-friendly API documentation
- Input data validation
- Inference device flexibility (CPU or GPU)
- Scaling instances up or down based on incoming traffic
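To illustrate the input-validation point, here is a minimal sketch of a pre-inference check (the function name, expected shape and error messages are illustrative assumptions, not the actual service code):

```python
import numpy as np

def validate_input(batch, expected_shape=(3, 224, 224), dtype=np.float32):
    """Reject malformed inference requests before they reach the model."""
    arr = np.asarray(batch)
    if arr.ndim != len(expected_shape) + 1:
        raise ValueError(
            f"expected a batch of rank {len(expected_shape) + 1}, got rank {arr.ndim}"
        )
    if arr.shape[1:] != expected_shape:
        raise ValueError(
            f"expected per-sample shape {expected_shape}, got {arr.shape[1:]}"
        )
    return arr.astype(dtype)  # cast once, so the model always sees one dtype
```

Failing fast here keeps bad payloads from producing confusing errors deep inside the model.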
Feedback is welcome. Hope someone will find it useful.
Thanks a lot @MicPie. Now I will continue with my old approach first and move to the autoencoder if it does not work out. I have always wanted to contribute to fast.ai, so if I decide to work with the autoencoder, I will definitely join you.
One application of this could be in restaurant review websites like Zomato, which categorize user-uploaded pictures as indoor ambience, outdoor ambience, or food. Most probably they are already using a similar ML technique.
Inspired by the fast.ai guide on deploying a web app to Zeit, I have created updated starter packs that also support AWS Beanstalk and Google App Engine. I have also updated the static files with additional CSS & JS to handle large camera file uploads, and there is a starter pack for Keras image models as well.
I then wrote a detailed guide to build & deploy these starter packs as a web app on 4 cloud services: AWS Beanstalk, Google App Engine, Azure Websites and now.sh.
I hope to keep updating this guide with other Docker-hosted cloud web app services like Digital Ocean, Heroku, etc., plus more starter packs for NLP, text or collaborative filtering.
Oh, and this article was picked up by the Towards Data Science publication, so please let me know your feedback, comments and thoughts here.
Download your starter pack app repository for Fast.ai here:
After some effort, I got it partially working. It reached 92.1% accuracy (0.3% below my previous model). Some remarks:
I think there is some bug in my code: when I run fit_one_cycle after unfreezing and lr_find, I get a “can’t optimize a non-leaf Tensor” error (it only happens when unfreezing and running lr_find before training). I wasn’t able to find out what is causing it.
I only created two layer groups: the first containing the tabular and NLP models, and the second containing the last linear layers. So it is only partially leveraging discriminative learning rates, using the same LR for all the layers inside the tabular and NLP models. I think this can be especially harmful for the NLP model, updating the weights aggressively in the early layers and destroying part of the pre-trained weights.
Edited: I managed to set the layer_groups properly. In addition to minor tweaks (increased wd and dropout), I reached 92.3%.
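For reference, the underlying mechanism in plain PyTorch is one optimizer parameter group per layer group, each with its own learning rate (the tiny model below is a hypothetical stand-in for the tabular+NLP architecture, not the actual one):

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for the three layer groups discussed above.
tabular = nn.Linear(10, 8)   # tabular branch
nlp = nn.GRU(16, 8)          # pre-trained NLP branch
head = nn.Linear(16, 2)      # freshly initialised final linear layers

# Discriminative learning rates: small LRs protect the pre-trained
# branches, while a larger LR lets the new head train quickly.
opt = optim.Adam([
    {"params": tabular.parameters(), "lr": 1e-5},
    {"params": nlp.parameters(),     "lr": 1e-5},
    {"params": head.parameters(),    "lr": 1e-3},
])
```

fastai’s layer_groups build exactly this kind of grouping for you, which is why getting them right matters so much for fine-tuning.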
I’d love to get some feedback and possible improvements.
Yet another update on my satellite project. Yesterday’s lecture about PCA got me thinking about latent space representations for urban characteristics of cities as seen from space.
I tried doing PCA on the vector representations of cities from a later layer, but the results were a little disappointing, so in the end I used UMAP for dimensionality reduction. It’s much faster than t-SNE, and I thought the results were pretty cool.
I also flattened the UMAP representation to a grid using the lapjv Python package, which computes the grid representation of a distance map via the Jonker–Volgenant linear assignment algorithm.
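The grid-flattening step can be sketched using SciPy’s linear-sum-assignment solver in place of lapjv (same assignment problem, different implementation; the function name and normalisation are my own):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def embedding_to_grid(emb, side):
    """Assign each 2-D embedding point to a unique cell of a side x side
    grid, minimising the total squared point-to-cell distance.
    Assumes len(emb) <= side * side."""
    xs, ys = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
    cells = np.stack([xs.ravel(), ys.ravel()], axis=1)          # (side*side, 2)
    # Normalise the embedding into the unit square so costs are comparable.
    e = (emb - emb.min(0)) / (emb.max(0) - emb.min(0) + 1e-9)
    cost = ((e[:, None, :] - cells[None, :, :]) ** 2).sum(-1)   # pairwise costs
    _, cols = linear_sum_assignment(cost)                       # optimal matching
    return cells[cols]                                          # grid cell per point
```

Each point gets its own grid cell, so nearby points in the UMAP embedding end up in nearby cells without overlapping.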