Facial Expression Action Units Emotions Recognition - project


I want to build a recognition model either for Action Units or for Emotions.
At the moment I use Action Units (“the fundamental actions of individual muscles or groups of muscles”, as can be seen here).

The current dataset is EmotioNet.

This dataset provides manually annotated AUs 1, 2, 4, 5, 6, 9, 12, 17, 20, 25 and 26 for 25,000 images.

The AU distribution varies a lot:

AU   Images   Description
1    1562     Inner Brow Raiser
2    753      Outer Brow Raiser
4    3292     Brow Lowerer
5    951      Upper Lid Raiser
6    4933     Cheek Raiser
9    570      Nose Wrinkler
12   7869     Lip Corner Puller
17   521      Chin Raiser
20   150      Lip Stretcher
25   12358    Lips Part
26   2209     Jaw Drop
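A minimal sketch of how per-AU counts like the ones above could be tallied, assuming a hypothetical label csv with an image column and a space-separated aus column. The file layout and column names are assumptions for illustration, not EmotioNet's actual format.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical label file (assumed layout): one row per image, AUs space-separated, empty = no AU
csv_text = """image,aus
img_0001.jpg,6 12 25
img_0002.jpg,25
img_0003.jpg,4
img_0004.jpg,
img_0005.jpg,6 12
"""
labels = pd.read_csv(io.StringIO(csv_text), dtype={"aus": str}).fillna("")


def au_counts(df: pd.DataFrame) -> Counter:
    """Count how many images carry each AU (one image can carry several AUs)."""
    counts = Counter()
    for aus in df["aus"]:
        counts.update(int(au) for au in aus.split())
    return counts


counts = au_counts(labels)
for au, n in sorted(counts.items()):
    # The share of images per AU makes the imbalance visible at a glance
    print(f"AU{au}: {n} ({n / len(labels):.0%} of images)")
```

Because an image can carry several AUs, the counts do not sum to the number of images; that is also why the counts in the table add up to more than 25,000.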

My first attempt was a random sample of 200 images (covering all 11 AUs) taken without checking the distribution; the results were poor because the sample was heavily unbalanced.

In the end I decided to keep it simple, so I created a model with only two classes, where each image has one of two AUs: either AU6 (cheek raiser) or AU12 (lip corner puller). This gave 150 samples of AU6 and 190 samples of AU12. I am starting small so that I can first try to overfit on a small sample of data.

The loss looks rather messy; here are the GitHub results.

Visualising where the model fires (more GitHub results here):

  • First row: uncertain
  • Second row: most incorrect AU12 (lips)
  • Third row: most incorrect AU6 (cheek)
  • Fourth row: most correct AU12 (lips)
  • Fifth row: most correct AU6 (cheek)

I began writing this post three hours ago and in the meantime also started rewriting the code, setting up a git repository, etc., so I already have different results and a better understanding of my model :sweat_smile:

Still, I think the model is a bit confused by AU6, the cheek descriptor.

I have several ideas to try, still on a small sample of data:

  • train it with more, different AUs, for example AU4 (brow lowerer) and AU25 (lips part)
  • build a two-class model with emotions this time, for example happy (AU6+AU12+AU25) and not happy
  • build a model with all classes (11 classes for AUs and 7 classes for emotions), but treat the “neutral_face” class the way the “background” class is treated in object detection, and remove it at the end
  • build a model with 2 different outputs, as in object detection, but with one output for AUs (11 units) and one for emotions (7 units)
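A minimal PyTorch sketch of the last idea: a shared backbone with two output heads. It assumes AUs are treated as a multi-label problem (one sigmoid per AU, since AUs co-occur) and emotions as a single-label problem (softmax over 7 classes). The toy backbone, the au_weight balance between the two losses and all names are hypothetical.

```python
import torch
import torch.nn as nn

NUM_AUS = 11       # AUs 1, 2, 4, 5, 6, 9, 12, 17, 20, 25, 26
NUM_EMOTIONS = 7   # emotion classes


class TwoHeadNet(nn.Module):
    """Hypothetical model: shared backbone, one multi-label AU head, one emotion head."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.au_head = nn.Linear(feat_dim, NUM_AUS)            # one logit per AU
        self.emotion_head = nn.Linear(feat_dim, NUM_EMOTIONS)  # one logit per emotion

    def forward(self, x):
        feats = self.backbone(x)
        return self.au_head(feats), self.emotion_head(feats)


# AUs co-occur (e.g. happy = AU6 + AU12 + AU25), so each AU is an independent binary label;
# emotions are mutually exclusive, so they get a softmax cross-entropy.
au_loss_fn = nn.BCEWithLogitsLoss()
emotion_loss_fn = nn.CrossEntropyLoss()


def joint_loss(au_logits, emo_logits, au_targets, emo_targets, au_weight=1.0):
    # au_weight is an assumed knob for balancing the two tasks
    return au_weight * au_loss_fn(au_logits, au_targets) + emotion_loss_fn(emo_logits, emo_targets)


# Toy backbone standing in for a pretrained CNN: flatten a 3x32x32 image into 64 features
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
model = TwoHeadNet(backbone, feat_dim=64)

x = torch.randn(8, 3, 32, 32)
au_targets = torch.randint(0, 2, (8, NUM_AUS)).float()  # multi-hot AU labels
emo_targets = torch.randint(0, NUM_EMOTIONS, (8,))       # one emotion index per image
au_logits, emo_logits = model(x)
loss = joint_loss(au_logits, emo_logits, au_targets, emo_targets)
loss.backward()
print(au_logits.shape, emo_logits.shape, loss.item())
```

Using BCE for the AU head is the design choice that matters here: a softmax over AUs would force them to compete, while a face can show several at once.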

After that I will choose the best of these approaches, add more data to it, and improve it.

If anyone wants to jump in, please do :star_struck:.
Also, if you have any insights or reading suggestions, I would be very grateful to hear from you :slightly_smiling_face:


This looks super interesting. A couple of points I would like to throw in. For the imbalanced dataset, you can create a sampler https://pytorch.org/docs/stable/_modules/torch/utils/data/sampler.html. Having a balanced dataset is quite important.

As for training, you seem to have used a learning rate of 1e-4; I would recommend trying a higher one, say 1e-2. Lastly, you used CAM, and for that the last conv layer has 512 input channels and 3 output channels. In my experience this is not a great idea, and having a linear layer at the end does help. You can still get the CAM activations with a linear layer; I created a notebook that explains this here https://github.com/TheShadow29/FAI-notes/blob/master/notebooks/Using-CAM-for-CNN-Visualization.ipynb. So it would go something like conv(512, 256) -> AdaptiveConcatPool2d -> Flatten -> Linear(512, num_classes) -> LogSoftmax(), where the linear layer takes 512 inputs because concat pooling stacks average and max pooling of the 256 channels.
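A minimal PyTorch sketch of a conv + linear head that still yields CAM maps. It swaps plain global average pooling in for the concat pooling mentioned above, because with average pooling the CAM for class c is exactly the linear layer's weights for c applied at each spatial position. The 3x3 kernel, the class name CamHead and the 7x7 stand-in feature map are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CamHead(nn.Module):
    """Hypothetical head: conv(512, 256) -> global average pool -> Linear(256, C) -> LogSoftmax."""

    def __init__(self, in_ch: int = 512, mid_ch: int = 256, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)  # kernel size assumed
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(mid_ch, num_classes)

    def features(self, x):
        # Feature maps A_k with shape (batch, mid_ch, H, W); CAM is built from these
        return F.relu(self.conv(x))

    def logits(self, x):
        pooled = self.pool(self.features(x)).flatten(1)  # (batch, mid_ch)
        return self.fc(pooled)

    def forward(self, x):
        return F.log_softmax(self.logits(x), dim=1)

    def cam(self, x, class_idx: int):
        # CAM_c(h, w) = sum_k W[c, k] * A_k(h, w); the bias is spatially constant, so it is left out
        w = self.fc.weight[class_idx]  # (mid_ch,)
        return torch.einsum("k,bkhw->bhw", w, self.features(x))


head = CamHead()
x = torch.randn(4, 512, 7, 7)  # stand-in for the backbone's last feature map
print(head(x).shape)           # torch.Size([4, 2])
print(head.cam(x, 1).shape)    # torch.Size([4, 7, 7])
```

Since average pooling is a spatial mean, averaging cam(x, c) over H and W and adding the bias gives back the logit for class c exactly; upsampling the 7x7 map to the image size lets you overlay it on the face.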


Thank you for your reply.

Regarding the sampler, there is no way to pass one to the DataLoader unless I modify the ImageData class in dataset.py (whose get_dl method creates the DataLoader) by adding an extra sampler parameter that defaults to None. I see 3 ways of doing it: create a new class that inherits from the old one and modify it there, modify the classes and open a pull request (@jeremy), or use torch.utils.data.DataLoader directly.

def get_dl(self, ds, shuffle):
    if ds is None: return None
    return DataLoader(ds, batch_size=self.bs, shuffle=shuffle,
                      num_workers=self.num_workers, pin_memory=False)

Thank you for sharing the repository about CAM. I will take a deeper look, because right now I am a bit confused.

Last night I let the model run with all the AUs (11 AUs in 12,358 files), even though they are unbalanced, and I have some new results. They look very exciting at first, but once you see that one of the labels covers 75% of the data while other labels cover less than 1%, such a good accuracy is to be expected: a model that mostly predicts the dominant label already scores well.


Regarding the sampler, the easiest way (though a bit hacky) would be to redefine the dataloader in the created object.

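tmp_ds = data.trn_dl.dataset
# keyword args only (positionally they land in the wrong slots); no shuffle, the sampler sets the order
data.trn_dl = DataLoader(tmp_ds, batch_size=batch_size, num_workers=num_workers, sampler=my_sampler)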

Personally, though, I feel that making a new class that inherits from the old one is the most elegant way.

The accuracy metric isn’t very informative with imbalanced classes. One option is to create a new csv file with only a subset of the data, so that the classes are more or less balanced. You could also look at the precision and recall scores; the scikit-learn functions handle more than 2 classes by computing them one-vs-rest: each label in turn is treated as the positive class and all other labels as negative. This gives a much better idea of how well the model is learning. You could also look at the confusion matrix (which would be 11x11) to see which classes get confused most often and interpret your results.
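A small scikit-learn sketch with made-up labels, showing how a high accuracy can hide a class the model never gets right. It assumes single-label predictions; for the multi-label AU setup, sklearn.metrics.multilabel_confusion_matrix gives one 2x2 matrix per AU instead of a single 11x11 one.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: class 0 covers 80% of the samples, classes 1 and 2 are rare
y_true = np.array([0] * 16 + [1] * 2 + [2] * 2)
# A lazy model that predicts the majority class almost everywhere
y_pred = np.array([0] * 16 + [0, 1] + [0, 0])

print("accuracy :", accuracy_score(y_true, y_pred))
# average=None returns one one-vs-rest score per class instead of a single number
print("precision:", precision_score(y_true, y_pred, average=None, zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average=None, zero_division=0))
# Rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))
```

Here accuracy is 0.85, yet recall for class 2 is 0, which both the per-class scores and the last row of the confusion matrix make obvious.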



Hey :slight_smile:

Thank you for your reply. It makes my life easier. Today I spent way too much time trying to figure out how the samplers work and how to make them behave the way I wanted. I am still not convinced they work correctly: I tried them on MNIST and on simple tensors with only three classes, and WeightedRandomSampler seems to always ignore certain classes, no matter what the weights look like.

So definitely, the easiest and safest way is to create another csv and work on that.
I will compute the other scores as well and keep you posted.

Thanks again :smiley:


Yeah, making a new csv is definitely the easier option and should perhaps be the first step.

As for the sampler, I got it working and uploaded the notebook here: https://github.com/TheShadow29/FAI-notes/blob/master/notebooks/Using-Sampler-For-Class-Imbalance.ipynb. I haven’t explained much yet and will probably get to it later, though most of it is fairly self-explanatory.

The main thing to note is that the weights in WeightedRandomSampler are per sample (one weight for each dataset index), not one weight per class.
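A minimal sketch of that point with toy tensors: the class weights are computed first and then expanded to one weight per dataset index. The 90/9/1 split and the batch size are made up. Passing the length-3 class-weight vector directly would make the sampler draw only indices 0, 1 and 2, which in a dataset sorted by class can look like whole classes being ignored.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy imbalanced labels: 90 / 9 / 1 samples of classes 0 / 1 / 2
labels = torch.tensor([0] * 90 + [1] * 9 + [2] * 1)
dataset = TensorDataset(torch.randn(len(labels), 4), labels)

# Per-class weight = 1 / class frequency ...
class_counts = torch.bincount(labels)
class_weights = 1.0 / class_counts.double()
# ... but the sampler wants ONE weight PER SAMPLE, so look up each sample's class weight
sample_weights = class_weights[labels]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)  # no shuffle=True alongside a sampler

seen = Counter()
for _ in range(20):  # several epochs to smooth out the randomness
    for xb, yb in loader:
        seen.update(yb.tolist())
total = sum(seen.values())
print({cls: round(n / total, 2) for cls, n in sorted(seen.items())})  # each class close to 1/3
```

Because sampling is with replacement, the single class-2 image is drawn about a third of the time, so this trades imbalance for heavy repetition of rare images; data augmentation helps offset that.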

You could get a balanced dataset much more easily with the StratifiedSampler found here: https://github.com/ncullen93/torchsample/blob/master/torchsample/samplers.py#L22. I found it by following this thread: https://discuss.pytorch.org/t/how-to-enable-the-dataloader-to-sample-from-each-class-with-equal-probability/911, which seems to be exactly what you need (I guess?).
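If pulling in torchsample is not an option, a hand-rolled alternative is a batch sampler that draws the same number of indices from every class for each batch. This is a sketch of that idea, not the torchsample implementation; the class name BalancedBatchSampler and its parameters are hypothetical.

```python
import random
from collections import defaultdict

import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset


class BalancedBatchSampler(Sampler):
    """Hypothetical batch sampler: each batch holds `per_class` indices from every class.

    Indices are drawn with replacement, so images of rare classes repeat across batches.
    """

    def __init__(self, labels, per_class, num_batches, seed=0):
        self.by_class = defaultdict(list)  # class label -> dataset indices of that class
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)
        self.per_class = per_class
        self.num_batches = num_batches
        self.rng = random.Random(seed)

    def __iter__(self):
        for _ in range(self.num_batches):
            batch = []
            for indices in self.by_class.values():
                batch.extend(self.rng.choices(indices, k=self.per_class))
            self.rng.shuffle(batch)  # mix the classes within the batch
            yield batch

    def __len__(self):
        return self.num_batches


# Toy imbalanced data: 90 / 9 / 1 samples of classes 0 / 1 / 2
labels = [0] * 90 + [1] * 9 + [2] * 1
dataset = TensorDataset(torch.randn(len(labels), 4), torch.tensor(labels))
sampler = BalancedBatchSampler(labels, per_class=4, num_batches=5)
loader = DataLoader(dataset, batch_sampler=sampler)  # batch_sampler replaces batch_size/shuffle

for xb, yb in loader:
    print(sorted(yb.tolist()))  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2] for every batch
```

Unlike WeightedRandomSampler, which only balances classes on average, this guarantees the same class mix in every batch.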

Additionally, you could give a bigger penalty to the under-represented classes. For that, you would pass class weights to the criterion of the learner object, something like learn.crit = nn.NLLLoss(weight=weights).
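A minimal sketch of the weighted criterion with made-up class counts; with the fastai learner it would then be assigned as learn.crit, as above. Inverse-frequency weighting is an assumed choice here, and other schemes (such as square-root inverse frequency) are also common.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up label counts for 3 classes: class 2 is badly under-represented
counts = torch.tensor([900.0, 90.0, 10.0])
# Inverse-frequency weights: n_samples / (n_classes * count)
weights = counts.sum() / (len(counts) * counts)

crit = nn.NLLLoss(weight=weights)  # the keyword is `weight`, not `weights`

log_probs = F.log_softmax(torch.randn(5, 3), dim=1)  # what a LogSoftmax head outputs
targets = torch.tensor([0, 0, 1, 2, 2])
loss = crit(log_probs, targets)
print(weights)  # a class-2 error now costs 90x more than a class-0 error
```

With the default mean reduction, NLLLoss divides by the sum of the target weights rather than the batch size, so the loss scale stays comparable between batches with different class mixes.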


Thank you Arka for this information. After getting the samplers working and understanding them :slight_smile:, I finally created a script that generates csv files with any combination of AUs and any number of images per AU. For example, just AU6 and AU12 with 2,000 each, or 200 each, and so on.

Currently I am working on a basic happy/non-happy task based on action units, so I created a csv with two classes: pictures annotated with AU6 and AU12, and pictures without them.
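A minimal pandas sketch of that kind of csv script, assuming a hypothetical image,aus label layout with space-separated AUs. It treats every image lacking AU6 or AU12 as non-happy, which is one possible reading; the function name, label strings and per-class sample size are illustrative.

```python
import io

import pandas as pd

# Hypothetical label file (assumed layout): AUs space-separated per image
csv_text = """image,aus
a.jpg,6 12 25
b.jpg,6 12
c.jpg,4
d.jpg,25
e.jpg,
f.jpg,6
g.jpg,6 12 26
"""
labels = pd.read_csv(io.StringIO(csv_text), dtype={"aus": str}).fillna("")


def make_happy_csv(df: pd.DataFrame, n_per_class: int, seed: int = 0) -> pd.DataFrame:
    """Label an image 'happy' if it has both AU6 and AU12, else 'not_happy'; keep n per class."""
    has_both = df["aus"].apply(lambda s: {"6", "12"} <= set(s.split()))
    out = df[["image"]].copy()
    out["label"] = has_both.map({True: "happy", False: "not_happy"})
    # GroupBy.sample needs pandas >= 1.1 and raises if a class has fewer than n rows
    return out.groupby("label").sample(n=n_per_class, random_state=seed)


balanced = make_happy_csv(labels, n_per_class=2)
print(balanced)  # 2 happy + 2 not_happy rows
# In practice: balanced.to_csv("happy_vs_not_happy.csv", index=False)
```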

The results are quite nice, but I guess the model doesn’t generalize very well: when I test it on pictures from the lab, the predictions are a mess.

epoch      trn_loss   val_loss   accuracy                   
    x      0.044441   0.107906   0.9625    

The next step is to make it generalize better, since I have already managed to overfit the training data, by the following steps
