Understanding Technical Notation and Facial Landmarks / Heatmap

Hey all, I just finished going through fast.ai’s first course pretty rigorously and hope to implement the following paper (https://arxiv.org/abs/1908.05932) as my capstone project. I’ve already run into a few notational roadblocks, however, and would love some help understanding the architecture’s loss functions. Attached below are pictures highlighting the specific issues I’m facing, though at least part of the paper is probably necessary for context.

Also, implementing this paper requires a heatmap containing 70 facial landmarks extracted from an image’s face. I’m assuming there’s a pre-trained model that does this, or at least somewhat does this – can someone point me in that direction?

Would appreciate any level of help, and am happy to chat all things DL! Thanks in advance.


What do the x and y in this refer to? If the image isn’t loading, it’s pointing to equation (1) in the paper. Do the double lines around || Fi(x) - Fi(y) || mean the Frobenius norm or something else?

Screen Shot 2020-02-14 at 5.47.44 PM
Similar question as above with respect to the double lines and meaning of x and y

Screen Shot 2020-02-14 at 5.47.52 PM
Are the E’s here referring to expected value or some other quantity?

Thanks again!

Look at "Entrywise" matrix norms here.
The Frobenius norm is an L2 norm, i.e. there would be a 2 instead of a 1, so you are dealing with an L1 norm.

I think x and y are your input image and target image respectively.
The loss mentioned in the paper is perceptual loss, which Jeremy talks about in the GAN super-resolution lesson (IIRC).
If they are talking about perceptual loss, I think it should be the Frobenius norm (L2) and not L1.
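To make the difference between the two norms concrete, here is a minimal NumPy sketch. The function names are hypothetical, the feature maps are stand-in arrays rather than real network activations, and dividing by the number of entries is my assumption of a typical normalization, so check it against equation (1) in the paper.

```python
import numpy as np

def entrywise_l1(a, b):
    """Entrywise L1 norm of (a - b): sum of absolute differences."""
    return np.abs(a - b).sum()

def frobenius(a, b):
    """Frobenius (entrywise L2) norm of (a - b): sqrt of sum of squares."""
    return np.sqrt(((a - b) ** 2).sum())

def perceptual_loss(feats_x, feats_y, norm=entrywise_l1):
    """Sum a norm of F_i(x) - F_i(y) over layers i.

    feats_x / feats_y are lists of per-layer feature arrays.
    The division by fx.size is an assumed normalization.
    """
    return sum(norm(fx, fy) / fx.size for fx, fy in zip(feats_x, feats_y))

# Stand-in "feature maps" for a single layer.
fx = np.array([[1.0, 2.0], [3.0, 4.0]])
fy = np.zeros((2, 2))

print(entrywise_l1(fx, fy), frobenius(fx, fy))
print(perceptual_loss([fx], [fy]))
```

Passing `norm=frobenius` swaps in the L2 version, which makes it easy to compare both choices during training.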

Screen Shot 2020-02-14 at 3.25.17 AM

F seems to stand for feature.
Hope that helps :slight_smile:

The \mathbb{E} is indeed the expected value, but it’s just a fancy way of saying that they’re taking the mean.
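In code that usually comes down to a mean over the batch. A small sketch, where `empirical_expectation` is a hypothetical helper and the discriminator scores are made-up numbers for illustration only:

```python
import numpy as np

def empirical_expectation(f, samples):
    """Approximate E_x[f(x)] by averaging f over a batch of samples."""
    values = np.array([f(x) for x in samples], dtype=np.float64)
    return values.mean()

# Made-up discriminator outputs in (0, 1) for a batch of four images.
scores = np.array([0.9, 0.8, 0.6, 0.7])

# E[log D(x)] estimated as the batch mean of log D(x).
print(empirical_expectation(np.log, scores))
```

With tensors you would normally skip the helper and call `.mean()` on the per-sample values directly.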

Thanks! This helps a ton :slight_smile: I think I’ve come a lot closer to decoding a lot of their syntax for loss functions here. To be honest it seems like an amalgamation of a few other papers wrapped under a shiny new bow.

Sweet, sounds like a typical research paper.

I think I’m closer to having the technical notation down, but does anyone have a reference to an API/model that accepts an image (or list of images) and outputs its corresponding heatmap and facial landmarks? I’ve found a number of papers that explore the issue but have yet to get one to work.

Not exactly what you’re asking for but a few weeks ago I made this: https://github.com/hollance/BlazeFace-PyTorch

It’s a conversion of a TFLite face detection model to PyTorch. The model does not output heatmaps (although they’re obviously in there somewhere) but it does output landmarks. If you want to see the heatmaps, you’ll need to add some code to also output them.
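If landmark coordinates are all you can get, one common workaround is to render the heatmaps yourself by drawing a Gaussian blob at each point. This is only a sketch under my own assumptions: `landmarks_to_heatmaps` is a hypothetical helper, `sigma` is an arbitrary choice, and the result is not the model's internal heatmap, so it may not match what the paper expects.

```python
import numpy as np

def landmarks_to_heatmaps(landmarks, height, width, sigma=2.0):
    """Render (N, 2) landmark (x, y) pixel coordinates as (N, H, W) heatmaps.

    Each channel is an unnormalized Gaussian that peaks at its landmark.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)
    ys = np.arange(height, dtype=np.float32)[None, :, None]  # (1, H, 1)
    xs = np.arange(width, dtype=np.float32)[None, None, :]   # (1, 1, W)
    cx = landmarks[:, 0][:, None, None]                      # (N, 1, 1)
    cy = landmarks[:, 1][:, None, None]                      # (N, 1, 1)
    dist_sq = (xs - cx) ** 2 + (ys - cy) ** 2                # (N, H, W)
    return np.exp(-dist_sq / (2.0 * sigma ** 2))

# Two made-up landmarks on a 16x16 image.
heatmaps = landmarks_to_heatmaps([[3, 5], [10, 2]], height=16, width=16)
print(heatmaps.shape)
```

For the paper's setup you would pass 70 points instead of two, giving a 70-channel array that can be stacked with the image as network input.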