Trouble understanding the related work for an ECCV'18 paper:

The idea behind the paper, Concept Mask: Large Scale Segmentation for Semantic Concepts, is to take an image and a keyword as input and extract a segmentation mask for that concept. The authors claim that semi-supervised learning is what lets the approach scale to a large number of concepts.

I am more interested in reading the literature behind their mask extraction technique, which is a two-stage process. In Stage 1, a low-resolution attention map (heatmap) for the concept is extracted. In Stage 2, a refinement network (a segmentation network) takes the original image and the Stage 1 heatmap as input and produces the segmentation mask.
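To make the question concrete, here is a rough sketch of how I picture the data handed from the first stage to the second. This is only my reading, not the paper's code: the 7x7 heatmap size, the nearest-neighbour upsampling, and feeding the heatmap as a fourth input channel are all my assumptions, and the function names are made up for illustration.

```python
import numpy as np

def upsample_nearest(heatmap, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D heatmap to (out_h, out_w)."""
    in_h, in_w = heatmap.shape
    # Map every output row/column back to its source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return heatmap[rows[:, None], cols[None, :]]

def build_refinement_input(image, heatmap):
    """Stack an RGB image (H, W, 3) with the upsampled heatmap as a 4th channel.

    Assumption: the refinement network sees image and heatmap concatenated
    along the channel axis. The paper may combine them differently.
    """
    h, w, _ = image.shape
    up = upsample_nearest(heatmap, h, w)
    return np.concatenate([image, up[..., None]], axis=-1)

# Random stand-ins: a 224x224 RGB image and a 7x7 first-stage heatmap.
image = np.random.rand(224, 224, 3).astype(np.float32)
heatmap = np.random.rand(7, 7).astype(np.float32)

x = build_refinement_input(image, heatmap)
print(x.shape)  # (224, 224, 4)
```

If the real method injects the heatmap at a different point (for example into intermediate features rather than the input), pointers to papers that do that would be just as useful.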

I did not find the cited related work (Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation) particularly helpful for understanding the heatmap-based approach, so I wanted to ask the community before doing a brute-force search through the paper's bibliography. Am I missing any well-known literature whose techniques inspired this line of work?