The idea behind the paper "Concept Mask: Large Scale Segmentation for Semantic Concepts" is to take an image and a keyword as input and extract a segmentation mask for that concept. The authors claim that semi-supervised learning is what allowed the approach to scale to a large number of concepts.
What I am more curious about, and would like to read literature on, is their mask extraction technique, which is a two-stage process. In the first stage, a low-resolution attention map (heatmap) for the concept is extracted. In the second stage, a refinement network (segmentation network) takes the original image and the Stage 1 heatmap as input and produces a segmentation mask.
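To make the data flow concrete, here is a rough sketch of how I picture the two stages fitting together. It is my own illustration, not the paper's code: I am assuming the heatmap is upsampled to image resolution and appended as a fourth input channel, and `upsample_nearest`, `build_refinement_input` and `placeholder_refine` are hypothetical names. The last function only thresholds the heatmap and stands in for the learned refinement network.

```python
import numpy as np

def upsample_nearest(heatmap, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D heatmap to (out_h, out_w)."""
    in_h, in_w = heatmap.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return heatmap[rows[:, None], cols[None, :]]

def build_refinement_input(image, heatmap):
    """Stack an H x W x 3 image with the upsampled heatmap into H x W x 4.

    Assumption: the heatmap is fed in as an extra input channel.
    """
    h, w, _ = image.shape
    up = upsample_nearest(heatmap, h, w)
    return np.concatenate([image, up[..., None]], axis=-1)

def placeholder_refine(stacked, threshold=0.5):
    """Stand-in for the refinement network: thresholds the heatmap channel."""
    return stacked[..., 3] >= threshold

if __name__ == "__main__":
    image = np.zeros((4, 4, 3))                   # dummy RGB image
    heatmap = np.array([[0.9, 0.1], [0.2, 0.8]])  # dummy 2 x 2 Stage 1 output
    mask = placeholder_refine(build_refinement_input(image, heatmap))
    print(mask.astype(int))
```

In a real system the thresholding step would be replaced by a trained network that can use image edges to sharpen the coarse heatmap, which is exactly the part I would like to find prior work on.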
I did not find the related work (Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation) particularly helpful for understanding the heatmap-based approach, so I wanted to reach out to the community before doing a brute-force search through the paper's bibliography. Am I missing any popular literature whose techniques inspired this line of work?