I just started experimenting with HRNet, and my initial results look quite promising. Compared to ResNet, I have to use smaller learning rates, but it still converges just as fast.
The core idea behind HRNet is that, instead of working at a single resolution at a time as ResNet does, it keeps multiple resolutions, computes on them in parallel, and shares information across the different resolutions (fusion).
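To make the fusion idea a bit more concrete, here is a toy sketch in plain Python with two parallel branches. It is not the HRNet implementation: the function names (`downsample`, `upsample`, `fuse`) are made up for illustration, 2x2 average pooling stands in for the learned strided convolution, and plain nearest-neighbour repetition stands in for the learned upsampling path. It only shows how each branch receives the other branch's features after resampling to its own resolution.

```python
def downsample(x):
    """Halve the resolution with 2x2 average pooling.

    Assumption: a stand-in for the learned strided convolution a real
    network would use on the high-to-low path.
    """
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def upsample(x):
    """Double the resolution by repeating every value (nearest neighbour)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))                     # repeat along height
    return out


def add(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(high, low):
    """Exchange information between a high- and a low-resolution branch.

    Each branch keeps its own resolution and adds the other branch's
    features after resampling them to match.
    """
    new_high = add(high, upsample(low))
    new_low = add(low, downsample(high))
    return new_high, new_low


# Hypothetical single-channel feature maps: 4x4 (high) and 2x2 (low).
high = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
low = [[10.0, 20.0],
       [30.0, 40.0]]

new_high, new_low = fuse(high, low)
print(new_high[0])  # [11.0, 12.0, 23.0, 24.0]
print(new_low)      # [[13.5, 25.5], [41.5, 53.5]]
```

Both outputs keep the resolution of their own branch, which is the point: the high-resolution map is never thrown away, it just gets enriched with context from the coarser branch, and vice versa.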
The architecture seems powerful enough to be used for both classification and image segmentation; the only difference is in the final convolutions/computations.
Explanation by one of the authors (starts at around 8:00):
This is awesome. I was actually looking for the code for this today. I am currently training HRNet and ResNet on the ADE20K dataset for semantic segmentation. Thanks for posting the links, I will be reading more about this.
The experiments are currently on hold. I am refactoring my code base to make comparisons easier, which is something I have been putting off for way too long. Once I have results, I will definitely post them here.