HRNet: new SOTA architecture

I just started to experiment with HRNet and my initial results look quite promising. Compared to ResNet, I have to use smaller learning rates, but it still converges as fast as ResNet.
The core idea behind HRNet is that, instead of processing a single resolution at each stage as ResNet does, it maintains several resolutions throughout the network, computes on them in parallel, and repeatedly exchanges information between them (fusion).
The architecture seems to be powerful enough to serve both classification and image segmentation; the only difference is in the final convolutions/computations.
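
To make the fusion idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: it uses two single-channel feature maps stored as nested lists, has no learnable weights, and the helper names (`downsample2x`, `upsample2x`, `fuse`) are made up for illustration. Average pooling stands in for the strided convolution a real network would use for downsampling, and nearest-neighbour repetition stands in for the upsampling path.

```python
def downsample2x(fmap):
    """Halve the resolution with 2x2 average pooling.

    Assumption: this is only a stand-in for a learned strided convolution.
    """
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample2x(fmap):
    """Double the resolution by repeating every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(high, low):
    """Exchange information between two parallel branches.

    Each branch keeps its own resolution and receives the other branch's
    features resampled to that resolution.
    """
    new_high = add(high, upsample2x(low))
    new_low = add(low, downsample2x(high))
    return new_high, new_low


# Hypothetical inputs: a 4x4 high-resolution map and a 2x2 low-resolution map.
high = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
low = [[1.0, 0.0],
       [0.0, 1.0]]

new_high, new_low = fuse(high, low)
```

After `fuse`, both branches still have their original shapes (4x4 and 2x2), but each now contains information from the other. A real network would repeat this exchange several times, with convolutions on each branch in between.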


Explanation by one of the authors (starts at around 8:00):

Classification, Object Detection, Semantic Segmentation, …

Pose Estimation


This is awesome. I was actually looking for the code for this today. I am currently training HRNet and ResNet on the ADE20K dataset for semantic segmentation. Thanks for posting the links, I will be reading more about this.

Hey, did you manage to train it? I am super interested in your results and have been training on ADE myself.

The experiments are currently on hold. I am refactoring my code base to make comparisons easier. It is something I have been pushing back for way too long. Once I have results, I am definitely going to post them here.

Do you have any code you can share? It doesn't need to be refactored; I will clean it up and share the notebook back with you.

I only have TensorFlow code, and it hasn’t been tested thoroughly enough for me to feel comfortable sharing it yet. At this point you are definitely better off with the original code.

How did you solve the problem of the model output being 1/4 the size of the original image?