Hello there!
Following Jeremy Howard's post on Twitter, I tried to implement the paper https://arxiv.org/pdf/1812.06162.pdf
It did not take long to realize that I had barely looked at the results the first time I read the paper. It is also not the easiest paper to understand: the authors give few details about the underlying math and do not clearly explain their implementation.
I ran into a problem when trying to implement it: the authors seem to assume a multi-GPU setup, which is used to compute the statistics we are looking for.
I have tried three different ways to solve this problem, but none of them really seems to work.
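For reference, here is a minimal sketch of the statistic in question, as I understand it from the paper's appendix. It relies on the fact that the expected squared norm of a batch gradient is |G|^2 + S/B, so measuring it at two batch sizes gives two equations for the true squared gradient norm |G|^2 and the noise term S. The function name and arguments are hypothetical, and on a single GPU the "small" and "big" gradients are assumed to come from something like one micro-batch versus the average over several accumulated micro-batches, which is not necessarily what the authors do.

```python
def noise_scale_estimates(g_small_sq, g_big_sq, b_small, b_big):
    """Estimate |G|^2, S and the simple noise scale S / |G|^2.

    g_small_sq: squared norm of the gradient computed on a batch of size b_small
    g_big_sq:   squared norm of the gradient computed on a batch of size b_big
    (Hypothetical helper: the names are illustrative, not taken from the paper.)
    """
    if b_small >= b_big:
        raise ValueError("b_small must be strictly smaller than b_big")

    # Solve E[|G_B|^2] = |G|^2 + S / B for the two batch sizes.
    g2 = (b_big * g_big_sq - b_small * g_small_sq) / (b_big - b_small)
    s = (g_small_sq - g_big_sq) / (1.0 / b_small - 1.0 / b_big)

    # Simple noise scale; very noisy when computed from a single step.
    return g2, s, s / g2


# Synthetic check: pretend |G|^2 = 4 and S = 100, then build the two
# expected squared norms for batch sizes 10 and 100.
g2, s, b_simple = noise_scale_estimates(4 + 100 / 10, 4 + 100 / 100, 10, 100)
```

A single step gives very noisy values (g2 can even come out negative), so it is probably necessary to average g2 and s separately over many steps before taking the ratio.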
I have written a small Colab notebook to explain my work in more detail: https://colab.research.google.com/drive/15lTG_r03yqSwShZ0JO4XaoWixLMXMEmv. If anyone is interested, I would be glad to try to solve this problem together.