Hi Jarek. PyTorch's LBFGS optimizer is the only second-order optimizer in active use that I know of. Searching for LBFGS and following the references in the articles you find would likely lead you to most publications on second-order methods, but I have never seen much research activity in this direction.
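In case it is useful, here is a minimal sketch of how torch.optim.LBFGS is driven. Unlike SGD or Adam, its step() takes a closure that re-evaluates the loss, because L-BFGS may need several function evaluations per step. The quadratic loss and the numbers are just illustrative:

```python
import torch

# A trivially convex problem so L-BFGS can show off: minimize sum(x**2).
x = torch.tensor([3.0, -2.0], requires_grad=True)
opt = torch.optim.LBFGS([x], lr=1.0, max_iter=20)

def closure():
    # L-BFGS calls this repeatedly, so it must redo zero_grad/forward/backward.
    opt.zero_grad()
    loss = (x ** 2).sum()
    loss.backward()
    return loss

for _ in range(5):
    opt.step(closure)
# x is now essentially at the minimum (0, 0)
```

The closure requirement is the main API difference you run into when swapping LBFGS in for a first-order optimizer.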
Your paper proposes an interesting method, or class of methods. (I only skimmed it.) My reservation is that the paper is purely theoretical, with no experiments measuring the method's performance against established optimizers.
I was also interested in these questions and even implemented a per-coordinate Newton's method as a PyTorch optimizer. I had great hopes, but it did not work well in practice. One informative failure was a toy problem whose loss surface formed a barely sloping parabolic valley running diagonally relative to the weight coordinates. The optimizer bounced between the sides of the valley, making negligible progress along it. Ordinary Adam and SGD worked much better. The optimizer was also easily trapped in local minima. Maybe some kind of gradient smoothing, as suggested in the paper, would solve this.
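The failure mode is easy to reproduce. Below is a small self-contained sketch (not my original implementation) of per-coordinate Newton on a hand-built diagonal valley: steep across the direction x - y, nearly flat along x + y, with eps controlling how shallow the valley is. The gradient and diagonal Hessian are derived by hand for this specific loss:

```python
eps = 0.001  # valley slope along the diagonal; smaller = shallower

def loss(x, y):
    # Steep across the valley (x - y), barely sloping along it (x + y).
    return (x - y) ** 2 + eps * (x + y) ** 2

def grad(x, y):
    return (2 * (x - y) + 2 * eps * (x + y),
            -2 * (x - y) + 2 * eps * (x + y))

# Diagonal Hessian entries are constant here: d2f/dx2 = d2f/dy2 = 2 + 2*eps.
# The cross term d2f/dxdy = -2 + 2*eps is what per-coordinate Newton ignores.
h = 2 + 2 * eps

x, y = 1.0, 0.0
for _ in range(100):
    gx, gy = grad(x, y)
    # Per-coordinate Newton step: each gradient entry divided by the
    # matching diagonal Hessian entry, updated simultaneously.
    x, y = x - gx / h, y - gy / h

print(x, y, loss(x, y))  # the iterate is still far from the minimum at (0, 0)
```

Each step zeroes one coordinate while throwing the other across the valley, so the iterates ping-pong between the axes and shrink toward the minimum only by a factor of about (1 - eps)/(1 + eps) per step. A full Newton step using the off-diagonal curvature would land at the minimum immediately.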
Irrespective of my failure with a simplistic second-order method, I encourage you to implement the paper's methods and test them on standard benchmarks. Quantitative experimentation seems to be how new ideas get adopted and the field progresses.
Good luck! I have an ongoing interest in second-order methods and would like to know what you discover.