Disclaimer: I’m quite new to Python/ML, so be kind if I’m suggesting something dumb :)
In the code for RNNRegularizer, the TAR term is computed as the difference along dimension 1: `h[:, 1:] - h[:, :-1]`.
After wrapping my head around what this actually does, I was curious whether it was any different from using the `.diff` method, i.e. `h.diff(dim=1)`.
On a CPU there’s no difference, but on a GPU `.diff()` is slightly faster.
While I was at it, I also compared `.pow(2)` with `.square()`, wondering if there was some hardware-level acceleration for the latter. My tests show `.square()` is quite a bit faster.
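As a quick sanity check that the rewrites are actually equivalent (just a sketch; the tensor shape here is an arbitrary placeholder, not the real activation shape):

```python
import torch

# Arbitrary (batch, seq_len, hidden) shape for illustration only
h = torch.randn(32, 70, 400)

a = (h[:, 1:] - h[:, :-1]).float().pow(2).mean()
b = h.diff(dim=1).float().pow(2).mean()
c = h.diff(dim=1).float().square().mean()

# All three expressions compute the same value
assert torch.allclose(a, b) and torch.allclose(b, c)
```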
I compared these three snippets, 100 runs each, on a GPU:
```python
res1 = (h[:, 1:] - h[:, :-1]).float().pow(2).mean()
res2 = h.diff(dim=1).float().pow(2).mean()
res3 = h.diff(dim=1).float().square().mean()
```
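For reference, a rough version of the timing setup might look like this (a sketch, not my exact harness; CUDA events are used for GPU timing since GPU ops are asynchronous, and the tensor shape is a placeholder):

```python
import time
import torch

def bench(fn, h, runs=100):
    """Average time of fn(h) in ms; CUDA events on GPU, perf_counter on CPU."""
    if h.is_cuda:
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(runs):
            fn(h)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / runs
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(h)
    return (time.perf_counter() - t0) * 1000 / runs

# Placeholder (batch, seq_len, hidden) shape; real activations will differ
device = "cuda" if torch.cuda.is_available() else "cpu"
h = torch.randn(64, 70, 400, device=device)

for name, fn in [
    ("slice + pow(2)",    lambda t: (t[:, 1:] - t[:, :-1]).float().pow(2).mean()),
    ("diff  + pow(2)",    lambda t: t.diff(dim=1).float().pow(2).mean()),
    ("diff  + square()",  lambda t: t.diff(dim=1).float().square().mean()),
]:
    print(f"{name}: {bench(fn, h):.4f} ms")
```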
IMO `.diff()` is easier on the eye/brain than the indexing method, with a mild performance benefit to boot.
And it seems like a global search/replace of `.pow(2)` → `.square()` might eke out some more ‘fast’ (and reads nicer, says me).
But perhaps these operations are such a tiny part of any real training process that it’s not worth the effort. Or maybe this is just on my machine and not universal?
