Disclaimer: I’m quite new to Python/ML, so be kind if I’m suggesting something dumb.
In the code for `RNNRegularizer`, the TAR is computed as the difference along dimension 1 with `h[:, 1:] - h[:, :-1]`.
After wrapping my head around what this actually does, I was curious to see whether it was any different from using the `.diff` method instead.
On a CPU there’s no difference, but on the GPU `.diff` is slightly faster.
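As a quick sanity check (a minimal sketch, with a made-up activation shape), the two expressions really do produce the same tensor:

```python
import torch

# hypothetical (batch, seq_len, hidden) RNN activations
h = torch.randn(4, 8, 16)

# .diff along dim 1 is exactly the first difference the slicing computes
assert torch.equal(h.diff(dim=1), h[:, 1:] - h[:, :-1])
```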
While I was at it, I also tried comparing `.pow(2)` with `.square()`, wondering whether there was some hardware-level acceleration for the latter. My tests show it is quite a bit faster.
These are the three snippets compared, 100 runs each, on a GPU:

```python
res1 = (h[:, 1:] - h[:, :-1]).float().pow(2).mean()
res2 = h.diff(dim=1).float().pow(2).mean()
res3 = h.diff(dim=1).float().square().mean()
```
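For anyone who wants to reproduce the timing, here’s a rough sketch of the harness I mean (the shape is made up, and `bench` is my own helper, not a library function). Note the `torch.cuda.synchronize()` calls: CUDA kernel launches are asynchronous, so without them you’d mostly be timing the Python dispatch:

```python
import time
import torch

def bench(fn, h, runs=100):
    # crude wall-clock benchmark; synchronize so GPU kernels are actually timed
    if h.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        fn(h)
    if h.is_cuda:
        torch.cuda.synchronize()
    return time.perf_counter() - start

device = "cuda" if torch.cuda.is_available() else "cpu"
h = torch.randn(64, 70, 400, device=device)  # hypothetical activation shape

t1 = bench(lambda h: (h[:, 1:] - h[:, :-1]).float().pow(2).mean(), h)
t2 = bench(lambda h: h.diff(dim=1).float().pow(2).mean(), h)
t3 = bench(lambda h: h.diff(dim=1).float().square().mean(), h)
print(f"indexing+pow: {t1:.4f}s  diff+pow: {t2:.4f}s  diff+square: {t3:.4f}s")
```

All three variants compute the same mean-squared difference, so this only measures speed, not a change in the loss.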
`.diff()` is easier on the eye/brain than the indexing version, with a mild performance benefit to boot.
And it seems like a global search-and-replace to `.square()` might eke out some more ‘fast’ (and it reads nicer, says me).
But perhaps these operations are such a tiny part of any real training process that it’s not worth the effort. Or maybe this is just on my machine and not universal?