I’m wondering: is there a better way to deal with this without creating a tensor of zeros and wrapping it in a Variable every time? I’m guessing it’s expensive to construct a Variable each time we calculate the loss, and to move it from CPU to GPU.
PyTorch/Python will broadcast the 0 anyway. It would be interesting to see how much memory the internal broadcasting takes vs. the memory taken by explicitly providing the zeros. The former should be very cheap, I believe.
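A quick sketch of the two approaches being compared, using the modern tensor API (Variables were merged into tensors in PyTorch 0.4, so no explicit `Variable` wrapping is shown); the shapes here are just illustrative:

```python
import torch

x = torch.randn(1024, 1024)

# Broadcasting trick: the scalar 0 is compared element-wise without
# ever materializing a (1024, 1024) tensor of zeros.
loss_a = torch.clamp(x, min=0).mean()

# Explicit version: allocates a full tensor of zeros just to compare against.
zeros = torch.zeros_like(x)
loss_b = torch.max(x, zeros).mean()

# Both give the same result; the broadcast version skips one allocation.
assert torch.allclose(loss_a, loss_b)
```

The extra cost of the explicit version is the `zeros_like` allocation (and a CPU-to-GPU copy if the zeros are built on the CPU), which is what the question above is trying to avoid.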
I suspect that either way this will probably be very cheap (though the broadcasting trick from @anandsaha is definitely a neat idea).
The good news is that with yesterday’s release of PyTorch 0.3 we now have a profiler, so there may be value in pointing it at this. But I suspect that for most use cases giving this much thought is not essential - my guess is that it would only matter if your model were impractically small or something like that.
If you do go the profiler route, please share your findings with us - I’d be interested to see the difference the broadcasting trick makes, and also what percentage of total time this takes relative to everything else your model is doing.
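For anyone who wants to try this, a minimal sketch of how the autograd profiler could be pointed at the two variants (shown with the current `torch.autograd.profiler` API rather than the original 0.3 one; the loop count and tensor size are arbitrary):

```python
import torch
from torch.autograd import profiler

x = torch.randn(512, 512)

# Profile both variants so their ops show up side by side in the report.
with profiler.profile() as prof:
    for _ in range(10):
        # Variant 1: broadcast against the scalar 0.
        torch.clamp(x, min=0)
        # Variant 2: explicitly allocate a tensor of zeros to compare against.
        torch.max(x, torch.zeros_like(x))

# Per-op summary, sorted by total CPU time; the zeros_like allocation
# appears as its own line, so its relative cost is easy to read off.
report = prof.key_averages().table(sort_by="cpu_time_total")
print(report)
```

Reading the table, you would compare the time attributed to the allocation op against the comparison ops themselves to see whether the explicit-zeros version is measurably slower.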