I was spending some time trying to figure out why these two loops take the same 0.01ms:
var big = Tensor<Float>(randomNormal: [50, 50])
time(repeating: 10) { big = big • big }
var evenBigger = Tensor<Float>(randomNormal: [10000, 10000])
time(repeating: 10) { evenBigger = evenBigger • evenBigger }
and I realized that this is probably because there is no GPU sync, so the compute is just running asynchronously. How do I measure this properly?
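My best guess at a workaround (just a sketch, assuming that reading a value back to the host forces the pending work to finish, and using the same time(repeating:) helper as above):

var evenBigger = Tensor<Float>(randomNormal: [10000, 10000])
time(repeating: 10) {
    evenBigger = evenBigger • evenBigger
    // Pull one element back to the host so the matmul can't stay queued
    // asynchronously; otherwise the loop presumably only measures op dispatch.
    _ = evenBigger[0][0].scalar
}

Is that the right way to do it, or is there a proper sync/blocking call I should be using instead?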
Side question: most of the code I’ve seen uses scalarized(). I didn’t know about scalar - but that’s much nicer! Is there any reason we’re not using that? Does it do something different?
It looks like .scalar returns an optional (because the tensor may not be zero-dimensional) and scalarized() aborts if the input has more than one scalar, so it doesn’t return an optional. That’s a surprisingly subtle distinction for such similar names.
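In other words, if I’ve read it right (the shapes and results below are my guesses, not checked against the current API):

let zeroD  = Tensor<Float>(3.0)           // 0-d tensor holding one scalar
let vector = Tensor<Float>([1.0, 2.0])    // 1-d tensor with two scalars

let a: Float? = zeroD.scalar        // Optional(3.0)
let b: Float? = vector.scalar       // nil, since there is no single scalar to return
let c: Float  = zeroD.scalarized()  // 3.0, non-optional
// vector.scalarized()              // aborts at runtime instead of returning nil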