Properly timing GPU work?

I spent some time trying to figure out why these two loops take the same 0.01 ms:

    var big = Tensor<Float>(randomNormal: [50, 50])
    time(repeating: 10) { big = big • big }

    var evenBigger = Tensor<Float>(randomNormal: [10000, 10000])
    time(repeating: 10) { evenBigger = evenBigger • evenBigger }

and I realized that this is probably because there is no GPU sync, so the compute just runs asynchronously and the timer only measures the time to enqueue the ops. How do I measure this properly?

Ah, I think I figured it out. Copying a scalar from the tensor back to the host forces a GPU sync:

    // Copy a scalar back to the host to force a GPU sync.
    _ = tmp[0, 0].scalar
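
So my measurement now looks roughly like this (same time(repeating:) helper as above; reading an element of the result back is what forces the wait):

    var evenBigger = Tensor<Float>(randomNormal: [10000, 10000])
    time(repeating: 10) {
        let tmp = evenBigger • evenBigger
        // Reading one scalar back to the host blocks until the matmul
        // has actually finished, so the timer measures real GPU work.
        _ = tmp[0, 0].scalar
    }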

Lemme know if there is some better way to do this :slight_smile:

Side question: most of the code I’ve seen uses scalarized(). I didn’t know about scalar - but that’s much nicer! Is there any reason we’re not using that? Does it do something different?

Huh, I didn’t know about scalarized().

It looks like .scalar returns an optional (because the tensor may not be zero-dimensional), and scalarized() aborts if the input has more than one scalar, so it doesn’t return an optional. That is a crazy subtle distinction for such similar names.
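
If I’m reading it right, the difference looks something like this (a quick sketch, untested):

    let matrix = Tensor<Float>(randomNormal: [2, 2])
    let single = Tensor<Float>(42)

    // .scalar returns an optional: nil for the 2x2 tensor,
    // the value for the zero-dimensional one.
    print(matrix.scalar as Any)  // nil
    print(single.scalar as Any)  // Optional(42.0)

    // scalarized() returns a non-optional value, but aborts if the
    // tensor holds more than one scalar.
    print(single.scalarized())   // 42.0
    // matrix.scalarized()       // would abort at runtime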

I filed TF-454 - t.scalar vs t.scalarized() is super confusing - to track sorting this out.

I’d rather it was called scalar and behaved like scalarized() if that’s an option.