Lesson 8: a Swift Implementation

I got quite different timings for S4TF. Did you exclude the first iteration of the S4TF matmul? The first call includes the compilation step (see this thread), which inflates the measured time.
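To make the warm-up point concrete, here is a minimal sketch of a timing harness that runs the body once untimed before measuring, then reports mean and standard deviation per call. It assumes plain Swift with Foundation and Dispatch only (no S4TF dependency). `measure(repetitions:warmup:_:)` and `naiveMatmul` are hypothetical helpers written for illustration, and the naive matmul only stands in for the S4TF matmul being timed.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: runs `body` `warmup` times untimed, then times
/// `repetitions` calls. Warm-up calls absorb one-off costs (such as
/// compilation on first use) so they do not skew the statistics.
/// Returns the mean and population standard deviation in milliseconds.
func measure(repetitions: Int, warmup: Int = 1, _ body: () -> Void) -> (mean: Double, stdDev: Double) {
    precondition(repetitions > 0, "need at least one timed repetition")
    // Untimed warm-up runs.
    for _ in 0..<warmup { body() }

    var samples: [Double] = []
    samples.reserveCapacity(repetitions)
    for _ in 0..<repetitions {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)  // ns -> ms
    }

    let mean = samples.reduce(0, +) / Double(samples.count)
    let variance = samples.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / Double(samples.count)
    return (mean, variance.squareRoot())
}

/// Hypothetical naive row-major matrix multiply: (m x k) * (k x n) -> (m x n).
func naiveMatmul(_ a: [Float], _ b: [Float], m: Int, k: Int, n: Int) -> [Float] {
    precondition(a.count == m * k && b.count == k * n, "shape mismatch")
    var c = [Float](repeating: 0, count: m * n)
    for i in 0..<m {
        for p in 0..<k {
            let aip = a[i * k + p]
            for j in 0..<n {
                c[i * n + j] += aip * b[p * n + j]
            }
        }
    }
    return c
}

// Arbitrary 64x64 inputs, chosen only for illustration.
let a = [Float](repeating: 1, count: 64 * 64)
let b = [Float](repeating: 2, count: 64 * 64)
var result: [Float] = []
let stats = measure(repetitions: 100, warmup: 1) {
    result = naiveMatmul(a, b, m: 64, k: 64, n: 64)
}
print(String(format: "Mean: %.3f ms, Std Dev: %.3f ms", stats.mean, stats.stdDev))
print("checksum:", result.reduce(0, +))  // keeps the result observable
```

Passing `warmup: 0` would fold the first call, including any one-off compilation cost in S4TF's case, into the mean and standard deviation.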

The timings I got on a 6-core Intel processor (CPU) were as follows:

  • NumPy - 5.63 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • PyTorch - 2.21 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • S4TF - 3.88 ms ± 224 µs per loop (mean ± std. dev. over 1000 loops)

GPU (1080 Ti):

  • S4TF - 32.1 µs ± 10.1 µs per loop (mean ± std. dev. over 1000 loops)
  • PyTorch - 202 µs ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Links: