@Jeremy if you think this or part of it belongs in harebrain, feel free to move it over.
Big thanks to @sgugger for finding the Path and Just Swift libraries. I definitely looked over your shoulder on github to get a little help figuring those out.
A few things that did not go as I expected:
Broadcasting in S4TF with Tensors does not play super nice with operators like >, <, >=, <=, ==, !=. For example, var a = Tensor([10.0, 6.0, -4.0]); print(a > 0) prints false.
Note: np/pytorch return [1, 1, 0]. Edit: Thanks @jekbradbury for pointing out that S4TF does have this ability. I simply needed to use the element-wise operators prefixed with a period: .>, .<=, etc.
Either I could not find it or S4TF does not have a sum by axis (nor a min/max with an axis parameter) like np.sum(a, axis=0). If I overlooked it, please share some knowledge!
To get around this I converted the TensorFlow Tensors to np and back. Edit: Thanks @sgugger and @dan-zheng for pointing out that there are squeezingAxes and alongAxes arguments to the S4TF Tensor.sum method that I overlooked. Looks like it is documented here under the “Extensions” heading.
I could not find the implementation of anything documented on S4TF’s website in the tensorflow/swift github repo. For example: the Tensor Struct, the max function, etc. Am I just overlooking this somewhere in the git repo? Edit: it appears I did not look hard enough. Though it is not in the tf/swift repo, it is in the apple/swift/tree/tf repo.
It took a little extra effort to get the %%time and %%timeit magic commands to work. See this post.
I plan to continue down this path for the rest of lesson 8 and hopefully all future lessons. All feedback is greatly appreciated as I am new to Swift and looking to learn!
If you scroll down to the bottom you can see that S4TF’s matrix multiply is faster than the PyTorch one in Jeremy’s notebook. However, that is not really a fair comparison (it only compares a single operation, does not use the same hardware, etc.).
I’ll try to do a more interesting comparison when I do the next notebook.
Thanks for putting this together and sharing it, Jeff.
Minor issue: in your first %%timeit cell with slowMatMul, the mean is greater than the max. After diving into the code, I think the issue is that you run the loop n_times+1 in the timeitMagic function. See my PR below.
Thanks @neuradai! I just merged your changes. I had some code that was intentionally adding an extra loop and had moved it to the swift_kernel. Looks like I missed a pretty important piece of it!
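As a rough illustration of how one extra loop iteration can push a reported mean above the max, here is a small Python sketch. It assumes (this is a guess, not something confirmed from the actual timeitMagic code) that n_times + 1 durations get summed while the total is still divided by n_times. The function names and the numbers are made up for the example.

```python
def buggy_mean(durations, n_times):
    # Hypothetical bug: all n_times + 1 durations are summed,
    # but the total is still divided by n_times.
    return sum(durations) / n_times


def fixed_mean(durations, n_times):
    # Only the first n_times durations are summed and averaged.
    return sum(durations[:n_times]) / n_times


# Made-up per-run durations in ms: n_times = 3, plus one extra run.
durations = [2.0, 2.0, 2.0, 2.0]
n_times = 3

print(buggy_mean(durations, n_times) > max(durations))   # True
print(fixed_mean(durations, n_times) <= max(durations))  # True
```

With identical runs the fixed mean equals the max, while the buggy one lands above it, which matches the symptom described in the first %%timeit cell.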
Edit: changed to perform matmul on tensors with shape (1000,1000) @ (1000,1000). (Thanks Kaspar Lund for the suggestion).
Swift-TF: 8.4260 ms
Numpy: 17.3245 ms
PyTorch: 7.1645 ms
Tensorflow: 7.2635 ms
Note: all results were run on CPU, so this is not a comprehensive benchmark.
I would also add that swift-TF feels much slower than pytorch; perhaps some overhead?
I couldn’t get swiftc -O to work but building via swift build doesn’t seem to improve much.
I got very different timings for S4TF. Did you exclude the first iteration of the S4TF matmul? It includes the compilation step (see this thread).
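For anyone who wants to check this on their own setup, below is a minimal sketch of a timing helper that throws away warm-up calls before measuring. It is plain Python using only the standard library; time_runs and its parameters are names invented for this example, not part of any kernel or library, and the same idea would need to be ported to whatever is timing the S4TF call.

```python
import time


def time_runs(fn, n_times=10, warmup=1):
    """Call fn `warmup` times untimed, then return the durations
    (in seconds) of `n_times` timed calls."""
    # Warm-up calls absorb one-off costs such as compilation.
    for _ in range(warmup):
        fn()

    durations = []
    for _ in range(n_times):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Example workload: a small pure-Python sum standing in for a matmul.
durations = time_runs(lambda: sum(i * i for i in range(10_000)), n_times=5)
mean_ms = 1000 * sum(durations) / len(durations)
print(f"mean over {len(durations)} runs: {mean_ms:.3f} ms")
```

Setting warmup=0 and comparing against warmup=1 gives a quick feel for how much the first call skews the average.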
The timings I got on a 6-core Intel processor (CPU) were as follows:
Numpy - 5.63 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
PyTorch - 2.21 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)