Lesson 8: a Swift Implementation

I have made an attempt to implement lesson 8 in Swift using S4TF:

@Jeremy if you think this or part of it belongs in harebrain, feel free to move it over.

Big thanks to @sgugger for finding the Path and Just Swift libraries. I definitely looked over your shoulder on github to get a little help figuring those out.

A few things that did not go as I expected:

  • Broadcasting in s4tf with Tensors does not play super nice with operators like >, <, >=, <=, ==, !=.
    var a = Tensor([10.0, 6.0, -4.0])
    print(a > 0)  // prints false
    Note: the equivalent comparison in np/pytorch returns [1, 1, 0]

    Edit: Thanks @jekbradbury for pointing out that S4TF does have this ability. I simply needed to use the .>, .<=, etc. operators prefixed with the period.

  • Either I could not find it or S4TF does not have a sum by axis (nor a min/max with an axis parameter) like np.sum(a, axis=0). If I overlooked it, please share some knowledge!
    To get around this I converted the TensorFlow Tensors to np and back.

    Edit: Thanks @sgugger and @dan-zheng for pointing out that there are squeezingAxes and alongAxes arguments to the S4TF Tensor.sum method that I overlooked. Looks like it is documented here under the “Extensions” heading.

  • I could not find the implementation of anything documented on S4TF’s website in the tensorflow/swift github repo. For example: the Tensor Struct, the max function, etc. Am I just overlooking this somewhere in the git repo?
    Edit: it appears I did not look hard enough. Though it is not in the tf/swift repo, it is in the tf branch of the apple/swift repo (apple/swift/tree/tf).

  • It took a little extra effort to get the %%time and %%timeit magic commands to work. See this post.

I plan to continue down this path for the rest of lesson 8 and hopefully all future lessons. All feedback is greatly appreciated as I am new to Swift and looking to learn!


Thanks for this! I have wanted to start working on swift for tensorflow but was running a bit behind.

Hi @metachi
Thank you for putting it together.

Did you get a chance to compare the speed of this implementation with the Python and PyTorch ones?

I believe that’d be very interesting.


very cool idea!

@init_27, That’s a great idea!

If you scroll down to the bottom you can see that S4TF’s matrix multiply is faster than the PyTorch one in Jeremy’s notebook. However that is not really a fair comparison (only comparing a single operation, not using the same hardware, etc).

I’ll try to do a more interesting comparison when I do the next notebook.


I’ll leave it here because not everyone on #harebrain has access to this course.

There are separate elementwise comparison operators, prefixed with a period: .>, .<=, etc.
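
For anyone landing here later, a minimal sketch of those operators, assuming a Swift for TensorFlow toolchain where `import TensorFlow` is available. The names `a`, `mask` and `maskAsFloat` are just for illustration; the `Tensor<Float>(mask)` conversion is there only to get the 1/0 form that NumPy and PyTorch print.

```swift
import TensorFlow  // assumption: running on a Swift for TensorFlow toolchain

let a = Tensor<Float>([10, 6, -4])

// `.>` compares every element against the scalar and yields a Tensor<Bool>.
let mask = a .> 0
print(mask.scalars)  // [true, true, false]

// Casting the Bool mask to Float gives the 1/0 form from the NumPy/PyTorch note.
let maskAsFloat = Tensor<Float>(mask)
print(maskAsFloat.scalars)  // [1.0, 1.0, 0.0]
```

The other operators mentioned (`.<`, `.>=`, `.<=`, `.==`, `.!=`) follow the same pattern.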


@jekbradbury nice! Thank you for pointing that out!

Thanks for putting this together and sharing it, Jeff.

Minor issue: in your first %%timeit cell with slowMatMul, the mean is greater than the max. After diving into the code, I think the issue is that you run the loop n_times+1 in the timeitMagic function. See my PR below.

Thanks @neuradai! I just merged your changes. I had some code that was intentionally adding an extra loop and had moved it to the swift_kernel. Looks like I missed a pretty important piece of it! :)

I finished redoing 01 in swift too and there is something for the sum along an axis: what I could do is

a.sum(alongAxes: 1).squeezingShape(at: 1)

to get the same thing as a.sum(dim=1) in PyTorch.


This is the version that @sgugger put together btw:


Thanks @sgugger! That is exactly what I was looking for! Looks like it was in the documentation under the Extensions heading

You can try a.sum(squeezingAxes: 1) instead!

Reduction ops have alongAxes: and squeezingAxes: variants, where alongAxes: keeps the axis-of-reduction and squeezingAxes: drops it.
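
A small sketch of the difference, again assuming a Swift for TensorFlow toolchain. The 2 x 3 matrix and the names `m`, `kept`, `dropped` and `viaSqueeze` are made up for illustration; the three calls are the ones quoted in this thread.

```swift
import TensorFlow  // assumption: running on a Swift for TensorFlow toolchain

// 2 x 3 matrix: [[1, 2, 3], [4, 5, 6]]
let m = Tensor<Float>(shape: [2, 3], scalars: [1, 2, 3, 4, 5, 6])

// alongAxes: keeps the reduced axis with size 1, so the result has shape [2, 1].
let kept = m.sum(alongAxes: 1)

// squeezingAxes: drops the reduced axis, so the result has shape [2].
let dropped = m.sum(squeezingAxes: 1)

// The two-step form from earlier in the thread ends up in the same place.
let viaSqueeze = m.sum(alongAxes: 1).squeezingShape(at: 1)

print(kept.shape, dropped.shape)
print(dropped.scalars)  // [6.0, 15.0]
```

So `a.sum(squeezingAxes: 1)` is the closer match to `a.sum(dim=1)` in PyTorch, while `alongAxes:` is handy when the result has to broadcast back against the original tensor.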


Yeah I’ve seen that option since then ;)

Edit: changed to perform matmul on tensors with shape (1000,1000) @ (1000,1000). (Thanks Kaspar Lund for suggestion).

  • Swift-TF: 8.4260 ms
  • Numpy: 17.3245 ms
  • PyTorch: 7.1645 ms
  • Tensorflow: 7.2635 ms

Note: all results were run on CPU, so this is not a comprehensive benchmark.

I would also add that swift-TF feels much slower than pytorch; perhaps some overhead?
I couldn’t get swiftc -O to work but building via swift build doesn’t seem to improve much.


Nice work. I think it would be more interesting and fair with shapes of 1000 x 1000. I would expect that numpy would be much faster in that scenario.

I got quite different timings for S4TF. Did you exclude the first iteration of the S4TF matmul? It includes the compilation step (see this thread).
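
To make the warm-up point concrete, here is a rough sketch of a timing helper in plain Swift. It is not the notebook's timeitMagic; `timeIt` and `TimingStats` are hypothetical names. The idea is to run the closure once untimed so one-off costs are absorbed, then time exactly `runs` calls and divide by the number of samples actually collected.

```swift
import Dispatch
import Foundation

/// Summary of a set of timed runs, in milliseconds.
struct TimingStats {
    let runs: Int
    let meanMs: Double
    let stdDevMs: Double
    let minMs: Double
    let maxMs: Double
}

/// Hypothetical helper: `warmup` untimed calls, then `runs` timed calls.
func timeIt(runs: Int, warmup: Int = 1, _ body: () -> Void) -> TimingStats {
    precondition(runs > 0, "need at least one timed run")

    // Untimed warm-up absorbs one-off costs such as compilation.
    for _ in 0..<warmup { body() }

    var samples: [Double] = []
    samples.reserveCapacity(runs)
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)  // ns -> ms
    }

    // Divide by the count of collected samples, not a separately tracked
    // loop count, so the mean stays between min and max (up to rounding).
    let count = Double(samples.count)
    let mean = samples.reduce(0, +) / count
    let variance = samples.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / count

    return TimingStats(runs: samples.count,
                       meanMs: mean,
                       stdDevMs: variance.squareRoot(),
                       minMs: samples.min()!,
                       maxMs: samples.max()!)
}

// Usage with a cheap stand-in workload; a matmul call would go in the closure.
let demo = timeIt(runs: 100) { _ = (0..<1_000).reduce(0, +) }
print("mean \(demo.meanMs) ms, std dev \(demo.stdDevMs) ms over \(demo.runs) runs")
```

Keeping the warm-up loop separate from the timed loop also avoids the kind of off-by-one where the loop count and the divisor disagree.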

The timings I got on a 6-core Intel processor (CPU) were as follows:

  • Numpy - 5.63 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • PyTorch - 2.21 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • S4TF - 1000 loops Mean: 3.878070619 ms, Std Dev: 223.7475553704346 µs

GPU (1080 Ti):

  • S4TF - 1000 loops Mean: 32.084067 µs, Std Dev: 10.078865044860505 µs
  • PyTorch - 202 µs ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


I updated the first post with a link to a S4TF version of the lesson8 02 “fully connected” notebook.