Lesson 8: a Swift Implementation

metachi · March 27, 2019, 1:26am

I have made an attempt to implement lesson 8 in Swift using S4TF:
lesson8_01.ipynb
lesson8_02.ipynb

@Jeremy if you think this or part of it belongs in harebrain, feel free to move it over.

Big thanks to @sgugger for finding the Path and Just Swift libraries. I definitely looked over your shoulder on github to get a little help figuring those out.

A few things that did not go as I expected:

Broadcasting in S4TF with Tensors does not play nicely with comparison operators like >, <, >=, <=, ==, !=.
var a = Tensor([10.0, 6.0, -4.0])
print(a > 0) // prints false
Note: np/pytorch return the elementwise result [1, 1, 0]
Edit: Thanks @jekbradbury for pointing out that S4TF does have this ability. I simply needed to use the .>, .<=, etc. operators prefixed with the period.
Either I could not find it or S4TF does not have a sum by axis (nor a min/max with an axis parameter) like np.sum(a, axis=0). If I overlooked it, please share some knowledge!
To get around this I converted the TensorFlow Tensors to np and back.
Edit: Thanks @sgugger and @dan-zheng for pointing out that there are squeezingAxes and alongAxes arguments to the S4TF Tensor.sum method that I overlooked. Looks like it is documented here under the “Extensions” heading.
I could not find the implementation of anything documented on S4TF’s website in the tensorflow/swift github repo. For example: the Tensor Struct, the max function, etc. Am I just overlooking this somewhere in the git repo?
Edit: it appears I did not look hard enough. Though it is not in the tf/swift repo, it is in the apple/swift/tree/tf repo.
It took a little extra effort to get the %%time and %%timeit magic commands to work. See this post.
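
To make the expected elementwise result from the first item concrete, here is a minimal plain-Python stand-in (not S4TF code, and no NumPy required) for what the period-prefixed operators are described as doing: one Boolean per element rather than a single Bool. The names `mask` and `mask_as_ints` are illustrative only.

```python
a = [10.0, 6.0, -4.0]

# Elementwise comparison: one Boolean per element of `a`.
mask = [x > 0 for x in a]

# The same mask written as 0/1 integers, matching the [1, 1, 0] quoted above.
mask_as_ints = [int(m) for m in mask]

print(mask)          # [True, True, False]
print(mask_as_ints)  # [1, 1, 0]
```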

I plan to continue down this path for the rest of lesson 8 and hopefully all future lessons. All feedback is greatly appreciated as I am new to Swift and looking to learn!

marii · March 27, 2019, 5:33am

Thanks for this! I have wanted to start working on Swift for TensorFlow but was running a bit behind.

init_27 · March 27, 2019, 7:22am

Hi @metachi
Thank you for putting it together.

Did you get a chance to compare the speed of this implementation vs. the Python and PyTorch ones?

I believe that’d be very interesting.

Benudek · March 27, 2019, 9:40am

very cool idea!

metachi · March 27, 2019, 11:44am

@init_27, That’s a great idea!

If you scroll down to the bottom you can see that S4TF’s matrix multiply is faster than the PyTorch one in Jeremy’s notebook. However, that is not really a fair comparison (it only compares a single operation, does not use the same hardware, etc.).

I’ll try to do a more interesting comparison when I do the next notebook.

jeremy · March 27, 2019, 1:44pm

I’ll leave it here because not everyone on #harebrain has access to this course.

jekbradbury · March 27, 2019, 3:20pm

There are separate elementwise comparison operators, prefixed with a period: .>, .<=, etc.

metachi · March 28, 2019, 12:15am

@jekbradbury nice! Thank you for pointing that out!

neuradai · March 28, 2019, 2:35am

Thanks for putting this together and sharing it, Jeff.

Minor issue: in your first %%timeit cell with slowMatMul, the mean is greater than the max. After diving into the code, I think the issue is that you run the loop n_times+1 in the timeitMagic function. See my PR below.

metachi · March 28, 2019, 10:17am

Thanks @neuradai! I just merged your changes. I had some code that was intentionally adding an extra loop and had moved it to the swift_kernel. Looks like I missed a pretty important piece of it!
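
A mean larger than the max is the kind of symptom that can appear when the number of timed runs and the divisor disagree. As a rough sketch (in Python, not the notebook's actual `timeitMagic` code), a timing helper can keep the warm-up call separate and then record exactly `n_times` samples. The names `time_it`, `n_times`, and `warmup` are hypothetical and chosen for illustration.

```python
import statistics
import time


def time_it(fn, n_times=10, warmup=1):
    """Time `fn` exactly `n_times` times, after `warmup` untimed calls."""
    # Warm-up calls absorb one-off costs (for example compilation) and are not recorded.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(n_times):  # exactly n_times timed runs, not n_times + 1
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "std": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }


stats = time_it(lambda: sum(i * i for i in range(10_000)), n_times=20)
print(f"{stats['runs']} loops  Mean: {stats['mean'] * 1e3:.3f} ms, "
      f"Std Dev: {stats['std'] * 1e6:.3f} µs")
```

Because every statistic is computed from the same list of samples, the mean always falls between the min and the max.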

sgugger · March 28, 2019, 4:58pm

I finished redoing 01 in swift too and there is something for the sum along an axis: what I could do is

a.sum(alongAxes: 1).squeezingShape(at: 1)

to get the same thing as a.sum(dim=1) in PyTorch.

jeremy · March 28, 2019, 5:54pm

This is the version that @sgugger put together btw:

github.com

fastai/fastai_docs/blob/master/dev_swift/01_matmul.ipynb

metachi · March 28, 2019, 11:31pm

Thanks @sgugger! That is exactly what I was looking for! Looks like it was in the documentation under the Extensions heading.

dan-zheng · April 1, 2019, 9:28pm

sgugger:

I finished redoing 01 in swift too and there is something for the sum along an axis: what I could do is
a.sum(alongAxes: 1).squeezingShape(at: 1)
to get the same thing as a.sum(dim=1) in PyTorch.

You can try a.sum(squeezingAxes: 1) instead!

Reduction ops have alongAxes: and squeezingAxes: variants, where alongAxes: keeps the axis-of-reduction and squeezingAxes: drops it.
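
The two variants differ only in the shape of the result. Here is a minimal sketch in plain Python, using nested lists as a stand-in for a 2-D tensor. `sum_axis1` and `keep_axis` are hypothetical names for illustration and are not part of S4TF.

```python
def sum_axis1(rows, keep_axis):
    """Sum each row of a 2-D nested list, i.e. reduce over axis 1."""
    totals = [sum(row) for row in rows]
    # keep_axis=True mimics the alongAxes: behaviour described above:
    # the reduced axis stays in the result with size 1.
    # keep_axis=False mimics squeezingAxes: and drops the reduced axis.
    return [[t] for t in totals] if keep_axis else totals


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # shape (2, 3)

kept = sum_axis1(a, keep_axis=True)       # shape (2, 1)
squeezed = sum_axis1(a, keep_axis=False)  # shape (2,)

print(kept)      # [[6.0], [15.0]]
print(squeezed)  # [6.0, 15.0]
```

The squeezed form is the one that matches `a.sum(dim=1)` in PyTorch, which is why `sum(squeezingAxes: 1)` can replace the two-step `sum(alongAxes: 1).squeezingShape(at: 1)`.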

sgugger · April 1, 2019, 10:15pm

Yeah I’ve seen that option since then

twairball · April 3, 2019, 3:12pm

Edit: changed to perform matmul on tensors with shape (1000, 1000) @ (1000, 1000). (Thanks to Kaspar Lund for the suggestion.)

Swift-TF: 8.4260 ms
Numpy: 17.3245 ms
PyTorch: 7.1645 ms
Tensorflow: 7.2635 ms

Note: all results were run on CPU, so this is not a comprehensive benchmark.

I would also add that swift-TF feels much slower than pytorch; perhaps some overhead?
I couldn’t get swiftc -O to work but building via swift build doesn’t seem to improve much.

Repo:

Kaspar · April 3, 2019, 5:32pm

Nice work. I think it would be more interesting and fair with shapes of 1000 x 1000. I would expect that numpy would be much faster in that scenario.

metachi · April 6, 2019, 4:53pm

I got very similar ~~different~~ timings for S4TF. ~~Did you exclude the first iteration of the S4TF matmul? It includes the compilation step (see this thread).~~

The timings I got on a 6-core Intel processor (CPU) were as follows:

Numpy - 5.63 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
PyTorch - 2.21 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
S4TF - 1000 loops Mean: 3.878070619 ms, Std Dev: 223.7475553704346 µs

GPU (1080 Ti):

S4TF - 1000 loops Mean: 32.084067 µs. Std Dev: 10.078865044860505 µs
PyTorch - 202 µs ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Links:

metachi · April 9, 2019, 1:59am

I updated the first post with a link to an S4TF version of the lesson 8 02 “fully connected” notebook.