"Differentiable programming" - is this why we switched to pytorch?


(Rob H) #1

Is the capability to do differentiable programming the major difference between PyTorch and other frameworks? I'm just trying to understand what this is after Yann's comment here: https://www.facebook.com/yann.lecun/posts/10155003011462143

Does this allow for online learning/continual updating of models in ways that are impractical with traditional neural networks?


(Cedric Chee) #2

Is it just me, or is the link to Yann's comment broken? Maybe the page has been removed?


#3

I think we switched to PyTorch because it doesn't hide much from you, and certain more complex things are easier to do in PyTorch than in other frameworks (there may even have been things that weren't possible in Keras at the time of the switch).

The big difference between PyTorch and TensorFlow is (was?) that you can create your computation graphs dynamically, which is quite nice, for instance, when you have training examples of varying length.
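To make that concrete, here is a minimal sketch (assuming PyTorch with autograd, version >= 0.4): the graph is rebuilt on every forward pass, so each training example can drive a Python loop of a different length. The function name is just illustrative.

```python
import torch

def sum_of_squares(xs):
    out = torch.zeros(())      # scalar accumulator
    for x in xs:               # plain Python loop: its length depends on the data
        out = out + x * x      # the graph grows by one node per element
    return out

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

sum_of_squares(a).backward()   # a graph with 2 squaring nodes
sum_of_squares(b).backward()   # a brand-new graph with 3 nodes
print(a.grad)                  # tensor([2., 4.])  -- d/dx of x^2 is 2x
print(b.grad)                  # tensor([2., 4., 6.])
```

Nothing special had to be declared up front for the two different lengths; the second call simply traces a different graph.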

The real benefit, though, is that working with PyTorch resembles doing things in NumPy or directly in Python: you write down the computations you want to perform, you get this magical backward method that calculates the gradients, and that is all.
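That "write the computation, call backward" workflow can be sketched in a few lines; the particular expression below is just an arbitrary example:

```python
import torch

# Ordinary-looking arithmetic, then one call to backward() for the gradients.
x = torch.tensor([0.0, 1.0], requires_grad=True)
z = ((3 * x + 1) ** 2).mean()   # written exactly as you would in NumPy
z.backward()                    # autograd fills x.grad with dz/dx
print(x.grad)                   # tensor([ 3., 12.])  since dz/dxi = 3*(3*xi + 1)
```

No graph definition step, no session: the computation *is* the graph.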


(Rob H) #4

Fixed, thanks! I had removed the m. prefix from a mobile link. Looks like fb doesn’t like that


(Cedric Chee) #5

I see. It’s working now. Thanks.


(Alexandre Cadrin-Chênevert) #6

I could be wrong, but "differentiable programming" is a candidate term intended largely to replace the term "deep learning". I don't think it is directly tied to static or dynamic graphs.

Differentiable programming probably means that all functional nodes in a graph (static or dynamic) represent functions whose derivative exists at each point of their domain (https://en.wikipedia.org/wiki/Differentiable_function). This is a necessary condition for calculating the partial derivative of each node in a neural network (deep or shallow) in order to apply SGD or any gradient-based optimizer. IMO, almost any so-called deep learning API could be a differentiable programming API.
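A small sketch of why differentiability everywhere is what makes gradient descent well defined: because f below is differentiable at every point, the plain update w <- w - lr * f'(w) can be applied from any starting point. The function and learning rate are just illustrative.

```python
import torch

# Minimize f(w) = (w - 3)^2, whose unique minimum is at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1
for _ in range(100):
    f = (w - 3.0) ** 2
    f.backward()                 # autograd computes f'(w) = 2 * (w - 3)
    with torch.no_grad():        # update the parameter outside the graph
        w -= lr * w.grad
    w.grad.zero_()               # clear the accumulated gradient
print(w.item())                  # converges close to 3.0
```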

As for the original question, I guess you know Jeremy's (almost historic!) blog post explaining the reasons for the change to PyTorch: http://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai/

Dynamic graphs and API semantics are probably the major reasons for the change, as far as I can extract from Jeremy's blog.

By the way, fresh news from last week: TensorFlow's new release candidate 1.5 now supports eager mode, which is basically an experimental interface for dynamic graphs: https://github.com/tensorflow/tensorflow/releases/tag/v1.5.0-rc0
It will be interesting to see how/if Keras will allow eager mode in the not-so-distant future.


(Rob H) #7

Thanks Alexandre, much of that is news to me. Always playing catch up in this field!


(Matthijs) #8

LeCun says the following:

How would you do this with a static graph?


(Alexandre Cadrin-Chênevert) #9

Not possible with a static graph, of course. A differentiable dynamic graph is one instance of differentiable programming, but I don't think it is a necessary condition; it is just a popular instance at the moment, especially for research. I just wanted to point out that, by the terminology, differentiable programming means using functions that are differentiable and can therefore be learned from data by gradient descent.

Analogously, you can do object-oriented programming with static arrays or with dynamic lists.


(Matthijs) #10

…combined with traditional programming techniques such as loops and branches. :wink:

For example, Caffe lets you do deep learning, but I wouldn't call that differentiable programming. Even though it uses differentiable functions, you cannot use them as building blocks in a program, since Caffe does not actually have a language that allows this (it only has a dataflow language without loops and branches).
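For contrast, here is a sketch of what loops and branches buy you in a framework that does allow them (PyTorch here): the branch below is plain Python, and autograd differentiates whichever path actually executed for a given input. The function is a made-up example.

```python
import torch

def piecewise(x):
    if x.item() > 0:       # an ordinary Python branch on the data
        return x ** 2      # this input's graph contains only the square
    else:
        return -x          # a different input builds a different graph

p = torch.tensor(2.0, requires_grad=True)
piecewise(p).backward()
print(p.grad)              # tensor(4.)  -- derivative of x^2 at x = 2

n = torch.tensor(-1.5, requires_grad=True)
piecewise(n).backward()
print(n.grad)              # tensor(-1.) -- derivative of -x
```

A pure dataflow language has no way to express that `if`, which is the distinction being drawn here.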

We’re probably saying the same thing. I’m just thinking out loud. :smiley:


(Alexandre Cadrin-Chênevert) #11

I agree, we converge on the same idea! And it is an interesting reflection/discussion. A dynamic graph definitely helps and is more flexible for applying the differentiable programming concept.

What if you create small network pieces (functional blocks) as separate prototxt files in Caffe, call these pieces with loops and branches, and handle the inputs/outputs between the pieces in Python? :wink:

I remember this blog post from François Chollet, also brainstorming about this future concept: https://blog.keras.io/the-future-of-deep-learning.html


(Matthijs) #12

But what I'm talking about is taking this one step further and using Python to generate the prototxt files. Each iteration of the loop could generate a slightly different (or even completely different) prototxt file.

That is what I see as the unique thing about differentiable programming: not that you can use fixed building blocks inside loops and branches, but that you can change those building blocks on the fly too.
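A rough sketch of that "change the building blocks on the fly" idea in PyTorch (the blocks and routing rule below are arbitrary, purely illustrative choices): each loop iteration can route the activations through a different differentiable block, even one constructed mid-loop, and backward() still works because the graph is recorded as the program runs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
blocks = [nn.Linear(4, 4), nn.Tanh(), nn.Linear(4, 4)]

x = torch.randn(1, 4)
h = x
for i, block in enumerate(blocks):
    if i % 2 == 0:
        h = block(h)              # use a prebuilt block...
    else:
        h = torch.relu(h)         # ...or swap in something else on the fly
loss = h.sum()
loss.backward()                   # gradients flow through whatever actually ran
print(blocks[0].weight.grad.shape)  # torch.Size([4, 4])
```

With per-iteration prototxt generation you could emulate something similar in Caffe, but the program doing the generating would not itself be differentiable, which I think is your point.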