How do I declare a differentiable function as an argument?

I’m trying to write a basic training loop that takes a Dataset, a model, an Optimizer and a loss function. After a bit of searching, I’ve managed to declare the first three, but how to tell S4TF that the last one is differentiable eludes me. My progress so far is:

func basic_training_loop<Model, Opt:Optimizer> (train_ds:Dataset<Batch>, model: inout Model, opt: inout Opt, 
                         loss_func: (Tensor<Float>, Tensor<Float>)->Tensor<Float>)
                         where Opt.Model == Model, Opt.Scalar == Float,
                               Model.Input == Tensor<Float>,
                               Model.Output == Tensor<Float>
{
    for batch in train_ds{
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
            let preds = model.applied(to: batch.x, in: trainingContext)
            return loss_func(preds, batch.y)
        }
        print(loss)
        opt.update(&model.allDifferentiableVariables, along: grads)
    }
}

and the compiler complains that loss_func isn’t differentiable (with accompanying notes). I’ve tried adding @differentiable to the declaration but that doesn’t help either.

The type of the loss function should be defined as

@differentiable (Tensor<Float>, Tensor<Float>) -> Tensor<Float>
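For readers newer to Swift: a function type can be passed as a parameter like any other value, and the @differentiable attribute (S4TF-specific) simply prefixes that function type as shown above. Here is a minimal plain-Swift sketch of the non-differentiable version, with made-up names and no S4TF required:

```swift
// Plain-Swift sketch of passing a loss function as an argument.
// In S4TF, @differentiable would prefix the function type in the
// lossFunc parameter declaration.
func meanSquaredError(_ preds: [Double], _ targets: [Double]) -> Double {
    let squared = zip(preds, targets).map { ($0 - $1) * ($0 - $1) }
    return squared.reduce(0, +) / Double(preds.count)
}

func evaluate(lossFunc: ([Double], [Double]) -> Double) -> Double {
    // The caller's function is invoked through the parameter.
    return lossFunc([1.0, 2.0], [1.0, 4.0])
}

print(evaluate(lossFunc: meanSquaredError))  // 2.0
```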

The use of &model.allDifferentiableVariables, while model is generic, is highly problematic because of an underlying semantic issue. The allDifferentiableVariables property defined in the Differentiable protocol is not supposed to have a setter and the setter will be removed soon. Here’s the issue: TF-208.

For now, I’d recommend against defining training loops with inout Model. We are trying to figure out the best way to structure the surrounding APIs so that good training loop functions are possible.

I tried that and it didn’t work.

I’d gladly do so but how do you update parameters of a model when it’s not an inout argument?
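(For readers following along: the reason inout keeps coming up is that Swift models are value types, so a function can only mutate the caller’s copy through an inout parameter. A minimal plain-Swift sketch with made-up names:)

```swift
// Sketch: mutating a value-type model through an inout parameter.
struct TinyModel { var weight: Double }

func sgdStep(_ model: inout TinyModel, gradient: Double, learningRate: Double) {
    // Writes through `inout` are copied back to the caller's variable.
    model.weight -= learningRate * gradient
}

var model = TinyModel(weight: 1.0)
sgdStep(&model, gradient: 0.5, learningRate: 0.5)
print(model.weight)  // 0.75
```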

@rxwei could you provide a sample basic training loop that we could modify so as to complete nb 04 in Swift? Just a minimal example would be enough for us to see the basic idea. It doesn’t have to be clean or elegant… :slight_smile:

To follow up on my message above, the following:

func basic_training_loop<Model, Opt:Optimizer> (train_ds:Dataset<Batch>, model: inout Model, opt: inout Opt, 
                         loss_func: @differentiable (Tensor<Float>, Tensor<Float>)->Tensor<Float>)
                         where Opt.Model == Model, Opt.Scalar == Float,
                               Model.Input == Tensor<Float>,
                               Model.Output == Tensor<Float>
{
    for batch in train_ds{
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
            let preds = model.applied(to: batch.x, in: trainingContext)
            return loss_func(preds, batch.y)
        }
        print(loss)
        opt.update(&model.allDifferentiableVariables, along: grads)
    }
}

gives me this error message

<Cell 18>:8:53: note: function is differentiable only with respect to a smaller subset of arguments
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
                                                    ^

error: <Cell 18>:8:53: error: function is not differentiable
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
                                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~

<Cell 18>:10:20: note: expression is not differentiable
            return loss_func(preds, batch.y)
                   ^

@sgugger: was this note produced?
note: function is differentiable only with respect to a smaller subset of arguments

If so, please check out TF-310 for more context and a fix.

Try something like (unverified because code snippet is not self-contained):

func basic_training_loop<Model, Opt:Optimizer> (train_ds:Dataset<Batch>, model: inout Model, opt: inout Opt, 
                         loss_func: (Tensor<Float>, Tensor<Float>)->Tensor<Float>)
                         where Opt.Model == Model, Opt.Scalar == Float,
                               Model.Input == Tensor<Float>,
                               Model.Output == Tensor<Float>
{
    for batch in train_ds{
        let (loss, grads) = model.valueWithGradient(at: batch.y) { (model, y) -> Tensor<Float> in
            let preds = model.applied(to: batch.x, in: trainingContext)
            return loss_func(preds, y)
        }
        print(loss)
        opt.update(&model.allDifferentiableVariables, along: grads)
    }
}

Yeah the error was similar. This snippet is giving me

error: <Cell 20>:13:27: error: generic parameter 'Self' could not be inferred
        opt.update(&model.allDifferentiableVariables, along: grads)

Are you sure about the gradient thing? My ys are the targets (and xs the inputs). I can share the whole notebook I’m working on if this helps, but my Dataset is basically coming from

struct Batch: TensorGroup{
    let x: Tensor<Float>
    let y: Tensor<Float>
}

with the additional methods I stole from your Iris helper module to make it conform to the TensorGroup protocol; x and y come from MNIST (with y as floats for the first simple model).

Why don’t you go ahead and commit it @sgugger so we can see what we’re working with :slight_smile:

Done. It’s 02a in dev_swift so this notebook.


Essentially, inout Model won’t give you a correct mutable allDifferentiableVariables. We need to get an inout Model.AllDifferentiableVariables from somewhere. Actually, I just came up with a solution that uses a single inout Model to achieve this.

Here goes (not verified):

public struct Example<DataScalar, LabelScalar>: TensorGroup
    where DataScalar: TensorFlowFloatingPoint,
          LabelScalar: TensorFlowScalar {
    public var data: Tensor<DataScalar>
    public var label: Tensor<LabelScalar>
}

public func train<M: Layer, O: Optimizer, DataScalar, LabelScalar>(
    _ model: inout M,
    at variablesKeyPath: WritableKeyPath<M, M.AllDifferentiableVariables>,
    on dataset: Dataset<Example<DataScalar, LabelScalar>>,
    using optimizer: inout O,
    loss: @differentiable (Tensor<Scalar>, @nondiff Tensor<LabelScalar>) -> Tensor<Scalar>
) where O.Model == M, O.Scalar == Float,
        M.Input == Tensor<Scalar>, M.Output == Tensor<Scalar> {
    for batch in dataset {
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
            let preds = model.applied(to: batch.data, in: trainingContext)
            return loss(preds, batch.label)
        }
        print(loss)
        optimizer.update(&model[keyPath: variablesKeyPath], along: grads)
    }
}

This is what call sites will look like:

let trainingData = readDataset(...)
var model = Model(...)
var adam = Adam<Model, Float>()
train(&model, at: \Model.allDifferentiableVariables,
      on: trainingData, using: &adam,
      loss: softmaxCrossEntropy(logits:labels:))

For those new to Swift you may want to read up about the new keypath syntax used above by @rxwei
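As a quick refresher: \Model.allDifferentiableVariables above is a key-path literal. A WritableKeyPath is a first-class value naming a mutable property; it can be stored, passed around, and used with the [keyPath:] subscript. A plain-Swift sketch (hypothetical struct, no S4TF needed):

```swift
// Sketch of the key-path syntax used in the train(...) example above.
struct Params {
    var weight: Double
    var bias: Double
}

// \Params.weight is a WritableKeyPath<Params, Double>: a first-class
// reference to the mutable `weight` property.
let weightPath: WritableKeyPath<Params, Double> = \Params.weight

var p = Params(weight: 3.0, bias: 0.0)
p[keyPath: weightPath] *= 2.0  // read-modify-write through the key path
print(p.weight)  // 6.0
```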


The Scalar wasn’t working with the later Float, so I tried to fix it in two ways. The first one:

func train<M: Layer, O: Optimizer, Scalar, LabelScalar>(
    _ model: inout M,
    at variablesKeyPath: WritableKeyPath<M, M.AllDifferentiableVariables>,
    on dataset: Dataset<Batch>,
    using opt: inout O,
    loss: @differentiable (Tensor<Scalar>, @nondiff Tensor<LabelScalar>) -> Tensor<Scalar>
) where O.Model == M, O.Scalar == Float,
        M.Input == Tensor<Scalar>, M.Output == Tensor<Scalar> {
    for batch in dataset {
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Scalar> in
            let preds = model.applied(to: batch.x, in: trainingContext)
            return loss(preds, batch.y)
        }
        print(loss)
        opt.update(&model[keyPath: variablesKeyPath], along: grads)
    }
}

gives the following error:

warning: <Cell 25>:1:15: warning: redundant conformance constraint 'M': 'Layer'
func train<M: Layer, O: Optimizer, Scalar, LabelScalar>(
              ^

<Cell 25>:1:25: note: conformance constraint 'M': 'Layer' implied here
func train<M: Layer, O: Optimizer, Scalar, LabelScalar>(
                        ^

error: <Cell 25>:11:49: error: cannot convert value of type 'Tensor<Float>' to expected argument type 'Tensor<_>'
            let preds = model.applied(to: batch.x, in: trainingContext)

The other way I found gives me back the previous error:

func train<M, O: Optimizer>(
    _ model: inout M,
    at variablesKeyPath: WritableKeyPath<M, M.AllDifferentiableVariables>,
    on dataset: Dataset<Batch>,
    using opt: inout O,
    loss: @differentiable (Tensor<Float>, @nondiff Tensor<Float>) -> Tensor<Float>
) where O.Model == M, O.Scalar == Float,
        M.Input == Tensor<Float>, M.Output == Tensor<Float> {
    for batch in dataset {
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
            let preds = model.applied(to: batch.x, in: trainingContext)
            return loss(preds, batch.y)
        }
        print(loss)
        opt.update(&model[keyPath: variablesKeyPath], along: grads)
    }
}

and the message is

<Cell 26>:10:53: note: function is differentiable only with respect to a smaller subset of arguments
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
                                                    ^

error: <Cell 26>:10:53: error: function is not differentiable
        let (loss, grads) = model.valueWithGradient { model -> Tensor<Float> in
                                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~

<Cell 26>:12:20: note: expression is not differentiable
            return loss(preds, batch.y)
                   ^

I’m gonna open Colab and try to write a working example for ya.


@rxwei that would be great!

(Even better would be opening the notebook in the repo using jupyter notebook so that you can know you’re using the same environment that we’re using for the class, and can easily push your changes back to the repo once it’s working)

import TensorFlow

public struct Example<DataScalar, LabelScalar>: TensorGroup
    where DataScalar: TensorFlowFloatingPoint,
          LabelScalar: TensorFlowScalar {
    public var data: Tensor<DataScalar>
    public var label: Tensor<LabelScalar>
}

public func train<M, O: Optimizer, S>(
    _ model: inout M,
    at variablesKeyPath: WritableKeyPath<M, M.AllDifferentiableVariables>,
    on dataset: Dataset<Example<S, S>>,
    using optimizer: inout O,
    loss: @escaping @differentiable (Tensor<S>, Tensor<S>) -> Tensor<S>
) where O.Model == M, O.Scalar == S,
        M.Input == Tensor<S>, M.Output == Tensor<S>
{
    let context = Context(learningPhase: .training)
    for batch in dataset {
        let (x, y) = (batch.data, batch.label)
        let (loss, (𝛁model, _)) = model.valueWithGradient(at: y) { (model, y) -> Tensor<S> in
            let preds = model.applied(to: x, in: context)
            return loss(preds, y)
        }
        print(loss)
        optimizer.update(&model[keyPath: variablesKeyPath], along: 𝛁model)
    }
}

This works!

There’s a caveat: the loss function is currently required both to be differentiable w.r.t. all parameters and to be differentiated w.r.t. all parameters. This means that loss’s second argument cannot be @nondiff yet, which I plan to fix this week. Once that’s fixed, a separate generic parameter (LabelScalar) should be defined so that loss functions with numeric labels can be used. For now, we have to use softmaxCrossEntropy(logits:probabilities:).


I’ll be fixing existing problems this week to ensure course materials work great with differentiation and layer APIs.


Can confirm this compiles, thanks! Will try to actually use it tomorrow :slight_smile:


Was busy with course preparation yesterday; going back to this now. What is the thing we pass as the key path, and how do we get it from the model? There is a model.allVariableKeyPaths, but it takes a to: argument that I haven’t figured out yet.