 # You can only compute the gradient of functions that return scalar values

(Koen Dejonghe) #1

I was a bit surprised to get the error below when trying to obtain the gradient of a layer without an activation or loss attached to it:

``````Precondition failed: The function being differentiated produced a tensor with shape [5, 4, 3].
You can only compute the gradient of functions that return scalar values.
``````

Then I tried a similar thing in PyTorch, which gave me:
`grad can be implicitly created only for scalar outputs`
So, that seems about the same error as in S4TF.

PyTorch, though, allows me to write:
`outputs.backward(torch.ones_like(outputs))`
which works, and allows me to test the correctness of the gradients of the layer.

Any idea how this can be done in S4TF?
Thank you.


#2

I haven’t used S4TF at all, and I’m not very familiar with the maths, so I could be off here, but I have written a couple of custom backward kernels, so I have some familiarity with the mechanics.
Yeah, you need a scalar for backward; I’ve used stuff like:

``````x = torch.randn(10, requires_grad=True)  # Input
y = my_forward(x)
y.retain_grad()  # keep y.grad around after backward()
loss = y.mean()
loss.backward()

grad_out = y.grad  # the gradient of loss w.r.t. the output of my_forward (y)
# my_backward calculates the gradient from the input and grad_out, so I pass x
grad_inp = my_backward(x, grad_out)
``````
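
To make that concrete, here's a tiny pure-Python sketch (no PyTorch; the squaring forward is just an assumed toy example) of what `mean()` contributes as `grad_out` and how it chains into the input gradient:

``````# Hypothetical tiny forward: y_i = x_i ** 2, loss = mean(y).
# dloss/dy_i = 1/n, and the chain rule gives dloss/dx_i = (1/n) * 2 * x_i.
n = 4
x = [1.0, 2.0, 3.0, 4.0]
y = [xi ** 2 for xi in x]
loss = sum(y) / n

grad_out = [1.0 / n] * n  # gradient of mean() w.r.t. each y_i
grad_inp = [g * 2 * xi for g, xi in zip(grad_out, x)]  # chain rule through y = x**2
print(grad_inp)  # [0.5, 1.0, 1.5, 2.0]
``````
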

I think that you should find:

``````# .backward() returns None; the seeded gradient accumulates into x.grad:
x.grad = None
y.backward(torch.ones_like(y))
grad_inp == x.grad * grad_out  # for an elementwise forward, where grad_out is the constant 1/len(y)
``````

I’m not entirely across the details, and the code is largely copied from elsewhere, but it seems to work (though the above may not, as that’s just adapted from this code, which is a little harder to follow).

This doesn’t help with how to do it in S4TF, but it might at least help you understand what’s going on on the PyTorch side when you do `l.backward()` or `y.backward(torch.ones_like(y))`. This post also looks to have some nice details on how `y.backward(...)` works in PyTorch.
I gather part of your issue is that whatever you’re using in S4TF behaves more like `torch.autograd.grad` than `.backward()`. I frequently encountered the scalars-only issue with `.grad()`.


#3

Hi there.
In PyTorch when `outputs` is not a scalar and you pass a tensor to the `gradient` argument of `backward`, it first takes the dot product of `outputs` and `gradient` to obtain a scalar, and then it applies `backward()` to it.
So, in your case `outputs.backward(torch.ones_like(outputs))` should be the same as:

``````outputs = outputs.sum()
outputs.backward()
``````
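
That equivalence is easy to check by hand. Here's a minimal pure-Python sketch (no PyTorch; the elementwise `x ** 2` forward is just an assumed toy) showing that a backward pass seeded with ones matches the gradient of the sum:

``````# Toy check: for y_i = x_i ** 2, seeding the backward pass with ones
# is the same as differentiating sum(y) directly.
x = [1.0, -2.0, 3.0]

# "backward" of y = x**2 given an upstream seed: dx_i = seed_i * 2 * x_i
def backward(x, seed):
    return [s * 2 * xi for s, xi in zip(seed, x)]

ones_seed = backward(x, [1.0] * len(x))

# gradient of sum(x**2) computed directly: d/dx_i = 2 * x_i
sum_grad = [2 * xi for xi in x]

print(ones_seed == sum_grad)  # True
``````
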

(Stephen Johnson) #4

Try using `appliedForBackpropagation`.

Example:

``````import TensorFlow

let layer = Dense<Float>(inputSize: 5, outputSize: 3)
let input = Tensor<Float>([[2.3, -1.2, 4.7, -2.1, 3.0]])
let (predicted, backprop) = layer.appliedForBackpropagation(to: input)

// Seed the pullback with ones to get the layer and input gradients:
let (layerGradient, inputGradient) = backprop(Tensor<Float>(ones: predicted.shape))
// ▿ 2 elements
//   - weight : [[ 2.3,  2.3,  2.3],
//               [-1.2, -1.2, -1.2],
//               [ 4.7,  4.7,  4.7],
//               [-2.1, -2.1, -2.1],
//               [ 3.0,  3.0,  3.0]]
//   - bias : [1.0, 1.0, 1.0]
//   - inputGradient : [[-0.56547254,   1.2013183,  -1.1583307, -0.40139845,  -1.4616363]]``````
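
For anyone more at home in Python, the same pattern can be sketched without any framework: a forward function that also returns a pullback closure mapping an output-gradient seed to the layer and input gradients. This is just an illustrative pure-Python analogue (the `dense_for_backprop` name and structure are my own, not an S4TF API):

``````# Sketch of the forward-with-pullback pattern for a dense layer y = x @ w + b.
def dense_for_backprop(w, b, x):
    # forward: y_j = sum_i x_i * w[i][j] + b[j]
    y = [sum(xi * wij for xi, wij in zip(x, col)) + bj
         for col, bj in zip(zip(*w), b)]

    def pullback(seed):
        dw = [[xi * sj for sj in seed] for xi in x]  # dL/dw[i][j] = x_i * seed_j
        db = list(seed)                              # dL/db_j = seed_j
        dx = [sum(sj * wij for sj, wij in zip(seed, row)) for row in w]
        return (dw, db), dx

    return y, pullback

y, backprop = dense_for_backprop([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.3, -1.2])
(dw, db), dx = backprop([1.0, 1.0])
``````

Seeding the pullback with ones reproduces the shape of the dump above: the bias gradient is all ones, and each row of the weight gradient repeats the corresponding input value.
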

(Koen Dejonghe) #5

Aha. Yes, I guess that is what I was looking for. Thank you.
