Rough edges

sgugger · April 2, 2019, 5:58pm

This topic is to list situations encountered where the error message could be clearer or code that maybe shouldn’t throw an error to be more user friendly.

I’ll begin with a few things:

Tensor(randomNormal: [a,b]) is going to be used a lot and it would be great if it accepted Int and not just Int32 (error message is clear though)
More annoying, creating

struct MyModel: Layer {
    var layer1 = Dense<Float>(inputSize: m, outputSize: nh, activation: relu)
    var layer2 = Dense<Float>(inputSize: nh, outputSize: 1)
    
    @differentiable
    func applied(to input: Tensor<Float>, in context: Context) -> Tensor<Float> {
        return input.sequenced(in: context, through: layer1, layer2)
    }
}

when m or nh is an Int32 and not an Int will return the following error message:

error: <Cell 27>:2:18: error: generic parameter 'T' could not be inferred
    var layer1 = Dense<Float>(inputSize: m, outputSize: nh, activation: relu)
                 ^

TensorFlow.relu:2:24: note: in call to function 'relu'
@inlinable public func relu<T>(_ x: TensorFlow.Tensor<T>) -> TensorFlow.Tensor<T> where T : FloatingPoint, T : TensorFlowScalar
                       ^

warning: <Cell 27>:2:9: warning: stored property 'layer1' has no derivative because it does not conform to 'Differentiable'; add '@noDerivative' to make it explicit
    var layer1 = Dense<Float>(inputSize: m, outputSize: nh, activation: relu)
        ^
    @noDerivative

which is very unclear. And having Int32 comes quite naturally if your m comes from the shape of a tensor for instance (here I had m = x_train.shape[1]).

Not sure what’s going on behind the scenes, but ideally the user would like to not care about Int/Int32 or if they have to, it would be great if they had to worry about always having the same kind of int.

Kaspar · April 2, 2019, 6:21pm

Totally agree with the need need for less friction using Int/Int32. Swift.org recommends using Int (and Double) as default unless there is a good reason not to.

Better documentation site would make it much easier to use the library:
-with link to the source
-more compact documentation
-with search limited to functionality that can be used in swift

rxwei · April 2, 2019, 6:39pm

You are totally right that TensorShape's element type shouldn’t be Int32. It’s mainly for historical reasons in early days of Swift for TensorFlow with graph program extraction. Changing the element type to Int can fix the rough edges you encountered.

That said, Tensor<Int> is intentionally made not possible because of the reality of hardware defaults. Int always defaults to the pointer-size integer on the host machine, thus Int64 most of the time. But this can lead to unexpected inefficiency (or behavior) on accelerators.

jeremy · April 2, 2019, 7:03pm

Doesn’t that suggest that to get consistency throughout the API that we need to use Int32 everywhere? Otherwise it sounds like we’ll still have different sized ints expected in different places and need to convert back and forth.

rxwei · April 2, 2019, 7:21pm

Standardizing on Int32 will likely produce more rough edges in practice. All Swift collection types and almost all non-TensorFlow libraries use Int indices instead of Int32.

Tensor<Int32> is only about the scalar type, so it shouldn’t affect the usability of indexing and shapes. In addition to changing TensorShape’s element type, we should also change subscript and other APIs that take indices to use Int.

jeremy · April 2, 2019, 8:48pm

Yeah good point.

clattner · April 3, 2019, 4:31pm

Agreed that this is a complicated area, more complicated than it should be, and I agree with Richard that we don’t want to support Tensor<Int> because it will have surprising behavior. A design pivot we made a month or two back was to keep all the tensor ops in the tensor domain: things like tensor.mean() now return a tensor instead of a scalar.

I think the probably best design here is to accept Int in Shape and subscript indices and any other places where dimensions are used (this aligns with most of Swift) but keep Tensor away from Int. We should probably also add initializers to Shape and subscripts on Tensor that take (1D) tensors of integers.

Richard, do you see any problem or have any concerns with this approach?

-Chris

Kaspar · April 3, 2019, 5:27pm

I found another edge concerning Int32 (unless i am misusing swift).

The snippet below is a swift version of “nb02_fully_connected.ipynb”. When T is required to be compatible with BinaryFloatingPoint then the literal 2.0 is converted to Tensor seamlessly.

However the Int32 from inp.shape[Int32] is not !

func mse_grad<T>(inp:Tensor<T>, targ:Tensor>T>) -> Tensor<T>; where T: BinaryFloatingPoint {
    // grad of loss with respect to output of previous layer
    return 2.0 * (inp.squeezingShape() - targ).expandingShape(at:1) / T(inp.shape[ Int32(0) ])
}

jeremy · April 3, 2019, 5:51pm

I’m really glad you did this. PyTorch used to return scalars instead of tensors and it made code so much more complicated to handle. When they added rank-0 tensors and used them everywhere I was able to remove a lot of messy code.

rxwei · April 3, 2019, 6:03pm

It is pretty much what I had in mind. I don’t see any concerns.

I’m yet unconvinced about making subscript take 1-D tensors, but I’d like to see some motivating use cases after advanced indexing gets implemented.

clattner · April 3, 2019, 6:44pm

I don’t have any specific reason to think we need 1D tensor subscripts right now. There reason I bring it up is that the SIMD proposal is going to support vectors subscripted by vectors, and this would also allow us to keep full chains of tensor operations “in graph” with GPE in a more natural way.

-Chris

rxwei · April 3, 2019, 7:09pm

Did you want floating point division to be able to take an integer on the right hand side?

Kaspar · April 3, 2019, 7:55pm

I expected to be a floatingpoint diviione by a float, because i expected that the Int32 returned by inp.shape[Int32(0)] would convert to type T (which in my case is af Float) and from there be converted to a Tensor that could be used in the division to normalize for the number of element in the first dimension of the Tensor “inp”.

Somehow that doesn’t works without the explicit conversion T( ) but it does work for the literal 2.0

rxwei · April 3, 2019, 8:02pm

This is expected behavior. A TensorShape's subscript result should not be convertible to Float. It is always an integer and requires explicit casting. Swift does not have custom implicit conversions from values to values because the programming model is designed to be predictable. Moreover, the conversion from an integer to a Float is not always safe or truly value-preserving, so it needs to be explicit.

jeremy · April 3, 2019, 8:22pm

It’s common to want to use tensor subscripts in pytorch code - e.g. you might get a bunch of indexes using argmax or similar, then use them in an embedding. It’s also common to want to do tensor ops on indexes prior to passing them to an indexer.