Tensor.replacing(with:where:): replacing on false

As the documentation says:

Replaces elements of this tensor with other in the lanes where mask is true.

DOC: Swift for TensorFlow

But when I use it, it seems to replace values where mask is false:

import TensorFlow

typealias TF = Tensor<Float>

// Initialize random numbers
let rr = TF(randomNormal: [10,5])
print("rr = ",rr)

// Create a mask
let mask = rr.<(0-1.0)
print("mask = ", mask) 

// Replaced
let replaced = rr.replacing(with: TF(0).broadcast(like: rr), where: mask)
print("replaced = ",replaced)

returns:

rr =  [[   -0.2687458,     1.6474121,  -0.061800487,    -1.7457179,    -0.3315584],
 [    1.3515772,     0.6790431,    0.14319876,     1.7425705,    -1.9664636],
 [  -0.32543635,    0.75455797,     0.9851794,   -0.12352676,   0.029595692],
 [   0.54839504,   -0.30570582,     1.7317035,    0.45856386,     0.8892455],
 [   0.14538142, -0.0019000744,    -0.2195302,    -0.3196175,   -0.02673261],
 [  -0.60845137,   -0.36677366,     1.3494298,    -1.3287013,    -1.6256953],
 [    0.8583815,     1.1418674,    -0.6815512,     1.0948774,    0.20448415],
 [  -0.68553835,    -0.9695941,   -0.50244117,      1.037796,    0.70121026],
 [   -0.5385072,     1.1612647,     1.7953675,    0.65119404,     1.5617983],
 [    -1.237274,     1.0212688,    -0.5734267,    0.91085374,     0.3272885]]
mask =  [[false, false, false,  true, false],
 [false, false, false, false,  true],
 [false, false, false, false, false],
 [false, false, false, false, false],
 [false, false, false, false, false],
 [false, false, false,  true,  true],
 [false, false, false, false, false],
 [false, false, false, false, false],
 [false, false, false, false, false],
 [ true, false, false, false, false]]
replaced =  [[       0.0,        0.0,        0.0, -1.7457179,        0.0],
 [       0.0,        0.0,        0.0,        0.0, -1.9664636],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [       0.0,        0.0,        0.0, -1.3287013, -1.6256953],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [       0.0,        0.0,        0.0,        0.0,        0.0],
 [ -1.237274,        0.0,        0.0,        0.0,        0.0]]

THE API I WOULD LIKE

rr[mask] = TF(123)

// or inline
rr[rr.<(0-1.0)] = TF(123)

Probably a new TensorRange case is needed for that…
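
In the meantime, the replacement itself can be expressed with the existing API once the fix lands (a minimal sketch, reusing rr and mask from the snippet above and assuming the documented mask-is-true semantics):

// Replace with 123 wherever the mask is true; rr itself is left untouched.
let replaced123 = rr.replacing(with: TF(123).broadcast(like: rr), where: mask)
print("replaced123 = ", replaced123)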

Sorry to bother you @marcrasi: is it possible to print the version of Swift/Jupyter & S4TF in a notebook?
I would like to add it to the gist…

Here’s a snippet that will print out the commit hash of the current Swift toolchain:

import Foundation

public extension String {
    @discardableResult
    func shell(_ args: String...) -> String
    {
        let (task,pipe) = (Process(),Pipe())
        task.executableURL = URL(fileURLWithPath: self)
        (task.arguments,task.standardOutput) = (args,pipe)
        do    { try task.run() }
        catch { print("Unexpected error: \(error).") }

        let data = pipe.fileHandleForReading.readDataToEndOfFile()
        return String(data: data, encoding: String.Encoding.utf8) ?? ""
    }
}

"\(Bundle.main.bundlePath)/swift".shell("--version")

(This works by calling your swift toolchain’s swift binary. It seems likely Bundle.main.bundlePath will find the binary regardless of where it is, but I have only tested on Colab, so it’s possible that it doesn’t work in other environments.)

It would be nicer to have the commit hash and the version string (e.g. “0.3.1”) available as Swift vars so that you don’t need to call out to a binary, but I don’t think that exists anywhere right now. I won’t have time to add such a thing myself soon, but it might be a nice starter issue for someone to work on. I’m going to compile a list of starter issues soon, and I’ll make sure to include this.


Thank you!

Interesting. We should definitely fix that! Here’s the definition, and it’s wrong.

It should be

return Raw.select(condition: mask, t: other, e: self)
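
For context, here is a minimal sketch of the corrected behaviour, wrapped in a hypothetical replacingFixed(with:where:) so it can be compiled alongside the existing method; only the Raw.select call is taken from the fix above:

public extension Tensor {
    /// Replaces elements of this tensor with `other` in the lanes where `mask` is `true`.
    func replacingFixed(with other: Tensor, where mask: Tensor<Bool>) -> Tensor {
        // `Raw.select` picks from `t` where `condition` is true and from `e` otherwise,
        // so `t` must be `other` and `e` must be `self`.
        return Raw.select(condition: mask, t: other, e: self)
    }
}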

I opened bug TF-492. If you are interested in fixing it, you are always welcome to submit a PR! Otherwise, I’ll get to it in the next few days.


PR Done :wink:


This is an interesting direction! A new case in TensorRange may not be a good fit because a TensorRange only applies to one dimension.

For now, we can start thinking about adding a subscript that takes a boolean tensor. All subscripts also require a getter, so what’s unclear to me is whether scalars under false should be replaced with zero by default:

// Option 1
public extension Tensor where Scalar: Numeric {
    subscript(mask: Tensor<Bool>) -> Tensor {
        get {
            return Tensor(0).broadcast(like: self).replacing(with: self, where: mask)
        }
        set {
            return replacing(with: newValue, where: mask)
        }
    }
}

Or, we could make it take a default scalar that specifies the value under false.

// Option 2, take a default scalar
public extension Tensor where Scalar: AdditiveArithmetic {
    subscript(mask: Tensor<Bool>, otherwise scalarOnFalse: Scalar = .zero) -> Tensor {
        get {
            return Tensor(scalarOnFalse).broadcast(like: self).replacing(with: self, where: mask)
        }
        set {
            return replacing(with: newValue, where: mask)
        }
    }
}
// Option 3, take a non-default tensor, achieving `replacing(with:where:)`'s full functionality.
public extension Tensor where Scalar: AdditiveArithmetic {
    subscript(mask: Tensor<Bool>, otherwise scalarsOnFalse: Tensor) -> Tensor {
        get {
            return scalarsOnFalse.replacing(with: self, where: mask)
        }
        set {
            return replacing(with: newValue, where: mask)
        }
    }
}

I personally prefer option 2, as option 3 could be harder to use since it takes two tensors.
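
For example, under Option 2 the call site could look like this (a hypothetical usage sketch, assuming the subscript above and the corrected replacing(with:where:) semantics):

let x = Tensor<Float>([-2, -1, 0, 1, 2])
// Keep the scalars where the mask is true, use the `otherwise` value elsewhere.
let clamped = x[x .> 0, otherwise: 0]   // [0.0, 0.0, 0.0, 1.0, 2.0]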


TL;DR

  • GET: We can only create a subscript for “get”, because Tensor is immutable.
  • SET: For “set” we should stick with “replacing”.

GET

We can use all three of your subscripts :wink:

  • OPTION 1: mask (implicit default value to zero).
  • OPTION 2: mask and scalar (default arguments are not allowed in subscripts!).
  • OPTION 3: mask and tensor (implicitly broadcasted).
// Option 1, only mask
public extension Tensor where Scalar: Numeric {
    subscript(mask: Tensor<Bool>) -> Tensor {
        return Tensor(0).broadcast(like: self).replacing(with: self, where: mask)
    }
}

// Option 2, mask + scalar
public extension Tensor where Scalar: AdditiveArithmetic {
    subscript(mask: Tensor<Bool>, otherwise scalarOnFalse: Scalar) -> Tensor {
        return Tensor(scalarOnFalse).broadcast(like: self).replacing(with: self, where: mask)
    }
}

// Option 3, mask + tensor (broadcasted), achieving `replacing(with:where:)`'s full functionality.
public extension Tensor where Scalar: AdditiveArithmetic {
    subscript(mask: Tensor<Bool>, otherwise scalarsOnFalse: Tensor) -> Tensor {
        return scalarsOnFalse.broadcast(like: self).replacing(with: self, where: mask)
    }
}

NOTE: I got this error when trying to use the original “OPTION 2”:

error: <Cell 6>:3:67: error: default arguments are not allowed in subscripts

EXAMPLES:

typealias TF=Tensor<Float>
let rr = TF(randomNormal: [5,3])
rr // ORIGINAL VALUE

[[  0.1402327, -0.79306823,   1.3267603],
 [-0.75794446, -0.78503853,  -0.7688747],
 [  2.1296802,  -1.9129561,   0.9286265],
 [ -0.6535413,  -1.0384786,  0.16424443],
 [  1.6514189,  -1.5904704, -0.92496645]]
rr[rr.>1] // OPTION 1

[[  0.1402327, -0.79306823,         0.0],
 [-0.75794446, -0.78503853,  -0.7688747],
 [        0.0,  -1.9129561,   0.9286265],
 [ -0.6535413,  -1.0384786,  0.16424443],
 [        0.0,  -1.5904704, -0.92496645]]
rr[rr.>1, otherwise: 999.0] // OPTION 2

[[  0.1402327, -0.79306823,       999.0],
 [-0.75794446, -0.78503853,  -0.7688747],
 [      999.0,  -1.9129561,   0.9286265],
 [ -0.6535413,  -1.0384786,  0.16424443],
 [      999.0,  -1.5904704, -0.92496645]]
rr[rr.>1, otherwise: TF(repeating: 999, shape: [5,3])]  // OPTION 3

[[  0.1402327, -0.79306823,       999.0],
 [-0.75794446, -0.78503853,  -0.7688747],
 [      999.0,  -1.9129561,   0.9286265],
 [ -0.6535413,  -1.0384786,  0.16424443],
 [      999.0,  -1.5904704, -0.92496645]]
rr[rr.>1, otherwise: TF(999)]  // OPTION 3 BROADCASTED

[[  0.1402327, -0.79306823,       999.0],
 [-0.75794446, -0.78503853,  -0.7688747],
 [      999.0,  -1.9129561,   0.9286265],
 [ -0.6535413,  -1.0384786,  0.16424443],
 [      999.0,  -1.5904704, -0.92496645]]

SET (…we can’t…)

The “API I WOULD LIKE” is not possible :wink:
I’ve realized that we can’t use the “set” accessor if Tensor is immutable (a.k.a. has value semantics).

DETAILS

This code only makes sense if the modification happens “in place”:

rr[mask]=Tensor<Float>(0)

If the “set” returned a value, you would have to write something like this to catch it:

let rr1 = (rr[mask]=Tensor<Float>(0))

That’s weird…

NOTE: in my version of Swift, the “set” accessor of a subscript cannot have a “return”.
I’m using the official jupyter S4TF docker image:

Swift version 5.0-dev (LLVM dcb9eb74a7, Clang 95cdf7c9af, Swift dc31c3fcd2)

In my version, trying to execute the code I get this error:

error: unexpected non-void return value in void function

From the subscript doc:

subscript(index: Int) -> Int {
    get {
        // return an appropriate subscript value here
    }
    set(newValue) {
        // perform a suitable setting action here
    }
}

FURTHER INFORMATION AND COMPARISON WITH PYTHON LIBRARIES

The same kind of boolean mask indexing is present in numpy and pandas, and @jeremy added it to the base class ListContainer in the new course.

# numpy behaviour

rr=np.random.randn(3,5)

rr
Out[1]: 
array([[ 1.5634806 ,  0.14820304,  0.31589817, -1.64715999,  1.33083382],
       [-0.57917724,  0.24835197, -0.23703362, -0.42864753, -1.49701077],
       [-0.19529667,  0.32040469, -0.1250799 ,  0.27980658,  1.28546453]])

mask=rr>1

mask
Out[2]: 
array([[ True, False, False, False,  True],
       [False, False, False, False, False],
       [False, False, False, False,  True]])

rr[mask]=0

rr
Out[3]: 
array([[ 0.        ,  0.14820304,  0.31589817, -1.64715999,  0.        ],
       [-0.57917724,  0.24835197, -0.23703362, -0.42864753, -1.49701077],
       [-0.19529667,  0.32040469, -0.1250799 ,  0.27980658,  0.        ]])

IMPORTANT: in Python this syntax usually means an in-place modification, while the current replacing(with:where:) has value-semantics behaviour, returning a new tensor.
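
For comparison, the value-semantics counterpart in Swift would look like this (a minimal sketch, assuming the fixed replacing(with:where:); the names are made up):

let rrSwift = Tensor<Float>(randomNormal: [3, 5])
let maskSwift = rrSwift .> 1
// Unlike numpy's `rr[mask] = 0`, this does not mutate `rrSwift`;
// it returns a brand new tensor with zeros where the mask is true.
let zeroed = rrSwift.replacing(with: Tensor<Float>(0).broadcast(like: rrSwift), where: maskSwift)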

Hi @rxwei,
Working on a unit test for the Tensor.replacing(with:where:) fix, I’ve noticed that replacing has no “implicit broadcasting”.

var tensor3D = Tensor<Float>(shape: [3, 4, 5], scalars: Array(stride(from: 0.0, to: 60, by: 1)))

[[[ 0.0,  1.0,  2.0,  3.0,  4.0],
  [ 5.0,  6.0,  7.0,  8.0,  9.0],
  [10.0, 11.0, 12.0, 13.0, 14.0],
  [15.0, 16.0, 17.0, 18.0, 19.0]],

 [[20.0, 21.0, 22.0, 23.0, 24.0],
  [25.0, 26.0, 27.0, 28.0, 29.0],
  [30.0, 31.0, 32.0, 33.0, 34.0],
  [35.0, 36.0, 37.0, 38.0, 39.0]],

 [[40.0, 41.0, 42.0, 43.0, 44.0],
  [45.0, 46.0, 47.0, 48.0, 49.0],
  [50.0, 51.0, 52.0, 53.0, 54.0],
  [55.0, 56.0, 57.0, 58.0, 59.0]]]

// Current explicit form
tensor3D.replacing2(with: TF(-1).broadcast(like: tensor3D), where: tensor3D.>30)

[[[ 0.0,  1.0,  2.0,  3.0,  4.0],
  [ 5.0,  6.0,  7.0,  8.0,  9.0],
  [10.0, 11.0, 12.0, 13.0, 14.0],
  [15.0, 16.0, 17.0, 18.0, 19.0]],

 [[20.0, 21.0, 22.0, 23.0, 24.0],
  [25.0, 26.0, 27.0, 28.0, 29.0],
  [30.0, -1.0, -1.0, -1.0, -1.0],
  [-1.0, -1.0, -1.0, -1.0, -1.0]],

 [[-1.0, -1.0, -1.0, -1.0, -1.0],
  [-1.0, -1.0, -1.0, -1.0, -1.0],
  [-1.0, -1.0, -1.0, -1.0, -1.0],
  [-1.0, -1.0, -1.0, -1.0, -1.0]]]

Using implicit broadcasting, the syntax is a little more compact:

// Possible Implicit broadcast
tensor3D.replacing(with: TF(-1), where: tensor3D.>30)

Of course, implicit broadcasting can sometimes hide shape errors, but for those who prefer to be explicit, we could use an optional named parameter, e.g. broadcast: Bool = true:

func replacing(with other: Tensor, where mask: Tensor<Bool>, broadcast: Bool = true) -> Tensor {
...
}
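
As a hypothetical sketch of what the body could look like, built on broadcast(like:) and the Raw.select call from earlier in the thread (replacingB is a made-up name so it does not clash with the real method):

public extension Tensor where Scalar: Numeric {
    func replacingB(with other: Tensor, where mask: Tensor<Bool>, broadcast: Bool = true) -> Tensor {
        // Expand `other` to this tensor's shape unless the caller opts out.
        let rhs = broadcast ? other.broadcast(like: self) : other
        return Raw.select(condition: mask, t: rhs, e: self)
    }
}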

Make sense?

The lack of implicit broadcasting in replacing(with:where:) was not intentional. It’s because the corresponding TensorFlow Select (Raw.select) op did not support broadcasting as of a year ago. Making it implicitly broadcast sounds good to me, but I’d recommend against using a boolean flag for now, to keep it consistent with the rest of the operators.


Hi @rxwei, @dan-zheng,
I’ve created a new pull request (#25081) with the following additions:

  1. Unit tests for the tensor.replacing(with:where:) fix (#24635)
  2. Implicit broadcasting for the with parameter, with a corresponding unit test.

Example:

// You can write this: 
let b = a.replacing(with: Tensor<Float>(-1), where: a.>3) 

// Instead of this: 
let c = a.replacing(with: Tensor<Float>(-1).broadcast(like: a), where: a.>3)
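
For reference, here is a minimal sketch of what a test for the broadcasting behaviour could look like (hypothetical; the actual tests live in PR #25081):

import XCTest
import TensorFlow

final class ReplacingBroadcastTests: XCTestCase {
    func testReplacingWithScalarBroadcast() {
        let a = Tensor<Float>([0, 2, 4, 6])
        // The scalar tensor should be broadcast to `a`'s shape before selection.
        let b = a.replacing(with: Tensor<Float>(-1), where: a .> 3)
        XCTAssertEqual(b.scalars, [0, 2, -1, -1])
    }
}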