How should I program R[i, j] = V[i + A[j]] in PyTorch?

Hello PyTorch experts,

I want to index a vector by a matrix of indexes. The indexes have the form:

    A =  0  1  3  7 15 ...   (an example only; A can be any set of numbers)
    --------------------
    0 |  0  1  3  7 15 ...
    1 |  1  2  4  8 16
    2 |  2  3  5  9 17      =   M
    3 |  3  4  6 10 18
    4 |  4  5  7 11 19
    ...

In words, M[i,j] = i+A[j].

What is an efficient way to express this in PyTorch, ideally without Python loops?

After the matrix of indexes is created, I will be indexing a vector of numbers with it. Is there perhaps an efficient way to achieve this result directly? That is, R[i,j] = V[i+A[j]]

Thanks for your help!

You can use broadcasting to achieve this:

>>> A = torch.tensor([0,1,3,7,15])
>>> M = A + torch.arange(A.shape[0])[None,:].t()
>>> M
tensor([[ 0,  1,  3,  7, 15],
        [ 1,  2,  4,  8, 16],
        [ 2,  3,  5,  9, 17],
        [ 3,  4,  6, 10, 18],
        [ 4,  5,  7, 11, 19]])

The [None,:].t() adds a singleton dimension and then transposes, so the result is a column vector, i.e.:

>>> torch.arange(5)[None,:].t()
tensor([[0],
        [1],
        [2],
        [3],
        [4]])
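
For the second part of your question, once M is built you can index a 1-d vector with it directly: fancy indexing with a 2-d index tensor returns a 2-d result, so R = V[M] gives R[i,j] = V[i+A[j]] in one step. A small sketch (V here is just an example vector I made up):

>>> V = torch.arange(100) * 10    # example data: V[k] = 10*k
>>> R = V[M]                      # R[i, j] = V[M[i, j]] = V[i + A[j]]
>>> R
tensor([[  0,  10,  30,  70, 150],
        [ 10,  20,  40,  80, 160],
        [ 20,  30,  50,  90, 170],
        [ 30,  40,  60, 100, 180],
        [ 40,  50,  70, 110, 190]])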

Thanks, it works perfectly and is quite fast with CUDA.

Now I’ve got to figure out how to use this single 2-d M to extract a 3-d tensor out of a 2-d matrix of V’s. Any hints on the best approach?

Ok, I figured some things out by experimentation.

Suppose V is a matrix (it happens to be 5 different timeseries):

V = torch.arange(50).reshape((5,10))
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

We construct a matrix of indexes as in TomB’s response above; each row applies the lookback offsets A at one of the available starting timepoints:

A = torch.LongTensor([0,1,3])
M = A + (torch.arange(7, dtype=torch.long)[None,:].t())
tensor([[0, 1, 3],
        [1, 2, 4],
        [2, 3, 5],
        [3, 4, 6],
        [4, 5, 7],
        [5, 6, 8],
        [6, 7, 9]])

Then the derived lookback timeseries for each of the 5 original series are:

V[:,M]
tensor([[[ 0,  1,  3],
         [ 1,  2,  4],
         [ 2,  3,  5],
         [ 3,  4,  6],
         [ 4,  5,  7],
         [ 5,  6,  8],
         [ 6,  7,  9]],

        [[10, 11, 13],
         [11, 12, 14],
         [12, 13, 15],
         [13, 14, 16],
         [14, 15, 17],
         [15, 16, 18],
         [16, 17, 19]],

        [[20, 21, 23],
         [21, 22, 24],
         [22, 23, 25],
         [23, 24, 26],
         [24, 25, 27],
         [25, 26, 28],
         [26, 27, 29]],

        [[30, 31, 33],
         [31, 32, 34],
         [32, 33, 35],
         [33, 34, 36],
         [34, 35, 37],
         [35, 36, 38],
         [36, 37, 39]],

        [[40, 41, 43],
         [41, 42, 44],
         [42, 43, 45],
         [43, 44, 46],
         [44, 45, 47],
         [45, 46, 48],
         [46, 47, 49]]])
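
A quick sanity check (my own addition, not part of the recipe above) that the fancy-indexed result really matches the element-wise definition R[k, i, j] = V[k, i + A[j]]:

R = V[:, M]
for k in range(V.shape[0]):          # each of the 5 series
    for i in range(M.shape[0]):      # each starting timepoint
        for j in range(M.shape[1]):  # each lookback offset
            assert R[k, i, j] == V[k, i + A[j]]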

I realize this usage may be obscure, but maybe it will help someone someday. I am gratified that PyTorch is esthetic and consistent!

Glad you seem to have it in hand. I don’t entirely understand what you’re trying to do, so I could be off here.
I would just note that I suspect (though I could be wrong) that indexing by an array (i.e. V[:,M]) isn’t the most efficient CUDA operation due to memory access patterns, and you’re also copying around a lot of data. You may want to look at torch.as_strided. The Python binding isn’t documented (there’s an open issue about that), but there are minimal C++ docs showing the arguments (IntArrayRef accepts a Python list/tuple), and the semantics follow numpy.lib.stride_tricks.as_strided (though the PyTorch version seems to have better bounds checking, so the warnings in the numpy docs about potentially overwriting arbitrary memory don’t seem to apply in PyTorch).
This has the advantage of not involving any copying; it just creates a view with arbitrary strides. It doesn’t allow arbitrary lookback indexing, but it would give you a view of the lookback windows that you could then index into (assuming I’m understanding what you’re trying to do). So:

>>> t = torch.arange(16)
>>> t2 = t.as_strided((7, 4), (2,1)); t2
tensor([[ 0,  1,  2,  3],
        [ 2,  3,  4,  5],
        [ 4,  5,  6,  7],
        [ 6,  7,  8,  9],
        [ 8,  9, 10, 11],
        [10, 11, 12, 13],
        [12, 13, 14, 15]])
>>> t2[:,0:2]
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13]])

Note that it is a view, so updating t2 would also update t, and performing in-place operations on t2 can cause weird results because multiple elements of t2 reference the same memory location.
There are also potential issues with using strided tensors in the backward pass; there are some notes in the source code about this that I haven’t fully absorbed yet (googling ‘as_strided pytorch backwards’ or some such should find more info). But if you’re just using it to process input this shouldn’t matter (and it may be fine otherwise; it seemed to be partly an inherent issue in the gradient calculation as much as an implementation limitation).
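
To illustrate the aliasing point with a minimal sketch (same t and t2 as above; just a demonstration, not a recommended workflow):

>>> t = torch.arange(16)
>>> t2 = t.as_strided((7, 4), (2, 1))
>>> t2[0, 2] = -1      # t2[0, 2] and t2[1, 0] both map to t[2]
>>> t[2]
tensor(-1)
>>> t2[1, 0]
tensor(-1)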

Hi TomB,

Thanks for taking the time to look into this issue. What I am experimenting with is modeling time series without using an RNN. Models from several recent papers use layered 1-d convolutions with increasing dilations to capture patterns on larger time scales. Here I want to make something like a single 1-d convolution, but with a non-uniform, increasing dilation. The idea is that it could learn patterns on various time scales simultaneously. A 2-d version might also address some of the problems with spatially limited receptive fields.

As far as I can tell, no such thing has been implemented, though I can easily imagine a very fast C++ implementation. This indexing trick (with Linear) is the best simulation I’ve been able to come up with in PyTorch. You’re right - it’s not at all efficient or elegant. At least broadcasting to get the indexes works better than the loops I used before.
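
In case it helps anyone, here is a rough sketch of what I mean by the indexing trick with Linear (my own reconstruction; the shapes and names are invented for illustration): gather the non-uniformly dilated lookback windows with M, then apply a shared nn.Linear across the last dimension, which acts like a single 1-d convolution whose taps sit at the offsets in A.

import torch
import torch.nn as nn

A = torch.tensor([0, 1, 3, 7])              # arbitrary (non-uniform) tap offsets
V = torch.randn(5, 64)                      # 5 series, 64 timepoints

n_windows = V.shape[1] - int(A.max())       # starting points where the window fits
M = A + torch.arange(n_windows)[:, None]    # M[i, j] = i + A[j]

windows = V[:, M]                           # shape (5, n_windows, len(A)); copies data
linear = nn.Linear(len(A), 1)               # shared weights act as the "kernel"
out = linear(windows).squeeze(-1)           # shape (5, n_windows)

As you noted, this still materializes and copies all the windows, so it is more a proof of concept than an efficient implementation.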

Thanks again for your help.