Why do we stack the "seven_tensors" in Chapter 4?

Kazutadashi · June 5, 2024, 1:31am

Hello all!

I am currently working on chapter 4 of the notebook, and have a question on why we stack the three and seven tensors to compute the average pixel density.

Before this stacking operation, if my understanding is correct, we use a list comprehension to create a list of rank-2 tensors, which looks like this:

seven_tensors = [tensor(Image.open(o)) for o in sevens]
three_tensors = [tensor(Image.open(o)) for o in threes]

This would then create a collection of rank-2 tensors, which should be itself a rank-3 tensor. Is this the right way of thinking about this? This appears to do the stacking before we actually call the stacking method later here:

stacked_sevens = torch.stack(seven_tensors).float()/255
stacked_threes = torch.stack(three_tensors).float()/255

If we compare the first element of each of these objects we get

print(seven_tensors[0].shape)
print(torch.stack(seven_tensors)[0].shape)

Output:
torch.Size([28, 28])
torch.Size([28, 28])

Are we calling the .stack() method on this already rank-3 tensor just to convert it from a list object to a Tensor? Instead of the list comprehension, couldn’t we stack those rank-2 tensors together to make the rank-3 tensor?

Thank you everyone for your time and help, I am glad to be joining this community!

vbakshi · June 6, 2024, 3:05am

Someone with more experience with PyTorch might have better explanations but giving it a shot:

Are we calling the .stack() method on this already rank-3 tensor just to convert it from a list object to a Tensor?

A small correction—it’s not already a rank-3 tensor, it’s a list (of rank-2 tensors), but you are right that stack is called to convert it from a list to a tensor (so that we can do things like call .mean to find the average pixel value for each of the 28x28 pixels).
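To make the list-versus-tensor difference concrete, here is a minimal sketch. It assumes random 28x28 tensors as a stand-in for the real image files, and the names fake_sevens, stacked and mean_image are made up for illustration:

```python
import torch

# Stand-in for the notebook's image list: five fake 28x28 "images"
# (an assumption for illustration, not real MNIST data).
fake_sevens = [torch.randint(0, 256, (28, 28), dtype=torch.uint8) for _ in range(5)]

print(type(fake_sevens))      # a plain Python list
print(fake_sevens[0].ndim)    # each element is a rank-2 tensor

# torch.stack turns the list into one tensor with a new leading axis.
stacked = torch.stack(fake_sevens).float() / 255
print(type(stacked), stacked.ndim, stacked.shape)  # one rank-3 tensor, shape [5, 28, 28]

# Tensor methods such as .mean only exist on the stacked tensor, not on the list.
mean_image = stacked.mean(0)  # average over the image axis
print(mean_image.shape)       # torch.Size([28, 28])
```

The list has no .mean method at all, which is the practical reason the conversion is needed before averaging.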

Instead of the list comprehension, couldn’t we stack those rank-2 tensors together to make the rank-3 tensor?

I tried to run the following code:

sevens_t = tensor([tensor(Image.open(o)) for o in sevens])

But got the following error:

TypeError: only integer tensors of a single element can be converted to an index

I also tried to construct a simple tensor as follows:

tensor(
    tensor([0,0]),
    tensor([0,0]),
    tensor([0,0])
)

And got a different error:

ValueError: only one element tensors can be converted to Python scalars

And this code (passing a list of tensors to torch.tensor):

tensor([
    tensor([0,0]),
    tensor([0,0]),
    tensor([0,0])
])

Results in the same error as the first one:

TypeError: only integer tensors of a single element can be converted to an index

Whereas this works:

tensor([
    tensor([0]),
    tensor([1]),
    tensor([2]),
])

The documentation for torch.tensor says that the data from which you create a tensor—

Can be a list, tuple, NumPy ndarray, scalar, and other types.

But from all of the above, it seems you can’t use torch.tensor to build a tensor out of tensors that hold more than one element. That is why something like torch.stack is needed. From the docs:

Concatenates a sequence of tensors along a new dimension.
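A short sketch of what "along a new dimension" means in practice, compared with torch.cat, which joins along an existing dimension. The tensors a and b are made-up examples:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# stack adds a new axis; by default it is the leading one.
stacked = torch.stack([a, b])
print(stacked.shape)        # torch.Size([2, 2, 3])

# The dim argument chooses where the new axis goes.
stacked_last = torch.stack([a, b], dim=2)
print(stacked_last.shape)   # torch.Size([2, 3, 2])

# cat keeps the rank the same and grows an existing axis instead.
joined = torch.cat([a, b])
print(joined.shape)         # torch.Size([4, 3])
```

For the digit images, the default dim=0 is what gives the [number_of_images, 28, 28] layout.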

Again, someone with a deeper understanding of PyTorch may have a simpler, more direct explanation.

Kazutadashi · June 7, 2024, 2:19am

I see! Thank you so much for these very clear examples.

A small correction—it’s not already a rank-3 tensor, it’s a list (of rank-2 tensors), but you are right that stack is called to convert it from a list to a tensor (so that we can do things like call .mean to find the average pixel value for each of the 28x28 pixels).

So with this in mind, a matrix is not necessarily a rank-2 tensor unless it was created by “stacking” rank-1 tensors together rather than simply merging them in a list. But if converted properly, the matrix is essentially the same thing as a rank-2 tensor (given the additional numerical and rectangular requirements). Is this the right way to think of it?

Thank you again for your reply! I appreciate it.

alx42 · June 10, 2024, 4:08am

seven_tensors and three_tensors are lists, each containing many 2-dimensional (rank-2) tensors. stacked_threes and stacked_sevens are each a SINGLE 3-dimensional (rank-3) tensor. Before the stack call, you have a list of many small tensors; after it, you have one big tensor for the threes and one big tensor for the sevens. This is so you can run efficient tensor operations on all images in parallel.
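As a rough illustration of the "all images at once" point, this sketch computes the same average image two ways. It assumes random data in place of the real digits, and the variable names are hypothetical:

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in data: 100 random 28x28 "images".
images = [torch.rand(28, 28) for _ in range(100)]

# Option 1: loop over the Python list and accumulate by hand.
total = torch.zeros(28, 28)
for img in images:
    total += img
loop_mean = total / len(images)

# Option 2: stack once, then let a single tensor operation do the work.
vector_mean = torch.stack(images).mean(0)

# Both routes should agree up to floating-point rounding.
print(torch.allclose(loop_mean, vector_mean, atol=1e-5))
```

The results match, but the stacked version hands the whole computation to PyTorch in one call rather than stepping through the images in Python.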

alx42 · June 10, 2024, 4:43am

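The first three examples fail because tensor() builds its result from basic scalar values (float, int, bool), so every element of the list you pass in has to be convertible to a single number. A tensor with more than one element can’t be converted that way, which is what the error messages are saying. Tensors also need a consistent data type across all elements, and when you combine tensors into a bigger tensor, they all need to have the same shape.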

The tensor() function focuses on converting data to tensor format. torch.stack() checks that the tensors in the list you provide all have the same shape before stacking them into a new, higher-dimensional tensor. Arguably this functionality could have been built into tensor() itself, but it was probably kept separate for efficiency and flexibility.
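The shape check can be seen with a small sketch like the one below. The tensors are made up, and the flag stack_failed is only there to record what happened:

```python
import torch

# Tensors with matching shapes stack without complaint.
same = [torch.zeros(2, 2), torch.ones(2, 2)]
print(torch.stack(same).shape)   # torch.Size([2, 2, 2])

# Tensors with different shapes are rejected.
mismatched = [torch.zeros(2, 2), torch.zeros(3, 3)]
try:
    torch.stack(mismatched)
    stack_failed = False
except RuntimeError as err:
    stack_failed = True
    print("stack refused:", err)
```

This matters for the digit images because every image has to be 28x28 for the stacked tensor to be rectangular.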

The last example works because each inner tensor holds a single element, so it can be converted to a plain number.
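A minimal sketch of that single-element conversion, using plain torch calls rather than the tensor() helper from the thread. The exact exception type for the failing case seems to vary between PyTorch versions, so the code catches several as a precaution:

```python
import torch

# A one-element tensor can be turned into a Python number.
one = torch.tensor([2])
print(one.item(), int(one))   # both give the Python int 2

# A two-element tensor cannot; there is no single number to return.
two = torch.tensor([0, 0])
try:
    two.item()
    converted = True
except (ValueError, RuntimeError, TypeError) as err:
    converted = False
    print("conversion refused:", err)
```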