Me (likely) failing to understand code

radek · November 13, 2018, 2:13pm

I encountered this piece of code in vision/image.py:

def pil2tensor(image:Union[NPImage,NPArray],dtype:np.dtype)->TensorImage:
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim==2 : a = np.expand_dims(a,2)    
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy( a.astype(dtype, copy=False) )

Could I please ask why the double transpose? Would replacing the two transposes with the following

a = np.transpose(a, (2, 0, 1))

not have the same effect?

(I am not trying to be picky. I would just like to confirm my understanding, in case there is something I am not seeing here that could lead to bugs in the code I write.)
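One way to check this claim is to compare the two versions directly on a small array. The snippet below is only a sketch: the array shape (4, 6, 3), standing in for a height x width x channels image, and the variable names are assumptions made for illustration, not part of the library.

```python
import numpy as np

# Hypothetical stand-in for an image: height=4, width=6, channels=3.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# The two transposes from pil2tensor, applied one after the other.
double = np.transpose(np.transpose(a, (1, 0, 2)), (2, 1, 0))

# The proposed single transpose.
single = np.transpose(a, (2, 0, 1))

print(double.shape, single.shape)        # both channels-first
print(np.array_equal(double, single))    # same values in the same positions
print(double.strides == single.strides)  # same memory layout (both are views)
```

Both results are views of the same buffer with identical strides, so on this input the single transpose appears to be a drop-in replacement.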

Thx a lot!

sgugger · November 14, 2018, 2:09pm

I’m not the one who wrote the last version; I think it was @Kaspar.
I merged it this way because the tests showed it was fast. Is there a speed reason for the two separate transposes, or can we do just one?

Feel free to run a few timeit benchmarks, @radek, to see if we can safely refactor.
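A rough timing comparison could look like the sketch below. It makes several assumptions: the helper names `to_chw_double` and `to_chw_single` are hypothetical, the input is a random 224x224x3 uint8 array rather than a decoded PIL image, and the `torch.from_numpy` step is left out so that only the NumPy part is measured.

```python
import timeit
import numpy as np

def to_chw_double(a, dtype):
    # Hypothetical helper: the two transposes from the current code.
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return a.astype(dtype, copy=False)

def to_chw_single(a, dtype):
    # Hypothetical helper: the proposed single transpose.
    a = np.transpose(a, (2, 0, 1))
    return a.astype(dtype, copy=False)

# Assumed test input: a random image-like array (height, width, channels).
a = np.random.default_rng(0).integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

timings = {}
for fn in (to_chw_double, to_chw_single):
    # Total seconds for 200 conversions; results vary by machine.
    timings[fn.__name__] = timeit.timeit(lambda: fn(a, np.float32), number=200)
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s")
```

Numbers from a micro-benchmark like this are noisy, so it is worth repeating the run a few times before drawing conclusions.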

Kaspar · November 14, 2018, 3:59pm

@radek You are correct, nice improvement.
I have put your version in a notebook in this repository: https://github.com/kasparlund/fastaiNotebooks.git
in the folder pil2tensor. Your version is about 0.5% faster on images on my Mac (with pillow-simd).
It is 8% faster when converting an in-memory NumPy array.

There is already a test, “test_vision_pil2tensor”, in the test folder, so a PR should be quick to make.