Reading in Images with v1

I have a folder that contains my images under data/trainImages and a dataframe that looks like this:

|   | image        | coords                                   |
|---|--------------|------------------------------------------|
| 0 | IMG_0005.JPG | [428, 252, 694, 264, 696, 525, 428, 528] |
| 1 | IMG_0006.JPG | [307, 210, 609, 238, 608, 521, 302, 528] |
| 2 | IMG_0007.JPG | [383, 177, 703, 209, 710, 530, 383, 539] |
| 3 | IMG_0012.JPG | [350, 46, 800, 96, 810, 532, 350, 554]   |
| 4 | IMG_0013.JPG | [573, 139, 1026, 153, 1036, 590, 561, 589] |

I was wondering how I would go about loading the datasets into a dataBunch with fastai v1.

I can go ahead and append data/trainImages/ to the image column if it helps. All images are of varying sizes.
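If prepending the folder yourself turns out to be useful, it is a one-liner in pandas. A minimal sketch, assuming the dataframe is named `df` as shown above:

```python
import pandas as pd

# Sample rows mirroring the dataframe from the question
df = pd.DataFrame({
    "image": ["IMG_0005.JPG", "IMG_0006.JPG"],
    "coords": [[428, 252, 694, 264, 696, 525, 428, 528],
               [307, 210, 609, 238, 608, 521, 302, 528]],
})

# Prepend the folder so each entry becomes a full relative path
df["image"] = "data/trainImages/" + df["image"]
print(df["image"].tolist())
```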

I do apologize if this is a basic question, but I can’t seem to get my head around it even after looking at the documentation.

You should use the data block API. Since your data is in a dataframe, it begins with

src = ImageItemList.from_df(df, cols='image', path=Path('data'), folder='trainImages')

This will automatically prepend data/trainImages/ to the filenames in the image column.
Then you can split randomly (random_split_by_pct) or by passing indices (split_by_idx). The labelling will be a bit trickier: I'd go with label_from_func, and you need to define a function that takes a filename and returns a proper target (probably ImagePoints in your case, since that column holds coordinates). There is an example of that in the head pose notebook.
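A label function along those lines could look like the sketch below. It is not fastai-specific; it only shows the lookup-and-reshape step, with `df` and the column names assumed from the question:

```python
import pandas as pd

# Dataframe as described in the question
df = pd.DataFrame({
    "image": ["IMG_0005.JPG"],
    "coords": [[428, 252, 694, 264, 696, 525, 428, 528]],
})

def get_coords(fname):
    """Look up the flat coordinate list for a filename and
    group it into (x, y) pairs."""
    fname = str(fname).split('/')[-1]                 # strip any folder prefix
    flat = df.loc[df.image == fname, 'coords'].values[0]
    return [flat[i:i + 2] for i in range(0, len(flat), 2)]

print(get_coords('data/trainImages/IMG_0005.JPG'))
```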

I got some distance by using the following bit of code (which is adapted from head-pose.ipynb):

def get_coords(fname): 
  fname = fname.split('/')[-1]
  return torch.FloatTensor(df.loc[df.image==fname,'coords'].values[0][:2]) #.reshape(-1,2)

src = PointsItemList.from_df(df, cols='image', path=PATH, folder='Data_Training')

data = (src
        .random_split_by_pct()
        .label_from_func(get_coords)
        .transform(get_transforms(), tfm_y=True, size=(120,160))
        .databunch().normalize(imagenet_stats)
       )

However, this only works if I take two coordinates at a time (x, y), whereas I have four points to predict, with (x, y) for each point, making eight values in total.

I was wondering how I can adapt the get_coords function to take in all eight coordinate values. My attempt with reshape(-1,2) didn't work. It gave me the warning: UserWarning: It's not possible to collate samples of your dataset together in a batch. Shapes of the inputs/targets: [... torch.Size([3, 120, 160]), torch.Size([3, 120, 160])], [torch.Size([4, 2]), torch.Size([2, 2]), torch.Size([4, 2])...]. I ran all possible filenames through get_coords and they all have shape [4, 2]. I'm not sure how a [2, 2] shape shows up in the warning.

When I tried to run data.show_batch(3, figsize=(9,6)) I got the error: RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 4 and 3 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307.

I know I'm close to solving this, so I would appreciate any input. Thanks in advance.

You might have points that disappear due to data augmentation. Put remove_out=False in your transform call to avoid that, or reduce the data augmentation.
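To see why the target shapes differ between samples: transforms like rotation or cropping can move a point outside the image, and points that end up out of range get dropped, so one image's target can have fewer rows than another's. A minimal sketch of that filtering effect in plain Python (not the fastai internals, just the idea):

```python
def clip_points(points, width, height, remove_out=True):
    """Keep only points that still fall inside the image bounds,
    mimicking how dropping out-of-range points changes the shape."""
    if not remove_out:
        return points
    return [(x, y) for x, y in points
            if 0 <= x < width and 0 <= y < height]

# Four points; after augmentation two of them fall outside a 160x120 image
pts = [(10, 10), (150, 20), (200, 130), (30, 170)]
print(len(clip_points(pts, width=160, height=120)))                    # 2
print(len(clip_points(pts, width=160, height=120, remove_out=False)))  # 4
```

A batch of [4, 2] and [2, 2] targets cannot be collated into one tensor, hence the warning.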

Yep, that did the trick. For some reason I had to swap x and y to get the points in the correct location.

def get_coords(fname): 
  fname = fname.split('/')[-1]
  coords = torch.FloatTensor(df.loc[df.image==fname,'coords'].values[0]).reshape(-1,2)
  return coords[:, [1,0]]

src = PointsItemList.from_df(df, cols='image', path=PATH, folder='Data_Training')
data = (src
        .random_split_by_pct()
        .label_from_func(get_coords)
        .transform(get_transforms(), tfm_y=True, remove_out=False, size=(120,160))
        .databunch().normalize(imagenet_stats)
       )
Ah yes, I forgot to mention that bit: we follow the numpy and pytorch convention of height then width.
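For anyone following along, the `coords[:, [1,0]]` indexing above reorders each (x, y) pair to (y, x), i.e. height first, then width. A quick numpy illustration using the first row of the example dataframe:

```python
import numpy as np

flat = np.array([428, 252, 694, 264, 696, 525, 428, 528], dtype=float)
pts = flat.reshape(-1, 2)    # rows of (x, y)
pts_yx = pts[:, [1, 0]]      # rows of (y, x): height first, then width
print(pts_yx[0])             # [252. 428.]
```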
