DataBunch error

I’m working through the lesson 3 head pose notebook and trying to apply the lesson to detecting cars in the Lawrence Livermore National Labs dataset. I currently have the data residing in a directory of 256x256 jpegs, each with an associated text file listing the coordinates of a point on the center of each car. The following code opens the text file, converts the coordinates to image space, and then displays the image with the points on the cars:

    ##Get text file associated with image##
    def img2txt_name(f): 
        return path/f'{str(f)[:-4]}.txt'

    ##Function to convert and extract coordinates of groundtruth (labels)##
    def convert_xy(df):
        df['x_coord_conv'] = (df['x_coord']*256)
        df['y_coord_conv'] = (df['y_coord']*256)
        df = tensor(df['y_coord_conv'], df['x_coord_conv'])
        return df

    ## return only the coordinate labels for the image ##
    def get_car_features(f):
        data = np.genfromtxt(img2txt_name(f))
        data = pd.DataFrame(data)

        # a single car is read in as a column vector, so transpose it
        if data.shape[1] == 1:
            data = data.T
        data.columns = ['class', 'x_coord', 'y_coord', 'length', 'width']
        coords_df = convert_xy(data)
        return coords_df.T

    ctr = get_car_features(fname)
    img = open_image(fname)
    img.show(y=get_ip(img, ctr), figsize=(10, 10))

This produces the right output:

However I am running into issues with the data block api. Here is the code snippet:

    data = (PointsItemList.from_folder(path)
            .split_by_rand_pct()
            .label_from_func(get_car_features)
            .transform(get_transforms(), tfm_y=True, size=(256,256))
            .databunch())

Just like in the head pose notebook, I am using PointsItemList and pointing it at the folder with all the images & text files. I want to split by random percent and then use the function from above to open the text files and get the coordinates (the labels in this case). The issue I am having is that the data block is opening both the jpegs and the text files, which creates torch size mismatches, when I want it to create the data bunch from the images and get the labels from the text files separately. What can I do to correct this issue? Should I move all the text files to a different directory?

Image showing the output issue below (you can tell the images from the txt files in the torch size):

The current directory looks like this:

image1.txt (structured as follows) <class, x_coord, y_coord, length, width>

Thanks in advance!


You can pass in a parameter to include only files with certain extensions, like so:
PointsItemList.from_folder(path, extensions=['.jpg'])

A bit more information on this:

As per the source code, the PointsItemList class subclasses the ImageList class, which has the actual from_folder function that returns an ItemList consisting of Images with their label class set to PointsLabelList (courtesy of the PointsItemList class).
This is the source code for the from_folder function of the ImageList class:

    @classmethod
    def from_folder(cls, path:PathOrStr='.', extensions:Collection[str]=None, **kwargs)->ItemList:
        "Get the list of files in `path` that have an image suffix. `recurse` determines if we search subfolders."
        extensions = ifnone(extensions, image_extensions)
        return super().from_folder(path=path, extensions=extensions, **kwargs)

That’s where you see the extensions parameter which allows you to filter out unneeded files.
Hope this helps!


I realize the extra information may be hard to parse if one isn’t familiar with object-oriented concepts in general, and in Python specifically. I’d be happy to elaborate on that if you think there’s a need for it :)

@akashpalrecha thank you for the response, and I appreciate the extra information. I added the extensions parameter, but it still produces the same error as in the image above.

UserWarning: It's not possible to collate samples of your dataset together in a batch.
Shapes of the inputs/targets:

Am I diagnosing it incorrectly? Is the data block reading the images in and the labels in separately, but, since it’s a multi-label problem (i.e. there could be anywhere from 0 to n cars in a given image, and hence multiple points per image), that’s what is causing the issue?
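As a sanity check of that diagnosis, the mismatch is easy to reproduce in isolation: torch can only stack tensors of identical shape, so two label tensors with different numbers of points cannot be collated into one batch (a standalone snippet, independent of fastai):

```python
import torch

# Point labels for two images: one with 2 car centres, one with 3
y1 = torch.zeros(2, 2)
y2 = torch.zeros(3, 2)

try:
    torch.stack([y1, y2])  # this is what the default collate does
except RuntimeError as e:
    print("collate fails:", e)
```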

Thanks again,

Since you already have a size parameter in your transformations for the data, I am guessing that this issue almost certainly comes from your labels being of variable length.
Now, I am not aware of a CNN architecture that gives you a variable-length output, at least not yet (please correct me if I’m wrong, I’d love to know if there’s a paper about it).

So maybe you could model your problem as a segmentation task? You could create segmentation masks for each input image with masks containing white spots of fixed radius wherever there is a car. This would work in theory, and all your issues with the point labels will be obviated.
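To make that concrete, here is a rough sketch of how such masks could be built from your point labels (pure numpy; make_mask, the radius, and the example coordinates are all illustrative placeholders):

```python
import numpy as np

def make_mask(centres, size=256, radius=8):
    # Binary mask with a filled white disk of `radius` pixels
    # at each (y, x) car-centre coordinate.
    mask = np.zeros((size, size), dtype=np.uint8)
    yy, xx = np.mgrid[0:size, 0:size]
    for y, x in centres:
        mask[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 1
    return mask

# e.g. two cars centred at (128, 64) and (40, 200)
mask = make_mask([(128, 64), (40, 200)])
```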

Points to note:

  1. Since segmentation maps output by models sometimes contain noise in the form of small blobs of white pixels, this can wreak havoc on your task: in this particular case, those white blobs would mean that there is a car in that spot, so they can’t be ignored.
  2. To deal with this, you’ll have to increase the radius of the points to a substantial number when making the masks so that they can be differentiated from the noise when the model outputs the same.
  3. You’d have to figure out a good way to post-process these mask outputs so as to get your final coordinates.
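For point 3, one possible starting point is a simple connected-components pass that drops small blobs as noise and returns one centre per remaining blob (a pure-Python/numpy sketch; a real pipeline might use scipy.ndimage.label instead):

```python
import numpy as np
from collections import deque

def mask_to_centres(mask, min_area=20):
    # One (y, x) centre per connected blob of 1s; blobs smaller than
    # `min_area` pixels are treated as noise and dropped.
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    centres = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # BFS flood fill to collect this blob's pixels
                q, blob = deque([(sy, sx)]), []
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(blob) >= min_area:
                    ys, xs = zip(*blob)
                    centres.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centres
```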

All of this seems hacky. I’m pretty sure that there would be a better way to do this. If someone else could chime in and help out, that would be great!


@akashpalrecha Thank you again for your response. I do believe this is probably the issue: the labels being of variable length. I’m sure there must be a way to handle this specific situation. I hate to ask @sgugger, but is there no way to have the data block interact with multiple points in an image (or labels with variable lengths)?

You have to write your own padding function (like bb_pad_collate, for instance) and pass it in your call to DataBunch.
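A minimal version of such a padding function might look like the following (an illustrative sketch, not fastai’s actual bb_pad_collate; max_pts is an assumed cap on points per image):

```python
import torch

def pad_points_collate(batch, max_pts=16, pad_val=0.):
    # Pad each sample's (n, 2) point tensor to (max_pts, 2) so samples
    # with different numbers of points can be stacked into one batch.
    xs, ys = [], []
    for x, y in batch:
        padded = torch.full((max_pts, 2), pad_val)
        n = min(y.shape[0], max_pts)
        padded[:n] = y[:n]
        xs.append(x)
        ys.append(padded)
    return torch.stack(xs), torch.stack(ys)
```

You would then pass it in as .databunch(collate_fn=pad_points_collate).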

The bigger issue here is that even if your data bunch is able to process variable-length labels, you have to have a model architecture that is able to output such a label. This problem needs to be solved first. And right now the only solution I can think of is re-modelling your task as a segmentation problem.

Thank you for your inputs. The problem is basically an object detection type of model; it just uses central points over the feature as the labels instead of bounding boxes. In this case, wouldn’t a resnet model architecture be sufficient? Would either of you recommend moving forward with configuring bounding boxes, or should I continue on the path I am taking and figure out my own collate function? I found a few examples in these two links where individuals had to modify the collate function (no label modification, and removing samples where points fall outside the image after augmentation). I’m assuming I need to modify the collate function to account for the number of points in a given image (which could be 0 to n). Any guidance on how to accomplish this would be helpful.

