Classification of an Image relative to other images in a group

Hi everyone, I am thinking about how to model the following problem using fastai.
I want to train a classifier that, given a subset of images, picks the image with the highest resolution relative to the other images in the subset.
My images are listed in a DataFrame, one per row, and each row has an Id that assigns the image to a group.
Each group contains n images, where n can vary, so I need to handle groups of different sizes.
The labels are binary: 1 for the image with the highest resolution in the group and 0 for all the others.
There is also a column that tells me how to split my data into training and validation sets.

Id, ImageName, Label, is_valid
1, image_1.png, 1, False
1, image_2.png, 0, False
1, image_3.png, 0, False
1, image_4.png, 0, False
2, image_5.png, 1, False
2, image_6.png, 0, False
3, image_7.png, 0, True
3, image_8.png, 0, True
3, image_9.png, 1, True
3, image_10.png, 0, True
3, image_11.png, 0, True
3, image_12.png, 0, True

How would you model this with the DataBlock API?

Thanks in advance !!!

I would recommend checking this tutorial.

Thanks @VishnuSubramanian, I will for sure. So far I have completed part 1 of the course, but I haven’t found a way to model my specific problem yet, because each subset can contain a different number of images and the objective is to evaluate an image only against the other images in the same subset.

I am not sure the problem you describe can be treated as a classification problem. I think you should take a look at the Siamese architecture, which helps with problems similar to yours, where you are more interested in finding similar images. In your case, all the similar images should be of high resolution.

There is a tutorial here on the Siamese architecture and the data blocks required for it. Hope you find it useful.

@VishnuSubramanian I will consider that approach as well.
The tricky thing is that resolution is judged within the group. For example, I can have a group of very low quality images and want to select the best of those, and then another group of very high quality images. The two images selected from these groups will be quite different, because the objective is the resolution within the specific group.
I am not looking for a specific implementation, so all your suggestions are much appreciated, thanks a lot.

Ok got it. I guess it’s a tricky problem. Maybe there exists a simple CV solution.

Sorry to bother you @jeremy, any suggestion where to look for ideas to solve my problem?

Hi @alberto93
At a high level, this looks to me like some sort of ranking problem. Maybe you could frame it as a regression problem where you set up a model to predict a “resolution score” for each image (perhaps recursively, as you have groups of different sizes?) and then add a last layer that does the ordering and picks the max within each group.

I wouldn’t know exactly how to do it in practice, but I would go for something along those lines. Hope this might be useful :slight_smile:

@lclissa Interesting approach, but I would really like to stick to classification with binary labels if there is a feasible solution for it.

I’m sorry, indeed my comment was misleading. What I meant was to set up an architecture that somehow generates a resolution score for each image in a group (this is the regression part). For that I suspect you would need some sort of recurrent architecture as you are dealing with variable size groups. Then you could just add a “pick-the-max layer” and compare it with the binary label.
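To make the “score each image, then pick the max” idea concrete, here is a minimal sketch in plain Python. It is only one possible reading of the suggestion: it assumes the per-image scores come from some model that scores each image independently (so no recurrent part), and it compares them with the binary labels through a softmax taken over the group. The function names and the example scores are made up for illustration.

```python
import math

def group_softmax(scores):
    """Turn the raw scores of one group into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def group_loss(scores, labels):
    """Cross-entropy between the group softmax and binary labels.

    Assumes exactly one label in the group is 1, as in the table above.
    """
    probs = group_softmax(scores)
    best = labels.index(1)
    return -math.log(probs[best])

def pick_best(scores):
    """The pick-the-max step: index of the highest-scoring image."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical scores for a group of 4 images and a group of 2 images.
group_1_scores = [2.0, 0.5, -1.0, 0.1]
group_2_scores = [0.3, 1.2]

print(pick_best(group_1_scores), pick_best(group_2_scores))
print(group_loss(group_1_scores, [1, 0, 0, 0]))
```

Because the softmax is computed per group, the group size can vary freely and the labels stay binary.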

@lclissa yeah that could be an option.

I’ve solved a similar problem, but I used a Laplacian filter on my images to get a score of how blurry each image is. Or do you mean resolution in number of pixels?
(opencv - Is there a way to detect if an image is blurry? - Stack Overflow)
This is just a standard filter with predefined values, which is lightning fast on a GPU, and you just select the image with the highest value (the sharpest).
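A rough sketch of that filter-based score, written in plain Python so it runs without any dependencies. It assumes a grayscale image given as a list of rows and uses the variance of the 4-neighbour Laplacian response as the sharpness score, which is a common choice but an assumption on my side. The helper names and the toy images are hypothetical.

```python
from statistics import pvariance

def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    img is a 2-D grayscale image as a list of rows (at least 3x3).
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    return pvariance(responses)

def sharpest(images):
    """Index of the image with the highest Laplacian variance in a group."""
    scores = [laplacian_variance(im) for im in images]
    return max(range(len(images)), key=scores.__getitem__)

# Toy group: a flat gray image (no edges) and a checkerboard (many edges).
flat = [[128] * 6 for _ in range(6)]
checker = [[255 * ((x + y) % 2) for x in range(6)] for y in range(6)]

print(sharpest([flat, checker]))
```

On real images you would more likely use a library call such as `cv2.Laplacian(gray, cv2.CV_64F).var()` instead of the Python loops.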
But you seem to be focusing on a classification approach, I’m curious why. Is it actually another problem you are trying to solve?
Here is another approach, similar to the Siamese twins architecture, which generates an embedding for each of the two images and then passes the two embeddings to the final layers to decide whether they are equal or not.
The idea is to feed a tuple of up to x images into the network (maybe a maximum of 10?) and generate an embedding for each of the images, as in the Siamese twins network. The output is one-hot encoded for the index of the sharpest image. If you have fewer than 10 images, you just generate zeros to pad the array of embeddings for the final layers.
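The zero-padding step could look roughly like this. It is a sketch in plain Python with embeddings as lists of floats; in a real model these would be tensors. The `mask` return value is an extra I added so the final layers could ignore the padded slots, and `pad_group` and `max_items` are hypothetical names.

```python
def pad_group(embeddings, best_index, max_items=10):
    """Pad one group of embeddings with zero vectors up to max_items.

    Returns the padded embeddings, a mask (1 = real image, 0 = padding;
    an addition of this sketch) and the one-hot target.
    """
    n = len(embeddings)
    if n > max_items:
        raise ValueError("group is larger than max_items")
    dim = len(embeddings[0])
    padded = [list(e) for e in embeddings]
    padded += [[0.0] * dim for _ in range(max_items - n)]
    mask = [1] * n + [0] * (max_items - n)
    target = [1 if i == best_index else 0 for i in range(max_items)]
    return padded, mask, target

# Hypothetical 2-dimensional embeddings for a group of 3 images.
group = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
padded, mask, target = pad_group(group, best_index=2, max_items=5)
print(mask, target)
```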

@dangraf Yeah, I gave an example with images because it was the simplest one I could think of, but what I am looking for is a general approach to this classification problem, where each group has a different size and the labels are binary. I used an image as the example because each item in the group can be described as a matrix.
So I think RNNs are the direction I would take, but I am not sure whether I can model my input as a list of matrices for each group, with a one-hot encoded vector as output that has a 1 for the selected item.
Do you think this is an appropriate dataset to pass to a dataloader?

I think it’s possible. It’s done for sequence-to-sequence models in text translation. If I remember correctly, they first sort the text sequences by length and then pad the sequences to the same length within a batch. They start with the longest sequences since it takes time for PyTorch to allocate new memory during training.
You should look at these models because they have some logic inside to handle end of sequence etc.
I also like the RNN approach because it’s more elegant, but I was thinking that the Siamese twins approach seems a bit simpler to start with, just to get things working.
The dataloader should randomize the order within a group so that the “best” image appears in different positions, so I believe you need to generate the one-hot encoded output on the fly, similar to the Siamese twins model.
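As a rough illustration of those two dataloader ideas, here is a plain-Python sketch that shuffles a group while rebuilding its one-hot target, and that sorts groups by size before cutting them into batches. It is not fastai or PyTorch code, and the function names are hypothetical. The file names and labels are group 3 from the table in the first post.

```python
import random

def shuffle_group(items, labels, rng):
    """Shuffle one group and rebuild the one-hot target in the new order."""
    order = list(range(len(items)))
    rng.shuffle(order)
    shuffled_items = [items[i] for i in order]
    target = [labels[i] for i in order]
    return shuffled_items, target

def batches_by_size(groups, batch_size):
    """Sort groups longest first, then cut them into batches.

    Groups of similar size end up together, so less padding is needed
    within each batch.
    """
    ordered = sorted(groups, key=len, reverse=True)
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

rng = random.Random(0)  # fixed seed only to make the example repeatable
items = ["image_7.png", "image_8.png", "image_9.png",
         "image_10.png", "image_11.png", "image_12.png"]
labels = [0, 0, 1, 0, 0, 0]

shuffled, target = shuffle_group(items, labels, rng)
print(shuffled[target.index(1)])  # still the labelled image, new position
```

Calling `shuffle_group` every time a group is loaded gives a different position for the best image on each epoch.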

@ thanks I’ll start experimenting around :muscle:t2: