Object identification from ImagePoints

Hey everyone,

I have been following the course “Practical Deep Learning for Coders” and had an idea for a regression network I wanted to make. I have previously applied models similar to the ones brought up in this course to build image classifiers, which have been working great! For this task, however, I have run into some problems, and I hope someone can guide me.

As part of a bigger goal I wanted to train a regression network to count cells that light up on a dark background, similar to this picture Link.

I have created a training and validation set of images and annotated them similarly to the lesson 3 head-pose example (except that these have no depth sensor to worry about). However, I am having problems because some pictures contain more cells than others. When I run the code I get the following error:
RuntimeError: The size of tensor a (128) must match the size of tensor b (208) at non-singleton dimension 0
However, if I redo the annotation and mark only 3 cells per image, so that every image has an equal number of image points, then everything runs smoothly. The problem is that I need to identify the positions of all cells in the image.
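For what it's worth, a minimal PyTorch sketch (not the course code; the variable names are made up) of why equal counts matter: batching stacks each image's point tensor into one batch tensor, which only works when every tensor has the same first dimension.

```python
import torch

pts_a = torch.rand(3, 2)  # image with 3 annotated cells -> 3 (y, x) points
pts_b = torch.rand(5, 2)  # image with 5 annotated cells

# Stacking tensors of shape (3, 2) and (5, 2) fails, much like the
# size-mismatch RuntimeError above:
failed = False
try:
    torch.stack([pts_a, pts_b])
except RuntimeError:
    failed = True
print("stack failed:", failed)

# With equal point counts per image, stacking into a batch works fine:
batch = torch.stack([pts_a, torch.rand(3, 2)])
print(batch.shape)  # torch.Size([2, 3, 2])
```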
The thought occurred to me to use bounding boxes instead, since examples with an unequal number of those are no problem. The catch is that some cells can be very close together and will not be easy, or even possible, to label with rectangular boxes due to their very uneven shapes (think neurons).

What I would like to ask is whether there is a way to make multipoint identification (I use PointsItemList) work with an unequal number of objects per image, or whether I should look into another approach.

Best regards


Hi @AlexSvan,

Curious to know if you made any progress! I don’t have an answer, but I’m working on a similar idea and I found the introduction and related work sections of this paper helpful:

Where Are the Blobs: Counting by Localization with Point Supervision

It seems possible to use regression on point annotations for counting, but segmentation or bounding box annotations are easier to work with – and much harder to get. Fixed camera angles and similar object sizes seem to help. Papers like this one might also be useful.

I might try padding the arrays to the length of the largest one in my dataset with a dummy value, and then doing the regression as in lesson 3. I don't know if that's a good idea. [Edit: I see you tried this later!]
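The padding idea could look something like this sketch (a hypothetical helper, not a fastai API; the sentinel value is an assumption, chosen outside fastai's scaled [-1, 1] coordinate range so real points can be told apart from padding):

```python
import torch

PAD = -2.0  # sentinel outside the valid [-1, 1] coordinate range

def pad_points(points: torch.Tensor, max_n: int, pad_value: float = PAD) -> torch.Tensor:
    """Pad an (n, 2) point tensor to (max_n, 2) with a sentinel value."""
    n = points.shape[0]
    padding = torch.full((max_n - n, 2), pad_value)
    return torch.cat([points, padding], dim=0)

# Images with 3, 5, and 1 annotated cells respectively:
targets = [torch.rand(3, 2), torch.rand(5, 2), torch.rand(1, 2)]
max_n = max(t.shape[0] for t in targets)

# After padding, all targets share a shape and can be batched:
batch = torch.stack([pad_points(t, max_n) for t in targets])
print(batch.shape)  # torch.Size([3, 5, 2])
```

One thing to watch: the loss would presumably need to mask out the padded entries (e.g. ignore rows equal to the sentinel) so they don't pull the regression toward the dummy value.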