What are good problems to solve? Can I count things in an image?

I’m trying to connect with some marine research labs here in California to identify problems that we can collaborate on. I want some real-world problems to practice on, and would like to contribute to the scientific field in doing so.

I have a friend who will post a “call for collaborators” email on their California-wide email list. She asked me to specify what types of problems would be good for a deep learning application. The issue is that I’m not sufficiently experienced in our field to know with specificity, beyond a generic “classification” problem description.

From speaking with her, I know grad students spend a LOT of time counting how many particulates or life forms are in a given image. Would it be ill-advised to approach the “counting things” problem as a classification problem?

Any other tips or approaches for identifying good problems to tackle with DL, or ways I can count things in an image, would be much appreciated.


I think this should work fine, although I haven’t tried it myself. I’ve seen ‘visual question answering’ examples that do this correctly with CNNs.
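To make the "counting as classification" framing concrete, here is a minimal sketch of how the labels could be encoded under each framing. It is only an illustration: `MAX_COUNT` and the helper names are hypothetical, and no model is trained. The classification version needs a fixed upper bound on the count and treats neighbouring counts as unrelated classes, while the regression version keeps the ordering.

```python
import numpy as np

MAX_COUNT = 7  # hypothetical cap; counts above it share the last class


def to_class_index(counts, max_count=MAX_COUNT):
    """Classification framing: one class per count 0..max_count, larger counts clipped."""
    return np.clip(np.asarray(counts, dtype=int), 0, max_count)


def to_one_hot(counts, max_count=MAX_COUNT):
    """One-hot targets of shape (n_images, max_count + 1)."""
    idx = to_class_index(counts, max_count)
    one_hot = np.zeros((idx.size, max_count + 1), dtype=np.float32)
    one_hot[np.arange(idx.size), idx] = 1.0
    return one_hot


def to_regression_target(counts):
    """Regression framing: the raw count as a single float per image."""
    return np.asarray(counts, dtype=np.float32).reshape(-1, 1)


counts = [0, 3, 7, 12]
print(to_class_index(counts))        # 12 is clipped into the last class
print(to_one_hot(counts).shape)
print(to_regression_target(counts).ravel())
```

If the images can contain dozens or hundreds of items, the class-per-count encoding gets unwieldy, which is one reason to consider regression or detection instead.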

Freely available image processing programs like ImageJ / Fiji have libraries for counting cells or nodules in an image, topology-based segmentation, and so on. For segmentation it is probably better if you can develop a custom NN, but for just “counting things” the macros these programs provide can do the job, as long as those “things” are similar in shape and size.
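For a sense of what this kind of heuristic counting does, here is a rough sketch: threshold the image, then count connected bright regions. This is not ImageJ's implementation, just a generic version of the idea; `count_blobs`, `threshold` and `min_area` are names chosen for illustration.

```python
from collections import deque

import numpy as np


def count_blobs(image, threshold=0.5, min_area=1):
    """Count connected bright regions (4-connectivity) in a 2D grayscale array."""
    mask = np.asarray(image) > threshold
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or visited[r, c]:
                continue
            # Flood-fill one region and measure its area in pixels.
            area = 0
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Regions smaller than min_area are treated as noise.
            if area >= min_area:
                count += 1
    return count


demo = np.zeros((8, 8))
demo[1:3, 1:3] = 1.0  # first blob
demo[5:7, 4:7] = 1.0  # second blob
demo[0, 7] = 1.0      # single-pixel speck
print(count_blobs(demo))              # counts all three regions
print(count_blobs(demo, min_area=2))  # ignores the speck
```

The fixed threshold and minimum area are exactly the sort of assumptions that hold up when the objects look alike and break down when they touch, overlap, or vary in brightness.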

The ImageJ stuff makes lots of assumptions because it’s a heuristic method. If you’re counting, say, faces in a photo, you’ll need a CNN! :smiley:

@jeremy what do you think of mahotas? Just in terms of using it for preprocessing, i.e. preparing data for a CNN to classify.

I think OpenCV is great and works well for a lot of use cases, but I’m wondering if you could recommend some other Python libraries/packages that might also be useful for preprocessing?

I only briefly looked at it, but it doesn’t seem that relevant to CNN preprocessing.

Some questions you should ask:
Are the ‘things’ in your image homogeneous (i.e. the same or a similar thing many times)?
Or are there different ‘things’? If so, do you want the total count or a count per class?
Is partial or heavy occlusion possible, or does your domain rule occlusion out?
What is the scale of the objects in your images? Is there a mix of big and small objects?
How accurate does the count need to be: 100%? 90%?

I looked into the counting problem for counting bottles on a shelf. I generated sample images of different types of bottles in a row (one behind another, with partial occlusion). With these generated images it was relatively straightforward to train a network that could count the number of bottles in a line (from 1 to 7). But applying it to real-world images of shelves ran into problems: there were multiple rows of bottles next to each other, and segmenting each row turned out to be hard because of the image/video perspective.

When I applied pretrained object detection networks such as SSD and YOLO to these images, they recognized some of the bottles but not all, so one could not use them directly for counting.
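For reference, turning detector output into a count usually amounts to something like the sketch below. The detection format (a list of dicts with `label` and `score`) is hypothetical and is not the actual output format of SSD or YOLO; the point is only that every missed or low-confidence detection directly lowers the count.

```python
from collections import Counter


def count_detections(detections, min_score=0.5, target_class=None):
    """Count detections at or above min_score, per class or for one class."""
    kept = [d for d in detections if d["score"] >= min_score]
    per_class = Counter(d["label"] for d in kept)
    if target_class is not None:
        return per_class[target_class]
    return per_class


# Hypothetical detector output for one image.
detections = [
    {"label": "bottle", "score": 0.92},
    {"label": "bottle", "score": 0.40},  # low-confidence detection
    {"label": "bottle", "score": 0.75},
    {"label": "can", "score": 0.88},
]

print(count_detections(detections))                          # per-class counts
print(count_detections(detections, target_class="bottle"))   # one class only
print(count_detections(detections, 0.3, target_class="bottle"))
```

Lowering `min_score` recovers some missed objects at the cost of more false positives, so the count is only as good as the detector's recall on your particular images.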


They’re pictures of slides or other (3d) containers with biological items in them, so the items are not homogeneous. There can be different items in the image, but we only want to count the item class that the researcher is interested in.

In the 3d images (i.e. not prepared slides), the orientation and lighting can vary. I suspect this makes it a much harder problem, but if I could get within 20% of the true count on a first effort, I’d be very happy.
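One way to pin down "within 20%" is as a relative error on the count per image, plus the fraction of images that meet the tolerance. This is just one possible reading of that target; the function names and sample numbers below are made up for illustration.

```python
def relative_count_error(predicted, actual):
    """Absolute count error as a fraction of the true count."""
    if actual == 0:
        raise ValueError("relative error is undefined when the true count is 0")
    return abs(predicted - actual) / actual


def fraction_within_tolerance(predictions, truths, tolerance=0.20):
    """Share of images whose predicted count is within `tolerance` of the truth."""
    pairs = list(zip(predictions, truths))
    hits = sum(1 for p, t in pairs if relative_count_error(p, t) <= tolerance)
    return hits / len(pairs)


# Made-up counts for three images.
predicted_counts = [45, 110, 30]
true_counts = [50, 100, 20]

for p, t in zip(predicted_counts, true_counts):
    print(p, t, round(relative_count_error(p, t), 2))
print(fraction_within_tolerance(predicted_counts, true_counts))
```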

While I’m waiting on the image data, I am going to work on the counting problem without some of the real-world challenges like occlusion. The idea is to simplify the problem space to evaluate feasibility.
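A simplified version of the problem could start from generated images with known counts, along these lines. Everything here is an assumption for illustration (image size, disk-shaped objects, the 1 to 7 count range borrowed from the bottle example); the objects are placed so that they neither overlap nor touch, which removes occlusion entirely.

```python
import numpy as np


def make_synthetic_image(num_objects, size=64, radius=3, rng=None):
    """Draw `num_objects` non-touching disks on a blank square image."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.zeros((size, size), dtype=np.float32)
    yy, xx = np.mgrid[0:size, 0:size]
    min_dist_sq = (2 * radius + 2) ** 2  # keeps a gap between disks
    centers = []
    attempts = 0
    while len(centers) < num_objects:
        attempts += 1
        if attempts > 10_000:
            raise RuntimeError("could not place all objects without overlap")
        # Sample a center far enough from the border for the disk to fit.
        cy, cx = rng.integers(radius, size - radius, size=2)
        if all((cy - y) ** 2 + (cx - x) ** 2 >= min_dist_sq for y, x in centers):
            centers.append((cy, cx))
            image[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1.0
    return image


def make_dataset(num_images, max_objects=7, seed=0):
    """Return (images, counts) with counts drawn uniformly from 1..max_objects."""
    rng = np.random.default_rng(seed)
    counts = rng.integers(1, max_objects + 1, size=num_images)
    images = np.stack([make_synthetic_image(int(n), rng=rng) for n in counts])
    return images, counts


images, counts = make_dataset(5)
print(images.shape, counts)
```

Once a model handles this easy case, the placement check can be relaxed to allow overlap, and lighting or noise can be added, to see where accuracy starts to fall off.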

I will keep the class posted if I make any progress on this problem, or if I hear of other related projects.

@sanjeev.b the inputs may be too different for your use case, but I have seen several papers in the literature getting >90% accuracy for object counts after training the model on generated images for the training set. Which is so cool!

This paper might be of interest: https://arxiv.org/abs/1703.08710


Just remembered, there was also this Kaggle competition