What model to choose for multiple handwritten digit recognition

The task is to detect digits in a grid in an image.

What is the best model to use in this case?
I found that PP-OCR performs very well, but I think it may be overkill for my task.

Thank you!

If you can ensure that the grid is always in the same place each time, you could just use a library like OpenCV to slice the grid into individual images and then have one classifier that classifies the numbers.
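
As a rough sketch of that slicing idea, assuming the grid region is already cropped and has a known number of rows and columns, the cell boundaries could be computed like this. The function name `grid_cells` and the 1x4 layout are hypothetical and only there for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def grid_cells(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Split a width x height region into rows x cols cell boxes, row by row."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    # Edges are computed proportionally so the cells cover the region
    # exactly, even when the size is not divisible by the cell count.
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]


# Hypothetical example: a 1x4 grid inside a 400x100 px crop.
cells = grid_cells(400, 100, rows=1, cols=4)
```

With the image loaded as a NumPy array (for example via `cv2.imread`), each cell could then be cropped as `image[y0:y1, x0:x1]` and passed to the classifier.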

Hi @birosjh, thank you for your answer.

No, in test images the grid is not always present.
I tried the approach you mentioned, along with trying to restore the grid and applying different transformations, but the results were not good.

Ah, so the position of the numbers can move around quite a bit then? If splitting them into individual images isn’t possible, then I think looking for pretrained models is a good idea. PP-OCR seems like it would be fine to use, but you might check whether it needs a GPU for inference. This project also has pretrained models and looks like it might work pretty well: https://github.com/githubharald/SimpleHTR

Some preprocessing + densenet121 worked very well (resnet34 is applicable too, but has 2-3% lower accuracy):

  1. Find and remove horizontal and vertical lines, and remove noise using Gaussian kernels.
  2. Join close contours.
  3. Find the 4 biggest contours. Remove contours with small area/width/height (not all items have exactly 4 digits; some have fewer).
  4. Extract digits.
  5. Classify each extracted digit with the CNN.
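
Step 3 could look roughly like the sketch below. It assumes each contour has already been reduced to an `(x, y, w, h)` bounding box (the format returned by `cv2.boundingRect`) and uses bounding-box area as a stand-in for contour area. The function name `select_digit_boxes` and all threshold values are hypothetical and would need tuning on real images.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def select_digit_boxes(
    boxes: List[Box],
    max_digits: int = 4,   # at most 4 digits per item
    min_area: int = 50,    # assumed threshold, in pixels^2
    min_w: int = 3,        # assumed threshold, in pixels
    min_h: int = 8,        # assumed threshold, in pixels
) -> List[Box]:
    """Keep the biggest boxes, drop the tiny ones, return them left to right."""
    # Take the max_digits largest boxes by area.
    biggest = sorted(boxes, key=lambda b: b[2] * b[3], reverse=True)[:max_digits]
    # Remove leftovers that are too small to be a digit (noise, line fragments).
    kept = [
        b for b in biggest
        if b[2] >= min_w and b[3] >= min_h and b[2] * b[3] >= min_area
    ]
    # Sort by x so the digits come out in reading order.
    return sorted(kept, key=lambda b: b[0])


# Hypothetical boxes: three digit-sized blobs and two specks of noise.
candidates = [(120, 10, 20, 30), (10, 12, 18, 28), (60, 11, 22, 31),
              (200, 40, 2, 2), (90, 5, 4, 4)]
digits = select_digit_boxes(candidates)
```

In this example only three boxes survive, which matches the case where an item has fewer than 4 digits. Each surviving box would then be cropped from the image and passed to the classifier.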

Glad you found a way to tackle it. I hadn’t thought of finding the 4 biggest contours. Awesome idea!