Pattern Recognition in images


I am a beginner in the field of deep learning and am trying to detect if a teacher’s slide has a multiple-choice-question on it.

For the training data that I have, there are slides (in image form), which either have typed questions or slides where teachers have scanned questions from some book and put that image in the slide.

Basically, the model needs to detect a pattern where there is some text followed by 2-4 bullet points.

Some examples are as follows:

I tried the resnet model on this dataset and while it worked well on the training and validation dataset, when I fed it randomly picked images of MCQs from the internet, it failed miserably. So I am not sure what the model was detecting really. Was it able to understand that it needs to look for a pattern.

I want to know if resnet is the right model to use for this problem or if there are specialized models for such problems (like YOLO is for object detection). If resnet should work for this, then what kind of tricks can I use to guide it to the correct solution.


Hey, I will try to help but I am not very experienced too :slight_smile:

You don’t say what are the sizes of your training and validation sets. Maybe you are not training it with enough data.
Other point: Maybe your test set is not like your training set and your validation set, and then you can’t have good results.
So just have a look in your data (training, valid, test) to see if your data is ok

For that, maybe classifying the whole image is OK, but you can try to see some patterns with object detection like the points or a) b) c) … Have a try…

How about you train YOLO to find a), b), c) ,d). If it finds them, then the image has multiple choice questions.

Thanks for your replies.

The problem with trying to detect a), b), c), d) is that the number of options may vary, the options might be A), B), C), D) or 1, 2, 3, 4 or i, ii, iii, iv. They might not have ) at the right, etc.

Sure, all of the above can be handled by writing an algorithm on top of the results from YOLO and I would definitely give that a shot. I was just wondering if there may be a better model to use for this problem.

Right now I have 3000 images to train on and I can get more images. My test set was slightly different, but not by much, than my training set. And I need the model to work on slides from new teachers.

The training accuracy was 98% when validation set was chosen randomly from the training set. It went down to 75% when I removed slides from particular educators from training set and moved them to validation set. Even though that looks promising, I am not sure what the model is really detecting and hence don’t have confidence on it.

I think it will not be a problem. You should aim to train your model to detect these small option values. Maybe annotate like 100 images and see if you can you get YOLO working on it. Try to overfit on those 100 images.

The idea is these options generally have some whitespace around it which will tell YOLO to detect these options.

Hey @vaibhav-sinha!
Have you thought about using an OCR model in combination with NLP?

That is exactly what we are trying right now. We are attempting two things:

  1. Use NLP to figure out if the text is a question
  2. Use simple regex to find if there are choices

If both of them say yes, then it is a multiple choice question.

1 Like