Pattern Recognition in images

vaibhav-sinha · April 16, 2019, 3:29pm

Hi

I am a beginner in the field of deep learning and am trying to detect whether a teacher’s slide has a multiple-choice question on it.

My training data consists of slides (in image form) that contain either typed questions or questions that teachers scanned from a book and pasted into the slide as an image.

Basically, the model needs to detect a pattern where there is some text followed by 2-4 bullet points.

Some examples are as follows:

I tried a ResNet model on this dataset, and while it worked well on the training and validation sets, it failed miserably when I fed it randomly picked images of MCQs from the internet. So I am not sure what the model was really detecting. Was it able to understand that it needs to look for a pattern?

I want to know if ResNet is the right model to use for this problem or if there are specialized models for such problems (like YOLO is for object detection). If ResNet should work for this, then what kind of tricks can I use to guide it to the correct solution?

Thanks

Matthieu · April 16, 2019, 6:25pm

Hey, I will try to help, but I am not very experienced either.

You don’t say how large your training and validation sets are. Maybe you are not training with enough data.
Another point: maybe your test set is not like your training and validation sets, in which case you can’t expect good results.
So have a look at your data (training, validation, test) to check that it is OK.

Matthieu · April 16, 2019, 6:28pm

Classifying the whole image may be OK for that, but you could also try object detection to pick up patterns such as the bullet points or the a) b) c) … markers. Give it a try.

kushaj · April 16, 2019, 9:30pm

How about training YOLO to find a), b), c), d)? If it finds them, then the image has a multiple-choice question.

vaibhav-sinha · April 17, 2019, 3:31am

Thanks for your replies.

The problem with trying to detect a), b), c), d) is that the number of options may vary, and the labels might be A), B), C), D) or 1, 2, 3, 4 or i, ii, iii, iv. They might also lack the closing ), etc.

Sure, all of the above can be handled by writing an algorithm on top of the results from YOLO and I would definitely give that a shot. I was just wondering if there may be a better model to use for this problem.
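
One way such a post-processing step could look, as a rough sketch: assume the detector returns one (x, y, w, h) pixel box per option marker, and treat the slide as containing choices when at least two boxes share roughly the same left edge (stacked options) or the same top edge (options in a row). The box format, the function name `aligned_markers`, and the 10-pixel tolerance are all assumptions for illustration, not part of any YOLO API.

```python
def aligned_markers(boxes, tolerance=10, min_count=2):
    """Return True if enough marker boxes line up in a column or a row.

    boxes: list of (x, y, w, h) tuples in pixels (assumed detector output).
    """

    def largest_cluster(values):
        # Size of the largest group of values lying within `tolerance`
        # of each other, found with a sliding window over the sorted values.
        values = sorted(values)
        best = 0
        start = 0
        for end in range(len(values)):
            while values[end] - values[start] > tolerance:
                start += 1
            best = max(best, end - start + 1)
        return best

    left_edges = [box[0] for box in boxes]
    top_edges = [box[1] for box in boxes]
    return max(largest_cluster(left_edges), largest_cluster(top_edges)) >= min_count


# Three markers stacked in a column near x = 40.
stacked = [(40, 200, 20, 20), (42, 260, 20, 20), (39, 320, 20, 20)]
print(aligned_markers(stacked))
```

Because the check only looks at geometry, it does not depend on whether the labels are a), A), 1 or i, provided the detector is trained on a single "option marker" class.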

vaibhav-sinha · April 17, 2019, 3:36am

Right now I have 3000 images to train on and I can get more. My test set was slightly different from my training set, but not by much. And I need the model to work on slides from new teachers.

The training accuracy was 98% when the validation set was chosen randomly from the training set. It went down to 75% when I removed slides from particular educators from the training set and moved them to the validation set. Even though that looks promising, I am not sure what the model is really detecting and hence don’t have confidence in it.
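
The second number comes from holding out whole educators instead of random slides, which is closer to the "slides from new teachers" goal. A minimal sketch of such a split, assuming each sample is an (image_path, educator_id) pair; the function name `split_by_educator`, the data layout, and the 20% fraction are hypothetical and not taken from my actual setup:

```python
import random
from collections import defaultdict


def split_by_educator(samples, valid_fraction=0.2, seed=0):
    """Split samples so that no educator appears in both sets.

    samples: list of (image_path, educator_id) pairs (assumed layout).
    Returns (train_paths, valid_paths, held_out_educators).
    """
    by_educator = defaultdict(list)
    for path, educator in samples:
        by_educator[educator].append(path)

    # Shuffle educators, not individual slides, so the validation set
    # only contains slide styles the model has never trained on.
    educators = sorted(by_educator)
    random.Random(seed).shuffle(educators)

    n_valid = max(1, int(len(educators) * valid_fraction))
    held_out = educators[:n_valid]
    kept = educators[n_valid:]

    train = [path for e in kept for path in by_educator[e]]
    valid = [path for e in held_out for path in by_educator[e]]
    return train, valid, set(held_out)


# Toy data: 50 slides spread evenly over 5 educators.
samples = [(f"slide_{i}.png", f"teacher_{i % 5}") for i in range(50)]
train, valid, held_out = split_by_educator(samples)
print(len(train), len(valid), held_out)
```

With a split like this, the validation accuracy estimates how the model does on an unseen educator, which is the number that matters here.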

kushaj · April 17, 2019, 11:50am

I think it will not be a problem. You should aim to train your model to detect these small option labels. Maybe annotate around 100 images and see if you can get YOLO working on them. Try to overfit on those 100 images first.

The idea is that these options generally have some whitespace around them, which should help YOLO detect them.

kai · April 19, 2019, 11:50am

Hey @vaibhav-sinha!
Have you thought about using an OCR model in combination with NLP?

vaibhav-sinha · April 19, 2019, 2:14pm

That is exactly what we are trying right now. We are attempting two things:

Use NLP to figure out whether the text is a question
Use a simple regex to check whether there are choices

If both of them say yes, then it is a multiple choice question.
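
A rough sketch of how the two checks could be combined, assuming the OCR step has already produced plain text with one line per option. The question check below is only a crude keyword heuristic standing in for the NLP part, and the names `MARKER_RE`, `looks_like_question`, `has_choices` and `is_mcq` are made up for illustration:

```python
import re

# Label sequences to look for: a/b/c/d, A/B/C/D, 1/2/3/4, i/ii/iii/iv.
SEQUENCES = [
    ["a", "b", "c", "d"],
    ["A", "B", "C", "D"],
    ["1", "2", "3", "4"],
    ["i", "ii", "iii", "iv"],
]

# Optional "(", a label, an optional ")", "." or ":", then whitespace and text.
MARKER_RE = re.compile(r"^\s*\(?([A-Da-d]|[1-4]|iv|i{1,3})[\).:]?\s+\S")

QUESTION_WORDS = ("what", "which", "who", "when", "where", "why", "how")


def looks_like_question(text):
    # Crude stand-in for the NLP step: ends with "?" or starts with a
    # question word. A real classifier would replace this function.
    stripped = text.strip().lower()
    return stripped.endswith("?") or stripped.startswith(QUESTION_WORDS)


def has_choices(lines, min_options=2):
    # Collect the labels of lines that start like an option, then require
    # the first `min_options` labels of one sequence to appear in order.
    labels = [m.group(1) for m in map(MARKER_RE.match, lines) if m]
    for sequence in SEQUENCES:
        position = 0
        for label in labels:
            if position < len(sequence) and label == sequence[position]:
                position += 1
        if position >= min_options:
            return True
    return False


def is_mcq(ocr_text):
    lines = [line for line in ocr_text.splitlines() if line.strip()]
    # Everything that does not look like an option is treated as the question.
    stem = " ".join(line for line in lines if not MARKER_RE.match(line))
    return looks_like_question(stem) and has_choices(lines)


print(is_mcq("What is the capital of France?\na) Paris\nb) London\nc) Rome"))
```

Requiring the labels to follow a known sequence makes it safer to treat the closing ) as optional, since a single line that happens to start with "a " or "1 " is not enough to count as a list of choices.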