Classifying Images With Text

Hey! I’m trying to build a system that takes images made up primarily of text, specifically restaurant menus, and classifies them into multiple categories. Looking through the forums, I see it’s recommended to look at previous ICDAR competitions to see how to do this, and at something like https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/ for the OCR portion. So do I run OCR first and then pass the result to an NLP text classification network? Is multi-font, real-world OCR still a hard problem? I looked around ICDAR 2017 and didn’t see anything especially relevant, but it’s entirely possible that I just missed the appropriate papers/projects. How would y’all suggest I get started with this?

I would suggest using the latest (GitHub master) version of Tesseract.

Thanks! Have there been significant changes lately that make master worth using? I assume it has something to do with the newest release being LSTM-based?

Yes. The LSTM works quite well even for non-English languages.
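
For what it’s worth, here is a rough sketch of the "OCR first, then classify the text" pipeline asked about above. It assumes the pytesseract wrapper and Pillow are installed alongside a Tesseract 4+ binary; the `--oem 1` option asks for the LSTM engine. The category names, keyword lists, and function names are made up for illustration, and the keyword matcher is only a stand-in for a real NLP text classifier, not a recommendation.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "pizza": {"pizza", "margherita", "pepperoni", "calzone"},
    "sushi": {"sushi", "nigiri", "sashimi", "maki"},
    "drinks": {"beer", "wine", "cocktail", "espresso"},
}


def ocr_image(path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text."""
    # Imported lazily so the text classifier works without OCR installed.
    from PIL import Image
    import pytesseract

    # "--oem 1" requests the LSTM engine (assumes Tesseract 4 or later).
    return pytesseract.image_to_string(
        Image.open(path), lang=lang, config="--oem 1"
    )


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def classify_text(text, min_hits=1):
    """Return every category whose keywords occur at least min_hits times."""
    counts = Counter(tokenize(text))
    labels = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(counts[word] for word in keywords)
        if hits >= min_hits:
            labels.append(category)
    return sorted(labels)


def classify_menu_image(path):
    """OCR the image, then assign zero or more categories to its text."""
    return classify_text(ocr_image(path))
```

Keeping the two stages separate means `classify_text` can be swapped for a trained multi-label model later, and it can be tried on plain strings before any OCR is wired in, e.g. `classify_text("Margherita Pizza 12.00\nHouse Wine 6.00")`.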