Image of Text Document Classification

zoumana · March 8, 2020, 5:30pm

Hello Everyone,

I want to classify three types of documents: invoices, identity cards, and tax documents. They are all images containing text. I want to use a pretrained CNN to solve the problem, but since all these documents contain textual information, I am thinking about combining NLP and a CNN for the classification task.
What approach would you recommend?
Thank you in advance for your feedback.

Best

juvian · March 8, 2020, 5:45pm

The only way I see NLP being useful here is if you extract all the text and classify that as invoice/identity card/tax. But you can probably classify these documents just by the layout of the image; I don't think you need anything beyond plain image classification.
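
For illustration, here is a rough sketch of the text route mentioned above, assuming the text has already been extracted from the image by some OCR step (not shown). The keyword lists, the label names, and the `classify_text` function are all made up for the example; a real setup would more likely learn them from labeled documents.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
KEYWORDS = {
    "invoice": {"invoice", "total", "amount", "due", "payment"},
    "identity_card": {"identity", "nationality", "birth", "surname", "expiry"},
    "tax": {"tax", "income", "return", "deduction", "fiscal"},
}


def classify_text(ocr_text: str) -> str:
    """Return the label whose keywords occur most often in the OCR output."""
    # Lowercase the text and count alphabetic tokens.
    tokens = Counter(re.findall(r"[a-z]+", ocr_text.lower()))
    # Score each label by how many of its keywords appear in the text.
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in KEYWORDS.items()
    }
    # Ties (including an all-zero score) fall back to the first label listed.
    return max(scores, key=scores.get)


print(classify_text("INVOICE no. 42 Total amount due: 100 EUR"))
```

If the three document types look very different on the page, this step may be unnecessary, since a pretrained CNN fine-tuned on the images alone could be enough.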

zoumana · March 8, 2020, 6:03pm

Totally agree @juvian, and thank you for your feedback!