Ideas on document processing using deep learning


Any ideas on how to implement a deep learning model like this? I have tried object detection models on a few samples, but the network does not seem to learn anything. I am trying to process invoices and extract a few key-value pairs from them. Any ideas or help would be appreciated.



Hi Neeb, hope you are well!

I was building a classifier which required a lot of text to be recognized. I found it rather difficult just using the classification techniques from lessons 1 and 2.

I am working on something else at the moment but will come back to it at some point. In my travels, however, I discovered a library that would be ideal for extracting the characters from the images.

Then you could build your model on the text extracted.

Have a jolly day.
mrfabulous1 :smiley::smiley:



Hi mrfabulous1, I appreciate your comments. I did build a template matching model using the output from tesseract, but that method requires a lot of pre-defined rules and templates, which is not ideal in my case since we have more than 100 different types of PDF. I am looking for a more generalised method to extract the text, but I am not sure which deep learning method is the right one for this.
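To show why the template approach scales poorly, here is a minimal sketch of what rule-based extraction over OCR text tends to look like. Everything in it is an assumption for illustration: the `TEMPLATES` table, the `vendor_a` name, the field names, the regular expressions, and the sample strings are hypothetical and do not come from a real invoice set.

```python
import re

# Hypothetical per-template rules: one regex per field, per document layout.
TEMPLATES = {
    "vendor_a": {
        "invoice_number": re.compile(r"Invoice\s*No\.?\s*[:#]?\s*(\S+)", re.I),
        "total": re.compile(r"Total\s*Due\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
    },
}


def extract_fields(ocr_text, template_name):
    """Apply every rule of the chosen template to the OCR text."""
    result = {}
    for field, pattern in TEMPLATES[template_name].items():
        match = pattern.search(ocr_text)
        # A field is None when the layout does not match the rule.
        result[field] = match.group(1) if match else None
    return result


# Text in the layout the rules were written for.
sample = "ACME Corp\nInvoice No: INV-1042\nTotal Due: $1,250.00\n"
fields = extract_fields(sample, "vendor_a")

# The same information in a slightly different wording defeats the rules.
other_layout = "ACME Corp\nRef INV-1042\nAmount payable 1,250.00\n"
missed = extract_fields(other_layout, "vendor_a")
```

Each new layout needs its own entry in the table, which is the maintenance cost described above once there are 100+ document types.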


(Michael Benedict) #4

Hi Neeb, I’m working on a similar problem for a different type of document, and have been experimenting with the chargrid representation introduced by SAP Research last year. Here’s the paper (it’s been posted in a few other threads):

You may have seen that Rossum published a paper on the basics of their method, a different approach from SAP’s:

There may be heuristics, post-processing, etc. that aren’t included in the papers but are important for making a production system work well. I think this type of domain-specific information extraction from documents is still an active area of research, and I don’t know of an open source tool that will do what you’re looking for end-to-end. I also don’t think there are open labeled datasets for invoice understanding, so training a DL model will involve some labeling drudgery.

That said, I’ve implemented part of the chargrid paper and have had some promising early results, so if you decide to work through it and have questions feel free to get in touch.
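To make the chargrid idea concrete, here is a minimal sketch in plain Python. It assumes the OCR step yields one bounding box per character and paints each box into a 2D grid with an integer index for that character, with 0 kept for background. The `CharBox` layout and the `build_vocab` and `make_chargrid` names are hypothetical, chosen for illustration, and are not code from the paper.

```python
from typing import Dict, List, Tuple

# Assumed OCR output: (character, x0, y0, x1, y1), with x1/y1 exclusive.
CharBox = Tuple[str, int, int, int, int]


def build_vocab(boxes: List[CharBox]) -> Dict[str, int]:
    """Map each distinct character to an index; 0 is reserved for background."""
    chars = sorted({box[0] for box in boxes})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def make_chargrid(boxes: List[CharBox], width: int, height: int,
                  vocab: Dict[str, int]) -> List[List[int]]:
    """Fill every cell covered by a character box with that character's index."""
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in boxes:
        idx = vocab.get(ch, 0)  # unknown characters fall back to background
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid


# Tiny toy page: an "A" on the left and a "1" on the right.
boxes = [("A", 0, 0, 2, 2), ("1", 3, 0, 5, 2)]
vocab = build_vocab(boxes)
grid = make_chargrid(boxes, width=6, height=3, vocab=vocab)
```

The resulting grid keeps the page layout that plain OCR text loses, so it can be fed to an image-style network, for example after one-hot encoding the indices. Real pages would normally be downsampled first to keep the grid small.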



Hi @thebenedict, thanks for posting the links to both papers. I will start looking into them. In fact, I didn’t know that Rossum had published a paper. I am quite new to the CV community; may I ask how you found the paper? Thanks for the help. :smiley:


(Michael Benedict) #6

@Neeb I find most papers in this space on arXiv, and Andrej Karpathy’s tool is useful for efficient searching.



Wow thebenedict,
that’s a lot of computer science papers!

Arxiv Sanity Preserver

Built in spare time by @karpathy to accelerate research.
Serving last 84779 papers from cs.[CV|CL|LG|AI|NE]/stat.ML

mrfabulous1 :smiley::smiley: