Ideas on training a neural network to read data from structured documents?

sandskies · October 2, 2017, 4:08pm

I am currently doing Part 2 of the course, and so far it has been a wonderful learning experience. I couldn't have come to appreciate the power of deep learning any better than through fast.ai. I have solved the fish problem and the lung cancer problem with CNNs, and I am wondering whether similar techniques can be used for document images or PDFs.

I want to build a neural-network-based model to read data from bank statement images/PDFs provided by our customers. I want to be able to read the data and enter it into our database. How can I do this?

Currently, I use open source tools to convert the PDF into text, and I have written code to parse that text and write it into our database. But the effort is proportional to the number of banks we want to support and the number of changes banks make to their PDFs.
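
To make the scaling problem concrete, here is a minimal sketch of what the rule-based approach tends to look like. It is not my actual code: the bank names, line formats, and the `parse_statement` helper are all made up for illustration, and it assumes the PDF has already been converted to plain text.

```python
import re
from datetime import datetime

# Hypothetical per-bank rules: one regex and one date format per bank.
# Real statements differ; these formats are invented for illustration.
BANK_PATTERNS = {
    "bank_a": (
        re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})$"),
        "%d/%m/%Y",
    ),
    "bank_b": (
        re.compile(r"^(\d{4}-\d{2}-\d{2})\|(.+?)\|(-?[\d,]+\.\d{2})$"),
        "%Y-%m-%d",
    ),
}


def parse_statement(text, bank):
    """Extract transaction rows from already-extracted statement text."""
    pattern, date_format = BANK_PATTERNS[bank]
    rows = []
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match is None:
            continue  # headers, balances, and other non-transaction lines
        date_str, description, amount_str = match.groups()
        rows.append({
            "date": datetime.strptime(date_str, date_format).date().isoformat(),
            "description": description,
            "amount": float(amount_str.replace(",", "")),
        })
    return rows


if __name__ == "__main__":
    sample = "Opening balance\n02/10/2017  ATM WITHDRAWAL  -1,200.00"
    print(parse_statement(sample, "bank_a"))
```

Every new bank needs a new entry in `BANK_PATTERNS`, and every layout change breaks an existing one, which is the maintenance cost I would like a learned model to remove.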