Model for document images

Is there a deep learning model that has been trained on text and document images? Like ImageNet for text? I would like to experiment with projects that do OCR.

I’m not sure if it uses DL, but there’s https://github.com/tesseract-ocr/tesseract. They also provide training data. You could use that to train your own model; it would be a cool project 🙂

Thanks for the reply. Tesseract doesn’t use DL, and I would be interested in a DL solution.


Just use their training data with a CNN?
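If you go down that route, the first step is turning labelled page images into per-character crops a CNN can learn from. Below is a minimal sketch, assuming you have image/`.box` pairs (for example generated with Tesseract’s training tools) in the classic box format: one glyph per line as `symbol left bottom right top page`, with y measured from the *bottom* of the image. The function name `parse_box_file` is hypothetical, not part of Tesseract.

```python
from typing import List, Tuple


def parse_box_file(text: str, image_height: int) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Parse Tesseract-style box lines into (symbol, (x0, y0, x1, y1)) with a top-left origin.

    Assumes the classic format `symbol left bottom right top page`, where y is
    measured upward from the bottom edge of the image.
    """
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        left, bottom, right, top = map(int, parts[1:5])
        # Flip y so the box can be used to slice a row-major image array: img[y0:y1, x0:x1].
        y0 = image_height - top
        y1 = image_height - bottom
        samples.append((symbol, (left, y0, right, y1)))
    return samples


# Example: two glyphs on a 100-pixel-tall image.
box_text = "T 10 80 20 95 0\nh 22 80 30 96 0\n"
print(parse_box_file(box_text, image_height=100))
```

Each `(x0, y0, x1, y1)` can then be used as `img[y0:y1, x0:x1]` on a NumPy image array, resized to a fixed size, and fed to an ordinary image-classification CNN with the symbol as its label.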

I wouldn’t have a clue how to do that… I’m currently on Lesson 4.
I find it surprising that no one has done this, though.

I hope by the end of the course you’ll feel able to tackle this. In the meantime, there are projects such as https://github.com/pannous/tensorflow-ocr.

And for more reading: https://github.com/hs105/Deep-Learning-for-OCR

h/t @bckenstler: I just learnt that there’s a new ‘Tesseract LSTM’ that uses deep learning and is much more accurate 🙂


Yes! It performs way better than the current official release. Unfortunately, since it’s an alpha build, you have to build it from source, which can be a pain.

You can find the full guide here: https://github.com/tesseract-ocr/tesseract/wiki/Compiling. I highly recommend following along with the two videos linked on that page!


There is another OCR tool, https://github.com/tmbdev/ocropy, though I don’t yet have a sense of which one is better.

If you want to be more in control of the details, first read this.

https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

They give a lot of detail on architectures that worked for them. Here is a Keras implementation

I’m working on an OCR project myself and am only in the research phase. If you make any progress, I’d love to learn from you (and I’d be equally happy to share my notes).
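A common final stage for CNN+LSTM word recognizers (the CNN+LSTM+CTC setup mentioned further down this thread is one example) is CTC: the network emits one class distribution per horizontal time step, and a decoding step turns that sequence into text. Here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is a simplification of beam-search decoding, and the `blank` index is an assumption you must match to your framework, since conventions differ (some put the blank first, some last).

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path (greedy) CTC decoding.

    probs:    per-time-step class scores, shape (T, C) with C == len(alphabet) + 1.
    alphabet: the non-blank classes, in class-index order with the blank skipped.
    blank:    index of the CTC blank class (assumed; check your framework's convention).
    """
    # 1. Take the most likely class at every time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    # 2. Merge runs of the same class, e.g. a a - a b b -> a - a b.
    collapsed = [k for k, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining class indices to characters.
    return "".join(alphabet[k if k < blank else k - 1] for k in collapsed if k != blank)


# Example: classes are [blank, 'a', 'b'].
probs = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a  (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.3, 0.1, 0.6],    # b  (repeat, merged)
]
print(ctc_greedy_decode(probs, "ab"))  # aab
```

The blank is what lets CTC output doubled letters: without the blank at step 3, the three `a` steps would collapse into a single `a`.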

Bump!

I’m working on an OCR project as well. I can now recognise words pretty well using CNN+LSTM+CTC, but I would now like to segment lines of text. I’ve been trying to implement the method in this paper: https://arxiv.org/pdf/1704.08628.pdf (without the 2D LSTM, to start with). However, I’m struggling to even get the basic network working: since it’s an FCN, I’m not sure how to structure the output in Keras/TensorFlow. Is anyone able to help?
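Not the paper’s exact parameterization, but one common way to structure a dense FCN output is a grid: with total downsampling `stride`, the network outputs a tensor of shape `(batch, H/stride, W/stride, 5)`, where each cell holds a "line starts here" confidence plus 4 box values. In Keras that head could be a final 1x1 `Conv2D` with 5 filters (sigmoid on channel 0, linear on the rest). The sketch below, in NumPy, builds matching targets and a masked loss. The functions `encode_line_starts` and `masked_loss`, the left-middle anchor point, and the `(dx, dy, w, h)` encoding are all illustrative assumptions.

```python
import numpy as np


def encode_line_starts(boxes, image_hw, stride):
    """Build a dense (H/stride, W/stride, 5) target for a grid-style FCN.

    boxes: text-line boxes as (x0, y0, x1, y1) in pixels.
    Channel 0: 1.0 in the cell containing the line's left-middle point, else 0.
    Channels 1-4: (dx, dy, w, h); dx, dy are the offsets inside that cell and
    w, h the box size, all in units of stride. Zero where there is no line start.
    """
    h, w = image_hw
    gh, gw = h // stride, w // stride
    target = np.zeros((gh, gw, 5), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        ax, ay = x0, (y0 + y1) / 2  # the "start" of the line: left edge, vertical centre
        col, row = int(ax // stride), int(ay // stride)
        if not (0 <= row < gh and 0 <= col < gw):
            continue  # anchor falls outside the output grid
        target[row, col] = [1.0, ax / stride - col, ay / stride - row,
                            (x1 - x0) / stride, (y1 - y0) / stride]
    return target


def masked_loss(pred, target, eps=1e-7):
    """Binary cross-entropy on channel 0 plus squared error on box channels,
    counted only in cells where a line actually starts."""
    conf = np.clip(pred[..., 0], eps, 1 - eps)  # assumes channel 0 is already a sigmoid output
    obj = target[..., 0]
    bce = -np.mean(obj * np.log(conf) + (1 - obj) * np.log(1 - conf))
    sq_err = np.sum((pred[..., 1:] - target[..., 1:]) ** 2, axis=-1)
    reg = np.sum(sq_err * obj) / max(float(obj.sum()), 1.0)
    return float(bce + reg)


# Example: one text line on a 64x128 image, network downsampling by 16.
t = encode_line_starts([(20, 10, 100, 30)], image_hw=(64, 128), stride=16)
print(t.shape)    # (4, 8, 5)
print(t[1, 1])    # confidence 1 and the box values in cell (row 1, col 1)
```

The masking is the key design choice: regression error is only counted where a line starts, so the thousands of empty cells teach the confidence channel without dragging the box predictions toward zero. The same two terms translate directly to TensorFlow ops for a custom Keras loss.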