Model for document images

vdp · May 10, 2017, 1:55pm

Is there a deep learning model that has been trained on text and document images? Like ImageNet for text? I would like to experiment with projects that do OCR.

jeremy · May 10, 2017, 5:32pm

I’m not sure if it uses DL, but there’s https://github.com/tesseract-ocr/tesseract. They also provide training data. You could use that to train your own model - would be a cool project.

vdp · May 10, 2017, 11:06pm

Thanks for the reply. Tesseract doesn’t use DL, though; I’m interested in a DL solution.

jeremy · May 11, 2017, 3:40am

Just use their training data with a CNN?

vdp · May 11, 2017, 12:34pm

I wouldn’t have a clue how to do that… I’m currently on Lesson 4.
I find it surprising that no one has done this, though.

jeremy · May 12, 2017, 10:00pm

I hope by the end of the course you’ll feel able to tackle this. In the meantime, there are projects such as https://github.com/pannous/tensorflow-ocr.

And for more reading: https://github.com/hs105/Deep-Learning-for-OCR

jeremy · May 12, 2017, 10:33pm

h/t @bckenstler I just learnt that there’s a new ‘Tesseract LSTM’ that uses deep learning and is much more accurate

bckenstler · May 12, 2017, 10:51pm

Yes! It performs way better than the current official release. Unfortunately, since it’s an alpha build, you must build it from source, and it can be a pain to compile.

You can find the full guide here: https://github.com/tesseract-ocr/tesseract/wiki/Compiling. I highly recommend you follow along with the two videos linked on that page!

briandalessandro · July 15, 2017, 12:46pm

There is another OCR tool, https://github.com/tmbdev/ocropy, though I don’t yet have a sense of which is better.

If you want to be more in control of the details, first read this.

https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

They give a lot of detail on architectures that worked for them. Here is a Keras implementation:

https://github.com/keras-team/keras/blob/master/examples/image_ocr.py

# -*- coding: utf-8 -*-
'''This example uses a convolutional stack followed by a recurrent stack
and a CTC logloss function to perform optical character recognition
of generated text images. I have no evidence of whether it actually
learns general shapes of text, or just is able to recognize all
the different fonts thrown at it...the purpose is more to demonstrate CTC
inside of Keras.  Note that the font list may need to be updated
for the particular OS in use.

This starts off with 4 letter words.  For the first 12 epochs, the
difficulty is gradually increased using the TextImageGenerator class
which is both a generator class for test/train data and a Keras
callback class. After 20 epochs, longer sequences are thrown at it
by recompiling the model to handle a wider image and rebuilding
the word list to include two words separated by a space.

The table below shows normalized edit distance values. Theano uses
a slightly different CTC implementation, hence the different results.

            Norm. ED


I’m working on an OCR project myself and am only in the research phase. If you make any progress, I’d love to learn from you (and I’d be equally happy to share my notes).
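
For anyone following along, the docstring above mentions two things that are easy to try without any framework: CTC and a normalized edit distance. Here is a rough sketch of both in plain Python. It is not taken from the Keras example. The function names are my own, the decoder is the simple greedy (best-path) variant, and dividing the edit distance by the reference length is an assumption about what "normalized" means here.

```python
def ctc_greedy_decode(frame_labels, blank):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks.

    frame_labels: the most likely label at each time step (any sequence).
    blank: the label reserved for the CTC blank symbol (hypothetical choice).
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev_row = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        row = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution
        prev_row = row
    return prev_row[-1]


def normalized_edit_distance(predicted, reference):
    """Edit distance divided by the reference length (assumed normalization)."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)


# Per-frame predictions written as characters, with "-" standing in for blank.
frames = "hh-e-ll-lo"
text = "".join(ctc_greedy_decode(frames, blank="-"))
score = normalized_edit_distance(text, "hello")
```

Note that the blank between the two runs of "l" is what lets the decoder keep a double letter. A real model would give you a matrix of per-frame probabilities, and you would take the argmax over labels at each frame before calling the decoder.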

vdp · November 3, 2017, 5:33pm

Bump!

maith · February 27, 2018, 4:39pm

I’m working on an OCR project as well. I can recognise words pretty well now using CNN+LSTM+CTC, but I would now like to segment lines of text. I’ve been trying to implement the method from this research paper: https://arxiv.org/pdf/1704.08628.pdf (without the 2D LSTM to start). However, I am struggling to get even the basic network working. Since it’s an FCN, I’m not sure how to structure the output in Keras/TensorFlow. Is anyone able to help?