Information extraction OCR for receipt

@phucnsp, yes, I think you should definitely do it.
You’ll find some inspiration here, I think:

or

I think you will find the kind of information you need. :slight_smile:

We are doing pretty much the same. Currently we extract text with traditional OCR (Tesseract) and run entity extraction on the resulting text. End-to-end entity extraction directly from the image is still at the research stage.
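
A minimal sketch of the second stage of such a pipeline (entity extraction on text that has already been OCR'd), assuming Tesseract produced the plain text beforehand. The function name `extract_entities`, the two regex patterns and the sample receipt are hypothetical illustrations, not what the poster actually uses.

```python
import re

# Hypothetical patterns for illustration only; real receipts need more variants.
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)


def extract_entities(ocr_text: str) -> dict:
    """Pull a date and a total amount out of plain OCR text."""
    date_match = DATE_RE.search(ocr_text)
    # Keep the last "total" match, assuming the grand total is near the bottom.
    totals = TOTAL_RE.findall(ocr_text)
    return {
        "date": date_match.group(1) if date_match else None,
        "total": totals[-1].replace(",", ".") if totals else None,
    }


# Made-up OCR output standing in for what Tesseract would return.
sample = "ACME STORE\n12/03/2019 14:22\nSubtotal 9.50\nTax 0.95\nTOTAL 10.45\n"
print(extract_entities(sample))
```

Rules like these are brittle on noisy OCR output, which is one reason an end-to-end model is attractive.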

Hi @veon,
by end-to-end entity extraction, do you mean something like CNN + LSTM?

Yes. Labelling data for Chargrid 2D is too hard.

I’m thinking about a transformer network of some kind. The papers look promising, but most of them try to spot large text on small images.

I am working on text detection on Aadhaar cards (a unique identification card issued in India). We only extract the name, date of birth and address, and those regions are then passed to Tesseract. I would like to know which models work well for detecting text on such a card. One problem is that the card contains two languages, and I am only concerned with the English text. Any suggestions would be appreciated!

Hello @phucnsp,

I am working on a similar problem and was looking for an efficient and accurate way to extract text from receipts in English. My data consists of images of printed receipts such as shown below.

As you have tried a lot of techniques, it would help me a lot if you could share some of your insights on an approach for fast and accurate extraction of all the text in the image.

Thank you,

You can refer to the results of the SROIE competition, task 3. Many teams have reported their methods for receipt information extraction:
https://rrc.cvc.uab.es/?ch=13&com=evaluation&task=3

@Nischal I am working on exactly the same problem. Can you guide me on how to approach it?

Hi, can you help me find the training data (text files and JSON files) for the SROIE 2019 competition? I downloaded the data but cannot find the training data for the task 3 challenge.

You can go here, register and download the data from gdrive or baidu cloud.

@MichaelScofield I have downloaded it but cannot find the training data for task 3.
If you have it, could you send me the training data for task 3? It would be very helpful.

Guys, I’m new.
I have a question: is there any way to combine the results of different models, like an ensemble for text localization?
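
One possible way to ensemble text detectors, as a rough sketch rather than an established recipe: cluster boxes from all models by IoU, keep clusters that enough boxes agree on, and average their coordinates. The names `iou` and `fuse_boxes`, the thresholds, and the example boxes are all assumptions made up for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(model_outputs: List[List[Box]],
               iou_thr: float = 0.5,
               min_votes: int = 2) -> List[Box]:
    """Group overlapping boxes across models and average each group."""
    clusters: List[List[Box]] = []
    for boxes in model_outputs:
        for box in boxes:
            # Attach the box to the first cluster whose seed box it overlaps.
            for cluster in clusters:
                if iou(box, cluster[0]) >= iou_thr:
                    cluster.append(box)
                    break
            else:
                clusters.append([box])
    fused: List[Box] = []
    for cluster in clusters:
        # Simplification: every box counts as one vote, even if two boxes
        # in the cluster come from the same model.
        if len(cluster) >= min_votes:
            n = len(cluster)
            fused.append(tuple(sum(b[i] for b in cluster) / n for i in range(4)))
    return fused


# Made-up detections from two hypothetical models.
model_a = [(0, 0, 10, 10), (50, 50, 60, 60)]
model_b = [(1, 1, 11, 11)]
print(fuse_boxes([model_a, model_b]))
```

Here the box that only one model found is dropped; lowering `min_votes` to 1 would keep it and trade precision for recall.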

What worked best for you when you tried key info extraction?

Hi, did you find a solution for the Japanese receipt issue?