Information extraction OCR for receipt

Neurosci · June 21, 2019, 6:18am

@phucnsp, yes, I think you should definitely do it.
You’ll find some inspiration here, I think:

or

I think you will find the kind of information you need.

veon · June 29, 2019, 2:36pm

We are doing pretty much the same. Currently we extract text with traditional OCR (Tesseract) and then run entity extraction on the text. End-to-end entity extraction directly from the image is still at the research stage.
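
A minimal sketch of such a two-stage pipeline, assuming `pytesseract` and Pillow for the OCR stage and simple regular expressions for the entity stage. The function names, the regex patterns, and the choice of fields (date and total) are illustrative assumptions, not the setup described in this thread; real receipts would need tuned patterns or a trained tagger.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only; real receipts need tuning.
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+[.,]\d{2})", re.IGNORECASE)


def ocr_image(path: str) -> str:
    """Stage 1: run Tesseract on an image file and return the raw text."""
    # Imported lazily so the entity stage can be used without OCR installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))


def extract_entities(text: str) -> Dict[str, Optional[str]]:
    """Stage 2: pull a date and a total amount out of the OCR text."""
    date = DATE_RE.search(text)
    totals = TOTAL_RE.findall(text)
    return {
        "date": date.group(1) if date else None,
        # Take the last "total" line; "Subtotal" is skipped by the \b anchor.
        "total": totals[-1] if totals else None,
    }
```

Usage would look like `extract_entities(ocr_image("receipt.jpg"))`, where `receipt.jpg` is a placeholder path. Keeping the two stages separate makes it possible to swap the OCR engine or the extraction method independently.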

phucnsp · July 1, 2019, 8:01pm

Hi @veon,
By end-to-end entity extraction, do you mean something like CNN + LSTM?

phucnsp · July 1, 2019, 8:03pm

Yes, labelling data for Chargrid 2D is too hard.

veon · July 2, 2019, 9:17pm

I’m thinking about a transformer network of some kind. The papers look promising, but most of them try to spot large text in small images.

akarshs · September 24, 2019, 11:57am

I am working on text detection for Aadhaar cards (a unique identification card issued in India). We only extract the name, date of birth and address, which are then passed to Tesseract. I would like to know which models work well for detecting text on a card. The problem is that the card contains two languages and I am only concerned with English. Any suggestions would be appreciated!

Nischal · November 13, 2019, 1:37am

Hello @phucnsp,

I am working on a similar problem and am looking for an efficient and accurate way to extract text from receipts in English. My data consists of images of printed receipts such as the one shown below.

As you have tried a lot of techniques, it would help me a lot if you could share some of your insights on an approach for fast and accurate extraction of all the text in the image.

Thank you,

phucnsp · November 13, 2019, 2:32am

You can refer to the results of the SROIE competition, task 3. Many teams have reported their methods for receipt information extraction:
https://rrc.cvc.uab.es/?ch=13&com=evaluation&task=3

vishal7 · November 18, 2019, 3:41pm

@Nischal I am working on exactly the same problem. Can you guide me on how to approach it?

amir.ai · November 24, 2019, 10:31am

Hi, can you help me find the training data (text files and JSON files) for the SROIE 2019 competition? I downloaded the data but am not able to find the training data for the task 3 challenge.

MichaelScofield · November 24, 2019, 1:24pm

You can go here, register and download the data from Google Drive or Baidu Cloud.

amir.ai · November 25, 2019, 7:18pm

@MichaelScofield I have downloaded it but am not able to find the training data for task 3.
If you have it, please send me the training data for task 3. It would be very helpful.

chuonghuy · September 25, 2020, 3:43am

Guys, I’m new.
I have a question: is there any way we can combine the results of different models, like an ensemble for text localization?

ninjakx · December 18, 2020, 9:11am

What worked best for you when you tried key info extraction?

cgowthamanmca · December 2, 2021, 3:59am

Hi, did you find any solution for the Japanese receipt issue?