Semantic segmentation of documents

I would like to build a program for automated invoice processing. Could semantic segmentation be useful for such a project? I am thinking about creating segmentation masks that describe the invoice layout, e.g. header, table, footer, supplier, etc. After that, OCR and an RNN could be applied to the individual image segments.

Can I rely on a U-Net learner to recognize a document layout? I am not sure how much it can learn from images of text. Do you have any other approaches in mind?

Thanks in advance.

You can also treat it as object detection and just get the bounding boxes for the header, tables, footer and such. Not sure which is better though ^^. I think you can find more info by looking up "document semantic structure extraction".

You could use either object detection (e.g. YOLO) or semantic segmentation (e.g. U-Net). If the regions are all squares or rectangles, try YOLO first. I know that both of these techniques work on document layouts.

You could also refer to https://arxiv.org/abs/1804.10371
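
If you go the segmentation route and your regions are roughly rectangular, the two approaches end up close to each other: a predicted mask can be reduced to one bounding box per class, and each box can then be cropped and passed to OCR. Here is a minimal NumPy sketch of that step. The class ids and names in `CLASS_NAMES`, the helpers `mask_to_boxes` and `crop_regions`, and the toy mask are all made up for illustration, and it assumes at most one region per class on a page.

```python
import numpy as np

# Hypothetical class ids for illustration; 0 is treated as background.
CLASS_NAMES = {1: "header", 2: "table", 3: "footer"}


def mask_to_boxes(mask, class_names=CLASS_NAMES):
    """Return {name: (top, left, bottom, right)} for each class present.

    Assumes at most one region per class. bottom/right are exclusive,
    so the tuple can be used directly for slicing.
    """
    boxes = {}
    for class_id, name in class_names.items():
        rows, cols = np.nonzero(mask == class_id)
        if rows.size == 0:
            continue  # this class does not appear on the page
        boxes[name] = (
            int(rows.min()),
            int(cols.min()),
            int(rows.max()) + 1,
            int(cols.max()) + 1,
        )
    return boxes


def crop_regions(image, boxes):
    """Cut the page image into one crop per detected region."""
    return {name: image[t:b, l:r] for name, (t, l, b, r) in boxes.items()}


# Toy 10x8 "page": a header strip at the top and a table in the middle.
mask = np.zeros((10, 8), dtype=np.uint8)
mask[0:2, :] = 1
mask[3:7, 1:7] = 2
image = np.arange(80).reshape(10, 8)

boxes = mask_to_boxes(mask)
crops = crop_regions(image, boxes)
```

Each entry in `crops` could then go to whatever OCR step you choose. If a page can contain several regions of the same class (e.g. two tables), you would need connected-component labelling instead of a single min/max box per class.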
