I would like to make a program for automated invoice processing. Could semantic segmentation be useful for such a project? I am thinking about producing segmentation masks that describe the invoice layout, e.g. header, table, footer, supplier, etc. After that, OCR and an RNN could be run on the individual image segments.
Can I rely on a unet learner to recognize a document layout? I am not sure how much it can learn from images of text. Do you have any other approaches in mind?
You can also treat it as an object detection problem and just predict bounding boxes for the header, tables, footer and such. Not sure which is better though ^^. I think you can find more info by searching for "document semantic structure extraction".
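The two options are closer than they may look: once you have a per-pixel mask, you can collapse it into one bounding box per region and crop those regions for OCR. Here is a minimal sketch of that step, using only NumPy. The class ids, the `CLASSES` mapping, and the function names are made up for illustration, and the mask is a toy array rather than real model output. It also assumes each class appears as a single region on the page.

```python
import numpy as np

# Hypothetical class ids for illustration; a real model defines its own labels.
CLASSES = {1: "header", 2: "table", 3: "footer"}


def mask_to_boxes(mask):
    """Return {class_name: (top, left, bottom, right)} for each class present.

    Assumes one region per class; bottom/right are exclusive so the box
    can be used directly as a slice.
    """
    boxes = {}
    for class_id, name in CLASSES.items():
        rows, cols = np.nonzero(mask == class_id)
        if rows.size == 0:
            continue  # class not present on this page
        boxes[name] = (
            int(rows.min()),
            int(cols.min()),
            int(rows.max()) + 1,
            int(cols.max()) + 1,
        )
    return boxes


def crop_regions(image, boxes):
    """Cut each box out of the page image, ready to hand to an OCR step."""
    return {name: image[t:b, l:r] for name, (t, l, b, r) in boxes.items()}


# Toy 8x6 "page": rows 0-1 header, a table block in the middle, last row footer.
mask = np.zeros((8, 6), dtype=np.uint8)
mask[0:2, :] = 1
mask[3:6, 1:5] = 2
mask[7:, :] = 3
image = np.arange(48).reshape(8, 6)  # stand-in for the scanned page

boxes = mask_to_boxes(mask)
crops = crop_regions(image, boxes)
```

If a class can occur several times on one page (e.g. two tables), you would need connected-component labelling on the mask before taking the min/max, otherwise the separate regions get merged into one big box.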