I am trying to extract all the text from an image (e.g., medicine packaging).
I tried a CRNN (CNN + BiLSTM) trained with CTC loss on 100 images, but the validation accuracy was poor.
What model architecture would be better suited for this?
Any guidance or suggestions would be appreciated.
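
For readers unfamiliar with the CTC part of the setup described above, here is a minimal sketch of greedy CTC decoding: the per-frame class predictions are collapsed by merging consecutive repeats and then dropping the blank symbol. This is only an illustration, not the code actually used in the model from the question. The alphabet, the blank index of 0, and the function name `ctc_greedy_decode` are all assumptions made for the example.

```python
from itertools import groupby

# Assumption: index 0 is the CTC blank; the alphabet is hypothetical.
BLANK = 0
ALPHABET = "-abcdefghijklmnopqrstuvwxyz0123456789 "  # "-" is a placeholder for the blank


def ctc_greedy_decode(frame_ids, alphabet=ALPHABET, blank=BLANK):
    """Turn per-frame argmax class indices into a string.

    Step 1: merge runs of identical consecutive indices.
    Step 2: remove blank indices.
    """
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


if __name__ == "__main__":
    # Frames: a a <blank> a b b <blank>
    print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))
```

Note that the blank between the two runs of `a` is what lets a doubled letter survive the collapse step, which matters for words on packaging that contain repeated characters.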