I am trying to extract all the text from an image (e.g., medicine packaging).
I used a CRNN + BiLSTM + CTC model trained on 100 images, but the validation accuracy was poor.
What model architecture would be better suited for this?
Any guidance or suggestions will be helpful.
100 images may not be enough for your model to reach decent accuracy.
Paulfast
(Paul)
October 8, 2025, 6:30pm
Hi vani_amudan,
Your dataset of 100 images is likely too small for a CRNN + BiLSTM + CTC model. A few suggestions:
Data Augmentation: Rotate, scale, add noise or brightness changes to expand your dataset.
Pretrained OCR Models: Try Tesseract, PaddleOCR, or EasyOCR as they are first, and fine-tune on your packaging images only if the out-of-the-box results fall short.
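Transformer-based OCR: Pretrained checkpoints of models like TrOCR or Donut can be fine-tuned on a relatively small dataset. Training them from scratch needs far more data than you have.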
Simpler Architectures: A plain CNN + CTC model without the recurrent layers has fewer parameters to fit and can sometimes perform better on limited data.
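To compare these options fairly, it helps to score them all on the same held-out images with a character-level metric rather than exact-match accuracy, since a single wrong character makes a whole line count as wrong. Below is a minimal sketch of character error rate (CER) in plain Python. The function names and the sample strings are only illustrative, not part of any OCR library.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edits divided by total reference characters over a validation set."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


# Hypothetical example: the model dropped one space from an 18-character label.
cer = character_error_rate(["Paracetamol 500mg"], ["Paracetamol 500 mg"])
print(f"CER: {cer:.3f}")
```

A lower CER is better, and 0.0 means every character matched.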
Scaling up your dataset, even with synthetic images of your packaging, usually helps the most.
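As a rough illustration of the augmentation idea, here is a dependency-free sketch that turns each labelled image into several perturbed copies using a small horizontal shift, a brightness change, and Gaussian noise. It assumes grayscale images stored as lists of rows with values in 0..255. The function names and default ranges are made up for this example, and in practice you would more likely use an image library's augmentation utilities.

```python
import random


def augment(image, rng, max_shift=2, max_brightness=30, noise_std=8.0):
    """Return a randomly perturbed copy of a grayscale image (list of rows, 0..255)."""
    width = len(image[0])
    shift = rng.randint(-max_shift, max_shift)                  # horizontal shift in pixels
    brightness = rng.randint(-max_brightness, max_brightness)   # global brightness offset
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            src = min(max(x - shift, 0), width - 1)             # repeat edge pixels at the border
            value = row[src] + brightness + rng.gauss(0.0, noise_std)
            new_row.append(int(min(max(round(value), 0), 255))) # clip back to 0..255
        out.append(new_row)
    return out


def expand_dataset(samples, copies=5, seed=0):
    """Keep each (image, text) pair and add `copies` augmented variants with the same text."""
    rng = random.Random(seed)
    expanded = []
    for image, text in samples:
        expanded.append((image, text))
        for _ in range(copies):
            expanded.append((augment(image, rng), text))
    return expanded


# Hypothetical 3x4 image with its transcription.
tiny = [[0, 128, 255, 64] for _ in range(3)]
dataset = expand_dataset([(tiny, "ABC")], copies=4)
print(len(dataset))  # 1 original + 4 augmented copies
```

Keep the perturbations mild enough that the text stays readable, and avoid mirroring the image, since flipped text no longer matches its label.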