Low accuracy using CRNN+BiLSTM+CTC for OCR — what should I try next?

I am trying to extract all the text from an image (e.g., medicine packaging).
I used a CRNN + BiLSTM + CTC model trained on 100 images, but the validation accuracy was poor.

What model architecture would be better suited for this?

Any guidance or suggestions would be helpful.


100 images is probably not enough data for your model to reach decent accuracy :thinking:


Hi vani_amudan,

Your dataset of 100 images is likely too small for a CRNN + BiLSTM + CTC model. A few suggestions:

  1. Data Augmentation: Rotate, scale, and add noise or brightness variations to expand your effective dataset.
  2. Pretrained OCR Models: Try Tesseract, PaddleOCR, or EasyOCR and fine-tune them.
  3. Transformer-based OCR: Models like TrOCR or Donut are pretrained on large corpora and can often be fine-tuned effectively with relatively little data.
  4. Simpler Architectures: Sometimes CNN + CTC alone performs better on limited data.
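For point 1, here is a minimal augmentation sketch using NumPy. It only shows brightness jitter and additive noise (rotation and scaling would typically come from a library like OpenCV, Albumentations, or torchvision); the image shapes and parameter ranges are placeholder assumptions, not values from your setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Apply a random brightness shift and Gaussian noise to a grayscale image."""
    out = img.astype(np.float32)
    out *= rng.uniform(0.8, 1.2)             # brightness jitter
    out += rng.normal(0.0, 10.0, out.shape)  # additive Gaussian noise
    return np.clip(out, 0, 255).astype(np.uint8)

# Expand a small image set by generating several variants per original
# (placeholder 32x128 random arrays stand in for your cropped text images).
originals = [rng.integers(0, 256, (32, 128), dtype=np.uint8) for _ in range(3)]
augmented = [augment(img) for img in originals for _ in range(5)]
```

With 5 variants per image, a 100-image set becomes 500 training samples; the label for each augmented image is simply the label of its original.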

Scaling up your dataset, even with synthetic images of your packaging, usually helps the most.
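Generating synthetic packaging text can be sketched with Pillow. The drug names, image size, and default font here are illustrative assumptions; a realistic pipeline would vary fonts, backgrounds, blur, and perspective to match real packaging:

```python
from PIL import Image, ImageDraw

def render_label(text: str, size=(200, 50)) -> Image.Image:
    """Render a text string onto a blank image, mimicking a printed label."""
    img = Image.new("L", size, color=255)  # white grayscale background
    draw = ImageDraw.Draw(img)
    draw.text((10, 15), text, fill=0)      # black text, Pillow's default font
    return img

# Build (image, ground-truth transcript) pairs for training.
drug_names = ["Paracetamol 500mg", "Ibuprofen 200mg", "Amoxicillin 250mg"]
dataset = [(render_label(name), name) for name in drug_names]
```

Because each image is rendered from a known string, the ground-truth transcripts are free, which is exactly what a CTC or transformer OCR model needs for supervision.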