I am trying to use darkflow for custom object detection.
Darkflow is an implementation of the YOLO architecture for object detection (https://github.com/thtrieu/darkflow). I am manually annotating data and using the darkflow framework to train on it.
The objective is to extract some annotated fields from documents.
My dataset is also quite small, with 58 annotated images.
I was expecting the darkflow framework to overfit, since the dataset is small and the model is sufficiently big. But no matter how long I train the network, the loss plateaus, and the model's predictions are way off from what is desired.
I have tried image augmentation, varying the learning rate, and a new architecture (using ResNet50 as a feature detector and training a classifier on top of it), but with the same results.
I am not sure why a complex model would underfit a small training set.
Grateful for any suggestions about how to proceed!
“The objective is to extract some annotated fields from documents” – The document is an image of an ID card (a driving license, for example). I have drawn bounding boxes around only one field (date of birth). I have 58 such identification documents, each with a bounding box for a single class (date of birth, DOB), and I am trying to predict the bounding box over the DOB field for new documents.
I am actually loading pretrained YOLO weights and trying to fine-tune them for this problem.
I would actually expect it to overfit, but am quite puzzled why it's not overfitting.
My first hypothesis is that the pre-trained model’s objective (read: its target classes) is “far” from yours, so the model cannot discriminate using the signal you are providing.
You may have to rethink your approach: YOLO and other DL approaches work well when you have a lot of variety in your samples. Since you are only looking at ID cards, consider framing this as a text recognition task.
Well, for one, I moved away from YOLO and used ResNet-50 as the feature detector, then trained some layers on top of it, and improved it by adding one batch-norm layer.
Also, the target labels were initially the top-left and bottom-right corners of the bounding box; I changed them to center, width, and height. And finally, I applied different learning rates across different layers.
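The change of box parameterization mentioned above is just a coordinate transform. A minimal sketch (the helper name is mine, not from any library):

```python
# Hypothetical helper: converts a box given as (x1, y1, x2, y2)
# top-left / bottom-right corners into (cx, cy, w, h)
# center / width / height form.
def corners_to_center(x1, y1, x2, y2):
    w = x2 - x1          # width is the horizontal extent
    h = y2 - y1          # height is the vertical extent
    cx = x1 + w / 2      # center x is halfway across the box
    cy = y1 + h / 2      # center y is halfway down the box
    return cx, cy, w, h

# Example: a 100x50 box with top-left corner at (10, 20)
# has center (60.0, 45.0), width 100, height 50.
```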
Planning to try oversampling today. Please feel free to suggest more improvements that you think will help. Thanks!
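The different learning rates across layers can be set with PyTorch optimizer parameter groups. A minimal sketch, where the tiny model and the rates are made up for illustration (the real model and values would differ):

```python
from torch import nn, optim

# Stand-in model: the first Linear plays the role of the (frozen-ish)
# backbone, the last Linear plays the role of the newly added head.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

# Each dict is a parameter group with its own learning rate:
# a small LR for the backbone stand-in, a larger one for the head.
optimizer = optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-4},
        {"params": model[2].parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)
```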
Hi @gokul, I am also working on a similar problem: I am trying to extract text from an Aadhaar card, and I only need the name, DOB, and address. I was thinking about using SSD. Could you please share your code, and explain the difference between the top-left/bottom-right corner approach and the center, width, and height approach for the bounding box?
I was initially using darkflow for predicting the bounding boxes. Then I tried a simpler approach, which was to just take the ResNet stem as a feature detector and add some layers on top of it to predict 4 numbers for each image.
One thing that helped me was to start with just one class, for example just the Aadhaar number.
The core of the code was something like this. With some modifications to the final layers. (Credits: [Programming PyTorch for Deep Learning, by Ian Pointer])
import torch.nn as nn
from torchvision import models

# Load a pretrained ResNet-50 and freeze its weights
transfer_model = models.resnet50(pretrained=True)
for name, param in transfer_model.named_parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a small trainable head
transfer_model.fc = nn.Sequential(
    nn.Linear(transfer_model.fc.in_features, 500),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(500, 2),
)
Will try to add the code soon, but hope this helps you to get started!