Hi. I am new to the community, so I apologize for any errors that I make the first time. Also, sorry if this was answered somewhere else, but from looking at the forums, I was not able to find what I needed.
I am trying to rewrite a deep learning model that predicts captchas using the fastai API. The data that I’m using is this, which consists of around 10k images of four-character captchas. The label of each image can be deduced from its file name, e.g. “ABCD.png”. I have already managed to do this without fastai, and my idea was the following:
- Make a PyTorch dataset that returns an image and a LongTensor of length four that represents the encoded label of each position, e.g. if I had an image “ABCD.png”, the y tensor that I would return would be [10 11 12 13] since in my case 10-13 turned out to be the corresponding encodings for letters A-D.
- Use a standard resnet model, but change its final layers to output a vector of length 4*36 (4 positions times 36 classes: 26 letters plus 10 digits). The crucial step is that I then reshape this into a 36x4 tensor per image and use nn.CrossEntropyLoss(reduction="sum") as the loss, which adds up the cross-entropies of the four positions.
- This way, my model learned to classify the letter at each position of the captcha separately.
- Note that the other metrics I tracked were letter accuracy (the fraction of individual letters the model gets right) and overall accuracy (the fraction of samples where all four letters are predicted correctly).
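To make the label encoding and the two metrics above concrete, here is a minimal sketch in plain Python. The alphabet order (digits first, then uppercase letters) is an assumption chosen so that A-D map to 10-13 as in my case, and the function names are just illustrative, not from any library.

```python
import string

# Assumed alphabet: digits 0-9 map to 0-9, letters A-Z map to 10-35.
ALPHABET = string.digits + string.ascii_uppercase
CHAR_TO_IDX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_label(filename):
    """Turn a bare file name like 'ABCD.png' into a list of class indices."""
    stem = filename.rsplit(".", 1)[0]
    return [CHAR_TO_IDX[ch] for ch in stem.upper()]

def letter_accuracy(preds, targets):
    """Fraction of individual positions that are predicted correctly."""
    pairs = [(p, t) for pr, tg in zip(preds, targets) for p, t in zip(pr, tg)]
    return sum(p == t for p, t in pairs) / len(pairs)

def overall_accuracy(preds, targets):
    """Fraction of captchas where every position is predicted correctly."""
    return sum(pr == tg for pr, tg in zip(preds, targets)) / len(targets)
```

In the real dataset the list returned by encode_label would be wrapped in a LongTensor before being returned as y.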
I am now trying to rebuild this using the fastai API. My thinking was:
- I can just use the same loss function by doing
learner.loss_func = nn.CrossEntropyLoss(reduction="sum")
- I can still have the custom head for my neural net that outputs the 36 by 4 tensor
- I know how to write a custom metric to track the model’s performance
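For reference, the head and loss I have in mind can be sketched in plain PyTorch roughly as follows. CaptchaHead, the 512 input features, and the random tensors are hypothetical stand-ins for my actual resnet and data; the only point is the output shape and how the loss is applied to it.

```python
import torch
import torch.nn as nn

N_CLASSES, N_POSITIONS = 36, 4

class CaptchaHead(nn.Module):
    """Hypothetical head: maps backbone features to (batch, 36, 4) logits."""
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, N_CLASSES * N_POSITIONS)

    def forward(self, x):
        # The class dimension goes second, because nn.CrossEntropyLoss
        # accepts input of shape (N, C, d1) with targets of shape (N, d1).
        return self.fc(x).view(-1, N_CLASSES, N_POSITIONS)

loss_func = nn.CrossEntropyLoss(reduction="sum")

features = torch.randn(8, 512)  # stand-in for pooled resnet features
targets = torch.randint(0, N_CLASSES, (8, N_POSITIONS))  # int64 labels
logits = CaptchaHead(512)(features)
loss = loss_func(logits, targets)  # sum over all samples and positions
```

With reduction="sum", this single call gives the same value as computing the cross-entropy for each of the four positions separately and adding them up.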
However, I am struggling to figure out how to use the fastai dataloaders to pass the data in the same way as I did before. I tried looking at the documentation for DataBunch and TransformBlock, but was unable to figure out how to get it to return the length-four tensor with the encoded label for each position.
Any pointers are appreciated, as well as comments on my approach overall. Let me know if there is any extra information that you need.