So I’m trying to train a model to interpret CAPTCHA codes, and this is my first attempt at applying what was taught in lesson 1. I managed to set up the ImageDataBunch and the learner successfully, but when I train, most of the read-out columns (such as error_rate/accuracy) just show #na#.
I tried capping the max learning rate, and that actually caused the train loss to increase.
Here’s the learning rate curve:
I would appreciate any help on this one. Thanks!
I’m also having this issue… Would love to have some input on this…
how did you split the data? do you actually have a validation set?
Hey! So my dataset was just a set of pictures in a single folder, so I passed valid_pct=0.2 to the ImageDataBunch call. That’s about all I’ve done. Should I be doing anything else?
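For anyone confused about what valid_pct does: conceptually it just shuffles your files and holds out a fraction for validation. Here's a minimal pure-Python sketch of that idea (the filenames are made up; this is an illustration, not fastai's actual internals):

```python
import random

def split_by_pct(filenames, valid_pct=0.2, seed=42):
    """Shuffle and split files into train/valid sets -- roughly the
    behaviour you get by passing valid_pct to ImageDataBunch."""
    rng = random.Random(seed)
    files = list(filenames)
    rng.shuffle(files)
    n_valid = int(len(files) * valid_pct)
    return files[n_valid:], files[:n_valid]

# Hypothetical CAPTCHA image names
images = [f"{i:05d}.png" for i in range(1000)]
train, valid = split_by_pct(images)
print(len(train), len(valid))  # 800 200
```

The catch, as discussed below, is that a random split only makes sense when the validation labels also occur in the training set.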
It looks like you are trying to treat every CAPTCHA as a different category. I don’t see how that could work. The network doesn’t care about your category names (they are converted to sequential numbers internally). So how could it ever learn to categorise a CAPTCHA it hasn’t seen?
This also means that when validating, it has different categories to those in the training set, hence the warning. Since it’s basically guessing from the space of all CAPTCHAs present in your dataset, there is only a tiny chance of it ever getting a label right; the loss involves a very large number of possible categories and a likely 0% success rate, so it produces an invalid value.
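You can see the problem directly in code. If every CAPTCHA string is its own class, the labels are (essentially) all unique, so any train/valid split partitions them into disjoint sets: the model is validated on classes it has never seen. A toy sketch:

```python
import random

# Toy setup: every CAPTCHA string is treated as its own class label.
random.seed(0)
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# ~2 billion possible 6-char strings, so 1000 draws are effectively unique
captchas = sorted({"".join(random.choices(chars, k=6)) for _ in range(1000)})

random.shuffle(captchas)
n_valid = len(captchas) // 5
valid_labels = set(captchas[:n_valid])
train_labels = set(captchas[n_valid:])

# Because each label is unique, the split partitions the classes:
# validation classes never appear in training.
overlap = train_labels & valid_labels
print(len(overlap))  # 0
```

With zero overlap, accuracy on the validation set can never be computed meaningfully, which is consistent with the #na# read-outs.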
You need another approach for this sort of problem (or if just using this to practice I’d find another problem). You’d want to do something like object detection to detect each letter individually.
I had the same doubt before diving in: it would logically make more sense for the network to somehow interpret individual characters from the CAPTCHA rather than associating entire labels with the images, but I simply thought the neural net would eventually figure it out somehow. I now see where I went fundamentally wrong. But just to help me clarify one follow-up query, check this picture out.
I got a train_loss of 0.046 after several iterations. What does that imply?
That would be because the model is able to memorise the images; it just can’t learn any relation to the actual CAPTCHA text. You’re giving it a whole set of CAPTCHAs and asking it to learn to output 1 when it sees the first, 2 when it sees the second, etc. (actually, outputs are one-hot encoded, so it outputs a 1 on output 0 for the first, a 1 on output 1 for the second, and so on). It’ll learn to do that, but when it’s shown an image it hasn’t seen in your validation set it of course has no idea. You’re never actually giving it the information that category 1 is the CAPTCHA “ABCDE”, say.
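To make the one-hot point concrete, here's a minimal sketch (the class names are hypothetical examples). Note how the target carries no information about the characters in the CAPTCHA, only "which slot this image occupies":

```python
def one_hot(index, n_classes):
    """One-hot encode a class index, as is done internally for targets."""
    vec = [0.0] * n_classes
    vec[index] = 1.0
    return vec

# With every CAPTCHA its own class, each image's target is just
# "a 1 in its own slot" -- nothing about the characters it contains.
classes = ["ABCDE", "QWERT", "ZXCVB"]  # hypothetical CAPTCHA labels
print(one_hot(classes.index("QWERT"), len(classes)))  # [0.0, 1.0, 0.0]
```

So a low train_loss just means the network has memorised which slot each training image belongs to.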
Understood. Thanks for the clarification!
On a side note, any idea how one would actually pull something like this off? Or is that beyond the scope of lesson1 of fastai?
Definitely beyond the lesson 1 scope. At that sort of stage you could try it as a multi-label, multi-category problem: multi-category meaning you have 36 categories, A–Z and 0–9 (62 if you include lowercase), and multi-label meaning that more than one label applies to each image (classifying a single character would be multi-category but only single-label).
So you’d label a CAPTCHA of “ABC123” as [“A”,“B”,“C”,“1”,“2”,“3”]. That way your network has enough information to actually learn to detect letters, since individual letters are labelled. That won’t give full CAPTCHA detection: “ABC123” and “1A2BC3” would come out the same. But it’s doable.
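The multi-label targets would then be multi-hot vectors over the 36 categories. A quick sketch showing both the encoding and its limitation (order is discarded, so anagrams share a target):

```python
VOCAB = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # 36 categories

def multi_hot(label):
    """Multi-label target: 1 for every character present, order discarded."""
    present = set(label)
    return [1.0 if c in present else 0.0 for c in VOCAB]

# The network can now learn per-character evidence, but two CAPTCHAs
# with the same characters in a different order share a target:
print(multi_hot("ABC123") == multi_hot("1A2BC3"))  # True
```

This is exactly why the approach detects *which* characters are present but not their order.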
For proper order detection you could use object detection, which is covered in a later lesson (there is more information on it in older versions of the course). That would give bounding boxes from which you could derive the order.
Otherwise you’d be looking at a custom architecture that makes multiple ordered multi-category, single-label predictions, i.e. treats each character position in the CAPTCHA as its own multi-category, single-label problem. This is probably the best approach, since object detection forces the network to learn irrelevant location information when all you want is the order. That’s right at the edge of the scope of the fastai courses; they try to prepare you to get there but don’t actually go there (though object detection is pretty close, it’s just a standard type of more advanced architecture).
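To sketch what "multiple ordered single-label predictions" means at the output end: the head would emit one 36-way score vector per character position, and you decode by taking the argmax at each position. This is a hypothetical decoding step in plain Python, not the actual network:

```python
VOCAB = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def decode(position_logits):
    """Given one list of 36 scores per character position (the output of
    a hypothetical multi-head classifier), take the argmax at each
    position to recover the ordered string."""
    return "".join(
        VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        for scores in position_logits
    )

# Toy logits for a 3-character CAPTCHA: the highest score at each
# position marks the predicted character.
logits = [
    [0.9 if c == "A" else 0.0 for c in VOCAB],
    [0.9 if c == "7" else 0.0 for c in VOCAB],
    [0.9 if c == "C" else 0.0 for c in VOCAB],
]
print(decode(logits))  # A7C
```

Training such a head would mean one cross-entropy loss per position, summed, which is why it sits beyond what lesson 1 covers.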