Yes/no (go/no go) training

jeffbiss · May 31, 2022, 9:20pm

From my reading and this course, it appears that AI is trained to categorize, not to determine whether something is or is not. For example, the bear example trains the model to categorize between three types of bear. But what if all I want to do is determine whether the image is a bear or is not a bear?

I tried this by training only on bear images and then testing the result with a car image. It failed, which is not surprising, as I didn’t think this would work, but I wanted to experiment. Is the way to train a yes/no, or go/no go, scenario to use two image categories, one of the thing and the other of literally anything else? After all, EVERYTHING other than a bear should be in the “no” category, as discussed in this old thread:

I haven’t found anything about this in the AI literature, probably because I haven’t been using the correct search terms.

mike.moloch · May 31, 2022, 9:26pm

Hi @jeffbiss, I have read previously in these forums that this issue is dealt with in the 3rd or 4th chapter of Jeremy/Sylvain’s book. HTH

jeffbiss · May 31, 2022, 9:48pm

Happy Tuesday, Mike! By coincidence I’m looking through the book now and haven’t found the answer, maybe because my search terms are wrong; I obviously don’t know the correct term to search for when it comes to a “go/no go” classifier scheme. I have the digital version, so I can search it.

None of the chapter titles scream “go/no go” classification schemes. BTW, chapter 4 is about a digit classifier, which by default has samples of each digit. My guess is that it is in the book, but I haven’t found it yet. The closest thing is in chapter 6, Multi-Label Classification, which states:

“For instance, this would have been a great approach for our bear classifier. One problem with the bear classifier that we rolled out in Chapter 2 was that if a user uploaded something that wasn’t any kind of bear, the model would still say it was either a grizzly, black, or teddy bear…”

This makes me think that this is the direction I must go: EVERYTHING that is not a bear must be represented in the training set.

suvash · May 31, 2022, 10:49pm

You’re indeed at exactly the right chapter for what you have in mind. The general idea is that the model can output probabilities for multiple labels (targets, final layers & loss functions are swapped accordingly). By choosing a threshold for each label’s probability output, you get a “presence/absence” decision for each of them.

All the tools you need for this purpose are in that chapter (one-hot encoded targets, binary cross entropy loss, sigmoid activations, thresholds etc.).
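As a rough illustration of the tools listed above, here is a minimal pure-Python sketch (no fastai or PyTorch, so the arithmetic stays visible) of a 0/1 target vector, a per-label sigmoid, and the binary cross-entropy loss averaged over labels. The label names and logits are made-up values, not outputs of a real model.

```python
import math

LABELS = ["grizzly", "black", "teddy"]  # hypothetical label vocabulary

def multi_hot(present, labels=LABELS):
    """Encode the set of present labels as a 0/1 vector (all zeros = no label present)."""
    return [1.0 if name in present else 0.0 for name in labels]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(logits, targets, eps=1e-7):
    """Mean BCE over labels: each label is scored independently through its own sigmoid."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

# A "not a bear" image (e.g. a car) gets an all-zero target...
car_target = multi_hot(set())
# ...so low logits on every label give a small loss, and a confident "grizzly" a large one.
print(binary_cross_entropy([-4.0, -5.0, -3.0], car_target))
print(binary_cross_entropy([3.0, -5.0, -3.0], car_target))
```

Because every label has its own sigmoid, an all-zero target is a perfectly valid thing to learn, which is what makes “none of the above” possible.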

radek · May 31, 2022, 11:05pm

This is a feature of machine learning models

It is all about what is in your data. You can only make claims about what a model will do based on its performance on the test set.

Essentially, if the data you feed to your model is vastly different from what you have evaluated the model on, it is impossible to say what the model will do.

It could do anything within the scope of its capabilities.

But to your point, there are two ways you can train a model:

- with softmax activation (cross-entropy loss): your model is always trying very hard to predict one of the classes it can predict
- with sigmoid activation (binary cross-entropy loss): your model is trying to predict the presence of one or more classes (or of none at all)
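To see the difference between the two set-ups, here is a small pure-Python sketch that feeds the same logits through softmax and through per-class sigmoids. The logit values are invented for illustration: imagine a car photo shown to a grizzly/black/teddy model, where every class scores low.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for a car photo: all low, one slightly less low than the rest.
car_logits = [-3.0, -4.0, -5.0]

soft = softmax(car_logits)
sig = [sigmoid(z) for z in car_logits]

print([round(p, 3) for p in soft])  # sums to 1, so "grizzly" still wins comfortably
print([round(p, 3) for p in sig])   # each class judged on its own: all well below 0.5
```

Softmax only compares classes with each other, so it has no way to say “none of these”; the sigmoids can all be low at once.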

I am not fully sure what you mean by ‘go / no go’, but I think it would be something like a sigmoid model with a single class (cat / not a cat). But even in such a scenario, you can only reason about the performance of your model based on the data you have shown it in test.
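A single-class sigmoid model can be sketched as plain logistic regression: one output, a sigmoid, binary cross-entropy, and a threshold. In this minimal pure-Python version, a single made-up number per “image” stands in for whatever features a real network would extract, and the data, learning rate, and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: one made-up feature per "image"; 1 = cat, 0 = not a cat.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(200):
    # Gradients of mean binary cross-entropy: mean((p - y) * x) for w, mean(p - y) for b.
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        grad_w += (p - y) * x
        grad_b += p - y
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

def is_cat(x, threshold=0.5):
    """Single sigmoid output: 'go' if the probability clears the threshold."""
    return sigmoid(w * x + b) >= threshold

print([is_cat(x) for x in xs])
```

In a real image model the weights live in a neural network rather than in `w` and `b`, but the go/no-go decision at the end is the same threshold on one sigmoid output. As noted above, how it behaves on inputs unlike the training data (a car, say) is not something this training tells you.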

Generally, training a model is all about generalizing to unseen data, but this can only work semi-reliably if the data your model sees during training is like the data it will see in production! (The idea of data differing between training and production is called domain shift, and it can be very problematic; Jeremy talks a bit more about it here.)

Honestly, this is such a vast question that I am not fully sure this post does it justice, but maybe some of what I wrote will be useful.

PS. This blog post by Rachel Thomas has a great discussion of creating a validation set. I feel it is a very important aspect of what we are discussing here, and it may shed additional light on this matter.

jeffbiss · June 1, 2022, 2:29pm

Suvash, Happy Wednesday! Thanks for confirming that. As I am not cognizant of all of the appropriate AI terms or method names, I wanted to make sure that there was not another method that I should be looking at.

jeffbiss · June 1, 2022, 2:45pm

Radek, Happy Wednesday! Thanks for that discussion. You’re spot on about go/no go: that is precisely what I want the model to do, determine yes or no. As I told Suvash, I don’t know all of the AI terms and method names, so I didn’t want to start doing one thing, such as following chapter 6, when something I hadn’t found yet might be better.

Choosing the data and how to train is pretty obvious when the goal is to determine whether an x-ray indicates cancer or not, but not when the goal is to determine whether an image shows a bear or anything else in the universe. Animals, including us, do this rather well with little training.

iamgianluca · June 2, 2022, 4:54pm

As suggested by @radek and @suvash, we should treat this as a multi-label classification problem, where the model is asked to figure out whether the image contains a bear, a teddy bear, or neither of them.

Generally speaking, in these situations our model will have k output nodes, where k is the number of classes, and will use a sigmoid activation. In your specific example, k is equal to 2, so we would have 2 nodes: one for a bear and one for a teddy bear. If neither of these nodes “fires”, the outcome is expected to be no bear.
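The decision rule described here (k sigmoid outputs, and “no bear” when none of them fires) could be written as follows. This is a sketch assuming a fixed threshold of 0.5 for every label and hypothetical logits; in practice the threshold is something you might tune on a validation set.

```python
import math

LABELS = ["bear", "teddy bear"]  # k = 2 output nodes in this example

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(logits, threshold=0.5):
    """Return every label whose sigmoid output clears the threshold, or 'no bear' if none does."""
    fired = [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]
    return fired if fired else ["no bear"]

print(decode([2.5, -3.0]))   # ['bear']
print(decode([-2.0, -4.0]))  # ['no bear']
```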

Binary classification, in a sense, works in the same way. We have k=1 class: presence of cancer. Thus, we need 1 output node and use a sigmoid activation to decide if there is cancer or not.
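One way to see why the k=1 case works “in the same way” is that a single sigmoid output is mathematically equivalent to a two-class softmax whose second logit is fixed at 0, since e^z / (e^z + e^0) = 1 / (1 + e^-z). The pure-Python check below verifies this numerically for a few arbitrary logits.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A 1-node sigmoid head ("cancer") gives the same probability as a 2-class
# softmax whose second ("no cancer") logit is pinned at 0.
for z in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    assert math.isclose(sigmoid(z), softmax([z, 0.0])[0], rel_tol=1e-12)
print("sigmoid(z) == softmax([z, 0])[0] for all tested z")
```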

jeffbiss · June 2, 2022, 5:51pm

Happy Thursday, Gianluca! My problem is that I want to understand “best practices” so I don’t waste too much time. For example, it seems very obvious that training with a bear label (a set of bear images) and a not-bear label (a set of literally anything that is not a bear) would result in a model that differentiates between bear and not-bear. But has that oversimplification proven to be an invalid way to train?

Should go/no-go training provide more labels, so that the model can actually differentiate among other valid objects such as deer, bobcats, humans, and dogs, and thereby distinguish bears from other wildlife more accurately, even if the others are relegated to “not-bears”? I’m sure that the AI industry has dealt with this.

iamgianluca · June 2, 2022, 7:34pm

It all comes down to what inference-time data will look like. If you want your model to be better at identifying bear/no bear when processing images of wildlife, just train the model with such images. If, instead, you want to be able to pass any type of image to your model (e.g., satellite, x-rays, house interiors, etc.), then try to build a dataset with those characteristics.

jeffbiss · June 2, 2022, 9:15pm

OK. I’ll just experiment with the simplest go/no-go label set, such as images of “bears” and images of “not-bears”. That makes sense, since the accuracy required of the model depends on the situation, and something simple may be good enough. A need for higher accuracy would reveal itself in testing, for example if specific variants of “not-bears” turn out to be needed.
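For the simplest bear/not-bear label set, one option is to keep images in fine-grained folders and collapse them into two labels with a function. This sketch assumes a hypothetical layout such as `images/grizzly/001.jpg` and `images/deer/014.jpg`; which folders count as bears is also an assumption. A function of this shape (path in, label out) is the kind of thing that could be passed as `get_y` to a fastai `DataBlock`, though that wiring is not shown here.

```python
from pathlib import PurePosixPath

# Hypothetical folder names: which sub-folders count as "bear" is an assumption.
BEAR_FOLDERS = {"grizzly", "black", "brown"}

def go_no_go_label(path):
    """Collapse a fine-grained folder name into a single bear / not_bear label."""
    folder = PurePosixPath(path).parent.name
    return "bear" if folder in BEAR_FOLDERS else "not_bear"

def fine_label(path):
    """Keep the specific folder name (deer, car, ...) for finer-grained training or error analysis."""
    return PurePosixPath(path).parent.name

files = ["images/grizzly/001.jpg", "images/deer/014.jpg", "images/car/203.jpg"]
print([go_no_go_label(f) for f in files])  # ['bear', 'not_bear', 'not_bear']
print([fine_label(f) for f in files])      # ['grizzly', 'deer', 'car']
```

Keeping the fine-grained folders means that if testing shows particular not-bears (say, dogs) being confused with bears, you could switch to `fine_label` and train with more classes without re-sorting the images.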

iamgianluca · June 3, 2022, 1:53pm

Let us know how your experiments go.

jeffbiss · June 3, 2022, 5:04pm

Will do!