Lesson 3 In-Class Discussion

I don’t understand. Jeremy mentioned that multi-label classification doesn’t use softmax, but when he described how the labels are compared against the activations, he said softmax. So does multi-label classification use softmax or not?

How does Jeremy decide what early, middle, and last layers are?

Does it take only jpeg image formats?

As Jeremy first said, multi-label shouldn’t use softmax. Softmax outputs sum to 1, so it tends to push the probability toward a single class.

I think the difference is between multi-label and multi-class. With a multi-label problem you use sigmoid, but with multi-class you use softmax. With multi-class you want the single class with the highest probability, whereas with multi-label an image can have several labels at once, e.g. with planet an image can be both clear and primary. With dog breeds you can only have one breed at a time out of 120 possibilities.
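
To make the difference concrete, here is a minimal sketch in plain Python (not the fastai code from the lesson). The `logits` values are made up for illustration, and `softmax` and `sigmoid` are hand-written helpers, not library calls.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (multi-class)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each raw score independently into (0, 1) (multi-label)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical raw scores for three labels; the first two are both strong.
logits = [2.0, 1.5, -3.0]

soft = softmax(logits)  # the labels compete: the probabilities share a total of 1
sig = sigmoid(logits)   # the labels are independent: two can both be near 1

print("softmax:", [round(p, 3) for p in soft])
print("sigmoid:", [round(p, 3) for p in sig])
```

With softmax the two strong labels have to split the probability between them, while with sigmoid both can score high at the same time, which is what you want when an image can carry more than one label.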

Another point to add is that with multi-label you need to pick a threshold that a sigmoid value must exceed for its label to count. For example, if you set a threshold of 0.2, then every label whose sigmoid value is over 0.2 is assigned to that image.
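
A rough sketch of that thresholding step, again in plain Python rather than the actual fastai implementation. The helper name `labels_above_threshold`, the raw scores, and the "water" and "road" label names are assumptions for illustration only.

```python
import math

def labels_above_threshold(logits, label_names, threshold=0.2):
    """Return every label whose sigmoid value is over the threshold."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [name for name, p in zip(label_names, probs) if p > threshold]

# "clear" and "primary" come from the planet example; the rest are hypothetical.
names = ["clear", "primary", "water", "road"]
logits = [3.0, 2.0, -1.0, -4.0]  # made-up raw scores for one image

predicted = labels_above_threshold(logits, names, threshold=0.2)
print(predicted)
```

Note that with a low threshold like 0.2, a label with a negative raw score can still be kept (sigmoid(-1) is roughly 0.27), so the threshold is something to tune against your metric rather than fix in advance.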

Can custom metric functions accept inputs other than the preds and targs (e.g. some attribute from the input dataset)?

Are there any heuristics on what size to try and use if it’s not the same as imagenet?

Sigmoid Function

Got it. It’s using sigmoid. I thought I heard him say he used softmax.

%?? before the function

Softmax is for single-label, sigmoid is for multi-label.

Are there any other activations you would want to use at the end for other reasons, besides softmax or sigmoid?

It’s not multi-class; it’s multi-label.

tanh was popular at one point.

not multi-class but multi label.

So what would be a good reason to use that? Or have these two phased out tanh?

The microphone sounds kind of muffled tonight to me. Maybe it’s just my ears, but it doesn’t sound great to me

tanh works well with GANs. It depends on the model you are using and the task.

It’s complex, but I think it’s explained well here: https://stats.stackexchange.com/questions/101560/tanh-activation-function-vs-sigmoid-activation-function

Can the learning rate finder function be adapted to suggest the differential (intermediate) learning rates?
