Lesson 10 Discussion & Wiki (2019)

The edited video has now been added to the top post.

6 Likes

Arrgh - not clarified yet.

  • Could you show the sequence of functions that turns activations into a loss number, using “binomial loss”?

  • When I run the lesson3-planets multi-label example, the model ends with Linear and the loss function is FlattenedLoss of BCEWithLogitsLoss(). This is defined as sigmoid followed by binary cross entropy loss.
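For reference, here is a minimal pure-Python sketch of that sequence (activations, then sigmoid, then binary cross entropy, then a mean). It is not the fastai or PyTorch implementation, which works on flattened tensors and uses a numerically safer formula; the function names and example numbers below are made up for illustration.

```python
import math

def sigmoid(x):
    # Squash a raw activation (logit) into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(activations, targets):
    """Mean binary cross entropy over all (activation, target) pairs.

    Hypothetical helper: each label is treated as its own yes/no problem.
    """
    losses = []
    for a, y in zip(activations, targets):
        p = sigmoid(a)  # step 1: activation -> probability
        # step 2: binary cross entropy for this single label
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    # step 3: average everything into one loss number
    return sum(losses) / len(losses)

# One image with three possible labels; only the first label is present.
activations = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 0.0]
loss = bce_with_logits(activations, targets)
print(loss)
```

The third label contributes the most to the loss here, because its activation is positive while its target is 0.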

Thanks for sorting this out.

I have some trouble understanding the use of register_buffer().
My questions are:

  1. When should I register a buffer? For what sort of Variables and for which not?
  2. Could someone provide me with a simple example and code snippet of using register_buffer()?
1 Like

From reading about this in the PyTorch forums, here’s some info for your first question:
"If you have parameters in your model, which should be saved and restored in the state_dict, but not trained by the optimizer, you should register them as buffers.
Buffers won’t be returned in model.parameters(), so that the optimizer won’t have a chance to update them."
I hope to do some work with them tomorrow, and if so I will post a code snippet (assuming someone else doesn’t beat me to it :slight_smile:).

5 Likes

Regarding question 2 - here’s the code for batchnorm, where you can see how they register params vs. buffers. The main difference is that params are learnable (they receive gradients and are updated by the optimizer), while buffers are not:

def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True,
                 track_running_stats=True):
        super(_BatchNorm, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        if self.affine:
            self.weight = Parameter(torch.Tensor(num_features))
            self.bias = Parameter(torch.Tensor(num_features))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)
        if self.track_running_stats:
            self.register_buffer('running_mean', torch.zeros(num_features))
            self.register_buffer('running_var', torch.ones(num_features))
            self.register_buffer('num_batches_tracked', torch.tensor(0, dtype=torch.long))
        else:
            self.register_parameter('running_mean', None)
            self.register_parameter('running_var', None)
            self.register_parameter('num_batches_tracked', None)
        self.reset_parameters()

Hope that helps!
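A much smaller sketch of the same pattern, in case it is useful. The RunningMean module is made up for illustration (it is not part of PyTorch or fastai); it keeps one learnable parameter and one buffer, then checks where each of them shows up.

```python
import torch
from torch import nn

class RunningMean(nn.Module):
    """Hypothetical module: centers the input with a tracked running mean."""
    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.momentum = momentum
        # Learnable: returned by parameters(), so the optimizer updates it.
        self.scale = nn.Parameter(torch.ones(num_features))
        # Not learnable: saved in state_dict and moved by .to()/.cuda(),
        # but never handed to the optimizer.
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                batch_mean = x.mean(dim=0)
                # Exponentially weighted moving average, updated in place.
                self.running_mean.mul_(1 - self.momentum).add_(batch_mean * self.momentum)
        return (x - self.running_mean) * self.scale

m = RunningMean(3)
param_names = [n for n, _ in m.named_parameters()]
buffer_names = [n for n, _ in m.named_buffers()]
state_keys = list(m.state_dict().keys())
print(param_names, buffer_names, state_keys)
```

So a rough rule of thumb: if a tensor is part of the module’s state but should not be trained by gradient descent (running statistics, fixed masks, counters), register it as a buffer.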

4 Likes

Hmm, do you mean multi-label or multi-CLASS? I think so far the multi-class default is categorical cross entropy using softmax, and the multi-label default is binary cross entropy using sigmoid. (These are also the defaults in the fastai library, based on the label class.)

So from my understanding of Jeremy in the lecture, it would often make sense for real-world multi-class problems to use the binary cross entropy (multi-label) version rather than softmax, and then use thresholds and/or argmax on the results to pick the single class. That way we also get per-class probabilities, undistorted by softmax, which lets us distinguish the given classes from “background”/“no label” when the probabilities are small for all of the classes. Is this what he meant?
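To make that concrete, here is a small pure-Python sketch of the idea as I understand it; the class names, the 0.5 threshold, and the predict helper are assumptions for illustration, not anything from the fastai library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def predict(activations, classes, threshold=0.5, background="no label"):
    """Pick the most likely class, or `background` if nothing is confident."""
    probs = [sigmoid(a) for a in activations]
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return classes[best] if probs[best] >= threshold else background

classes = ["cat", "dog", "fish"]
confident = [2.0, -1.0, -3.0]   # one clearly positive activation
unsure = [-2.0, -1.0, -3.0]     # every activation is negative

print(predict(confident, classes))
print(predict(unsure, classes))
# Softmax forces the outputs to sum to 1, so even the "unsure" activations
# yield one class with a fairly high probability.
print(max(softmax(unsure)), max(sigmoid(a) for a in unsure))
```

With the unsure activations, the largest softmax output is well above 0.5 while the largest sigmoid output stays below 0.3, which is the distortion described above.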

This would finally answer my question asked during v3 part 1 :wink: :

from here.

6 Likes

Thanks @deena-b, I’ll look into VS Code soon. I’ve used vim while writing bash scripts, but I kept forgetting the commands and deleting my code. I’ll text you tomorrow.

Yes, those are great takeaways I’ll write that down.

I was looking at the new version of the Runner class, and I realised that we may have lost the ability for a callback to return True, is that correct?

Since res is set to False at the start, and we are using the ‘and’ operator, this effectively means that no matter what the callbacks return, res will be ultimately False, right?
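Here is a stripped-down sketch of what I mean. It is not the actual Runner code, just the accumulation pattern described above, with made-up function names. The `or` variant is only one possible fix.

```python
def run_callbacks_and(callbacks):
    # Pattern described above: res starts as False and is combined with `and`.
    res = False
    for cb in callbacks:
        res = cb() and res  # anything `and False` is never truthy
    return res

def run_callbacks_or(callbacks):
    # Possible alternative: the result is truthy if any callback returns True.
    res = False
    for cb in callbacks:
        res = cb() or res  # cb() is on the left, so every callback still runs
    return res

always_true = [lambda: True, lambda: True]
mixed = [lambda: False, lambda: True]

print(run_callbacks_and(always_true))
print(run_callbacks_or(mixed))
```

Even when every callback returns True, the `and` version still returns False.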

3 Likes

I remember Graham Neubig saying that batch size is a hyperparameter. Can someone explain that? What is the difference between a batch size of 32 and one of 128, in addition to the speed?

1 Like

I might be making some basic mistake here. I’m confused by the different behaviors of numpy and torch:

np.array([10, 20]).var()
25.0

np.array([10, 20]).std()
5.0

torch.tensor([10., 20.]).var()
tensor(50.)

torch.tensor([10., 20.]).std()
tensor(7.0711)

In torch’s case, they don’t seem to be taking the mean of the squared deviations for the variance. Is this a bug?

Update:
I dug further into this, and it looks like there is an arg called “unbiased”; if I set it to False, the results match numpy.

torch.tensor([10., 20.]).var(unbiased=False)
tensor(25.)

torch.tensor([10., 20.]).std(unbiased=False)
tensor(5.)
From the docs: "If unbiased is False, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used."
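For anyone else who hits this, the difference is just the divisor: n - 1 with Bessel’s correction versus n without it. A minimal pure-Python sketch (the variance helper is made up for illustration and only mimics the `unbiased` flag):

```python
import math

def variance(xs, unbiased=True):
    """Sum of squared deviations divided by n - 1 (unbiased) or n (biased)."""
    n = len(xs)
    mean = sum(xs) / n
    squared_devs = sum((x - mean) ** 2 for x in xs)
    return squared_devs / (n - 1 if unbiased else n)

xs = [10.0, 20.0]
print(variance(xs, unbiased=True), math.sqrt(variance(xs, unbiased=True)))
print(variance(xs, unbiased=False), math.sqrt(variance(xs, unbiased=False)))
```

Going the other way, numpy’s var and std accept ddof=1 to apply the correction.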

1 Like

Oh silly me - I meant to say “binary” but wrote “binomial” then just read what was there rather than actually thinking about it! Thanks for pointing this out.

6 Likes

One thing I didn’t quite understand is Jeremy said softmax should not be used, but everyone uses it. What should be used instead? Or did I misunderstand?

Sigmoid and binary log likelihood.

2 Likes

Just check this paper - https://arxiv.org/pdf/1606.02228.pdf

Came here to ask this question after listening to the softmax part of Lesson 10. I would really appreciate any advice on how we can handle “not any of these” classes in single-label classification problems. For example, I am doing the TensorFlow Speech Challenge on Kaggle, and there are 10 classes, each for a one-word spoken command like “yes”, “stop”, or “go”, as well as 2 more classes: “silence”, and “unknown” for any other word or utterance that doesn’t match.

To this point I’ve been using resnet34 with 12 classes as if they were all the same, training “unknown” with words and noises that aren’t silence or any of the other 10 classes. But from what Jeremy is saying, it sounds like it would be better to have 11 classes and, instead of using softmax as my final activation, take the argmax and predict “unknown” if it doesn’t meet a certain absolute threshold. My concrete questions are:

  • If I do remove “unknown” as a class in the initial stages of training, is there a way to still use my “unknown” data in a useful way?
  • Where in my code do I go to stop using softmax? I looked in learn.model but don’t see it in the final layers, is it there by another name? or am I misunderstanding and softmax isn’t used in resnet34?

Thank you all!

2 Likes

It can be included in the loss function and, therefore, you would not find it in the model.
See for example the cross entropy loss in PyTorch which “combines nn.LogSoftmax() and nn.NLLLoss() in one single class.”
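As a rough illustration of that combination, here is a pure-Python sketch for a single example. It mirrors the idea only, not PyTorch’s tensor implementation, and the function names and numbers are made up.

```python
import math

def log_softmax(activations):
    # log(softmax(x)) computed stably as x - logsumexp(x)
    m = max(activations)
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in activations))
    return [a - log_sum_exp for a in activations]

def nll_loss(log_probs, target):
    # Negative log likelihood: minus the log-probability of the true class.
    return -log_probs[target]

def cross_entropy(activations, target):
    # The softmax lives here, inside the loss, not in the model's last layer.
    return nll_loss(log_softmax(activations), target)

activations = [2.0, 0.5, -1.0]  # raw outputs of a final Linear layer
loss = cross_entropy(activations, target=0)
print(loss)
```

This is why the model itself can end in a plain Linear layer with no softmax in sight.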

2 Likes

The loss function is not part of the model. You can see the loss function that was automatically chosen by fastai with
learn.loss_func

To change the loss function, simply reassign it. Take a look at fastai’s BCEWithLogitsFlat for a likely candidate. The function it returns applies sigmoid, then binary cross entropy.

Once you train using BCEWithLogitsFlat, you’ll need to apply sigmoid to the predicted output activations in order to convert them to probabilities. The last time I checked, learn.get_preds outputs activations when it does not recognize your loss function; if it does recognize, it returns probabilities. But to be sure you should check what it is doing by looking at its outputs or by tracing code.

HTH, and experts please correct my errors!

3 Likes

If it’s helpful, I covered the question of “which loss function do I use for data that’s multi-class AND multi-label” in my talk on the Human Protein Image Classification Kaggle competition: https://youtu.be/O5eHvucGTk4?t=1150

8 Likes

Hi Stas. I am wondering about the right way to keep fastai updated after installing pytorch-nightly for the course.

And should we keep updating pytorch-nightly using:

conda install -c pytorch pytorch-nightly

My current pytorch is:

pytorch-nightly 1.0.0.dev20190405 py3.7_cuda10.0.130_cudnn7.4.2_0 pytorch
nvidia driver is 418.56

and everything works.

To update fastai I tried both:

conda install -c pytorch -c fastai fastai
conda install -c fastai fastai 

Complaints:

The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/linux-64::python-graphviz==0.8.4=py36_1
  - pytorch/noarch::torchvision==0.2.1=py_2
  - anaconda/linux-64::py-opencv==3.4.2=py36hb342d67_1
  - defaults/linux-64::psutil==5.4.7=py36h14c3975_0
  - defaults/linux-64::simplegeneric==0.8.1=py36_2
  - defaults/linux-64::qtpy==1.5.2=py36_0
  - fastai/noarch::fastai==1.0.51=1

The former command wants to install:

The following NEW packages will be INSTALLED:

  pytorch            pytorch/linux-64::pytorch-1.0.1-py3.7_cuda10.0.130_cudnn7.4.2_2

The following packages will be UPDATED:

  torchvision                                    0.2.1-py_2 --> 0.2.2-py_3

The latter command wants to install:

The following NEW packages will be INSTALLED:

  cudnn              pkgs/main/linux-64::cudnn-7.3.1-cuda10.0_0
  pytorch            pkgs/main/linux-64::pytorch-1.0.1-cuda100py37he554f03_0

Please advise. BTW, I’m a Linux ignoramus regarding packages. Thanks!

1 Like

Thanks Malcolm, this helps a lot. I’ll try to get it working and then report back here later on what worked and what didn’t. Cheers.