Understanding log predictions


(Malcolm McLean) #1

Hi coders,

I’m working on my first Kaggle competition:

The problem is to classify slides as cancer vs. not cancer. While actually working through it, I realized that parts of my basic understanding are unclear.

Using the fastai 1.0.33 pipeline with a resnet34 architecture, a model is generated automatically. Its last layer is
Linear(in_features=512, out_features=2, bias=True).
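To make that head concrete, here is a minimal stand-alone sketch of such a final layer in plain PyTorch (the feature values are hypothetical; this is not the fastai-generated model itself, just the same shape of computation):

```python
import torch
import torch.nn as nn

# A minimal stand-in for the final classification layer:
# 512 pooled features in, one raw activation ("logit") per class out.
head = nn.Linear(in_features=512, out_features=2, bias=True)

features = torch.randn(1, 512)        # hypothetical pooled features for one slide
logits = head(features)               # shape (1, 2): one raw score per class
probs = torch.softmax(logits, dim=1)  # turn the two scores into probabilities
```

Note that the two raw outputs (logits) are unconstrained; it is only after softmax that the pair sums to 1.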

Here I understand that the first unit's activation represents the log likelihood of class 0 (not cancer), and the second unit's that of class 1 (cancer).

Train one cycle, and then
log_preds,val_labels = learn.get_preds()

This yields log_preds as a 44005x2 matrix and val_labels as a 44005 length vector.

error_rate(log_preds, val_labels) gives a reasonable answer, around 10%. I understand that to get the relative probability of each class you would next exponentiate (or, starting from the raw activations, apply softmax, which exponentiates and normalizes in one step). So far everything works as I expect.
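To spell out that relationship: softmax already includes the exponentiation, so from raw activations you apply softmax in one step, and from log-probabilities you only exponentiate. A small sketch in plain Python with hypothetical numbers:

```python
import math

# Raw activations ("logits") from a 2-unit linear layer for one image
# (hypothetical values, just for illustration).
logits = [1.2, -0.4]

# Softmax: exponentiate each logit and normalize so the results sum to 1.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Log-probabilities are just the log of those probabilities.
log_probs = [math.log(p) for p in probs]

# Exponentiating the log-probabilities recovers the probabilities,
# with no further normalization needed.
recovered = [math.exp(lp) for lp in log_probs]
```

So "exponentiate, then apply softmax" would double-apply the exponential; it's one or the other depending on what you start from.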

However,
log_preds.sum(dim=1) yields a vector of 1.0's, 44005 of them. That is, the sum of the log-probabilities for both classes is always 1.0 for every validation image.

  1. Why? In my beginner's understanding, you should see (at least somewhat) independent measures for the cancer and not-cancer likelihoods.

  2. And if this sum is always 1.0, then the second class's value can be derived from the first by simple arithmetic. So it seems the last linear layer could map the 512 features to just 1 output, training with half the number of units. Or is this actually a one-class problem, whatever that may be?
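The intuition in question 2 is right: once a softmax is applied over two outputs, the two probabilities are redundant, and the same decision can be expressed with a single output and a sigmoid. A small check in plain Python with hypothetical activations (this is not how fastai builds the head, just the underlying algebra):

```python
import math

def softmax2_p1(z0, z1):
    """Probability of class 1 under a 2-way softmax."""
    e0, e1 = math.exp(z0), math.exp(z1)
    return e1 / (e0 + e1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

z0, z1 = 0.7, -1.3              # hypothetical activations for the two units
p_two_unit = softmax2_p1(z0, z1)
p_one_unit = sigmoid(z1 - z0)   # a single output on the difference
```

Both parameterizations are common in practice; frameworks often keep two outputs for binary problems simply so the same cross-entropy machinery works for any number of classes.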

Thanks for helping me untangle my confusions!


(魏璎珞) #2

Your understanding is correct. That is probably why there are two outputs instead of one, I think.


#3

Note that in v1, get_preds() returns the actual probabilities and not their log.
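This also explains the row sums: a quick way to tell the two apart is that each log-probability is ≤ 0, so a row of log-probabilities sums to a negative number, while a row of probabilities sums to 1. A sketch with hypothetical values:

```python
import math

def softmax(zs):
    """Exponentiate and normalize a list of raw activations."""
    exps = [math.exp(z) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, -1.0]                     # hypothetical activations for one image
probs = softmax(logits)
log_probs = [math.log(p) for p in probs]

row_sum_probs = sum(probs)       # 1.0 for probabilities
row_sum_logs = sum(log_probs)    # negative for log-probabilities
```

Rows of 1.0's, as observed above, are therefore the signature of probabilities, not of their logs.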


(Malcolm McLean) #4

That would certainly explain why they add up to 1.0. Thanks for clarifying.