Binary classification: 1 or 2 output channels?

Hello,

Right now I am working on a binary image classification problem where I have to predict whether an image contains tumor cells.
By default, the fastai library creates a custom head with two output channels, because in the CSV file "no tumor" is labeled 0 and "tumor" is labeled 1.
At the end I apply a threshold to the prediction for “class” 1 (tumor) and ignore the prediction for “class” 0 (no tumor).
But one output channel should actually be enough here, so I am wondering whether I should change the head to a single output channel.
Does it make a performance difference if the model has one or two output channels for binary classification?

Thanks in advance,

Christoph

As I understand it, the last step is (in effect) to apply softmax, which exponentiates the two activations and normalizes them. The resulting probabilities derive from those two activations relative to each other, and necessarily sum to one. Therefore, with a softmax head, even though the redundant probability is simply 1 minus the other, you still can’t eliminate one of the activations (classes). You need both to calculate the loss function and the final probability.
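
To make the softmax step concrete, here is a small sketch in plain Python. The activation values are made up, and `softmax2` and `sigmoid` are illustrative helpers of my own, not fastai functions. It checks that the two probabilities sum to one and that the class 1 probability depends only on the two activations relative to each other:

```python
import math

def softmax2(a0, a1):
    """Softmax over two activations; returns (p0, p1)."""
    m = max(a0, a1)  # subtract the max for numerical stability
    e0, e1 = math.exp(a0 - m), math.exp(a1 - m)
    total = e0 + e1
    return e0 / total, e1 / total

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up activations for "no tumor" (a0) and "tumor" (a1).
a0, a1 = 0.3, 1.7
p0, p1 = softmax2(a0, a1)

print(p0 + p1)               # the two probabilities sum to one
print(p1, sigmoid(a1 - a0))  # same class 1 probability, computed two ways
```

In other words, the class 1 probability from a two-way softmax equals a sigmoid applied to the difference of the two activations, so shifting both activations by the same amount changes nothing.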

That said, I think there’s a valid intuition here, one that I have also been wondering about. It seems that this problem could alternatively be posed as a regression, mapping one class to zero and the other to one. The [0,1] prediction would be used as the probability of class 1.

Can anyone comment on the relative merits of using regression for a binary classification problem, like accuracy, network complexity, training time? Thanks for any insights.

You’re right that the question could be posed as regression: this is like performing a logistic regression and then using a threshold (e.g. 0.5) to form a classifier.
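
As a rough sketch of what that single-output setup looks like (plain Python, with hypothetical logits and labels; `bce_loss` and `classify` are names I made up for illustration), the model emits one number per image, a sigmoid turns it into a probability for class 1, and a threshold turns that into a decision:

```python
import math

def sigmoid(z):
    """Logistic function: maps a raw output (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(logit, target):
    """Binary cross-entropy for one example; target is 0 or 1."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def classify(logit, threshold=0.5):
    """Return 1 (tumor) if the predicted probability reaches the threshold."""
    return int(sigmoid(logit) >= threshold)

# Hypothetical single-channel outputs and their true labels.
logits = [-2.0, 0.1, 3.0]
labels = [0, 1, 1]

preds = [classify(z) for z in logits]
mean_loss = sum(bce_loss(z, y) for z, y in zip(logits, labels)) / len(logits)
print(preds, mean_loss)
```

In PyTorch, a loss of this kind is available as `torch.nn.BCEWithLogitsLoss`, which takes the raw single-channel outputs directly. Raising the threshold above 0.5 trades recall for precision on the tumor class.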

In terms of the relative merits, it is my understanding that classification tasks are normally easier to perform in machine learning than regression. Indeed, in computer vision, tasks that would seem to fit the regression framework best are often re-framed as classification tasks (e.g. to determine the pixel intensities for a new image).

This evidence is only empirical: I’m not entirely sure why it tends to be easier.

@ChristophNeuner Could you please show an example of your code? I am still struggling to make predictions on a new test set for binary tabular classification.

“Indeed, in computer vision, tasks that would seem to fit the regression framework best are often re-framed as classification tasks (e.g. to determine the pixel intensities for a new image).”

Are you referring to super-resolution? Are you saying that super-resolution treats each of the 256 pixel intensities as a separate class? If so, that was not my understanding. If that were the case, wouldn’t that imply that there is no explicit relationship between pixel classes 120, 121 and 122? It seems that encoding the fact that those are close together (as in regression) would be useful.

I apologize if I’m misinterpreting what you’re saying here.