Why use binary output versus having two output units?

I noticed we use two nodes to represent the final output. That is, we get two probabilities, one for dog and one for cat, and they add to 1.

However, I think we could have used just one node, right? The node would turn on when there is a dog and stay off otherwise.

What are the advantages of using binary output versus two outputs?

Great question! I did it this way because it means we can extend to larger numbers of categories without changing any of our code. But other than this convenience, there’s no reason not to use a single column for dogs vs. cats, and Keras supports this by choosing `class_mode='binary'`.
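To make the two label layouts concrete, here is a small NumPy sketch (not code from the course). It assumes the class order cats = 0, dogs = 1, which is what you would get from folders named `cats` and `dogs` since Keras sorts class folders alphanumerically. The arrays and the `to_two_columns` helper are made up for illustration.

```python
import numpy as np

# Hypothetical class indices for five images (assumed: 0 = cat, 1 = dog).
class_idx = np.array([0, 1, 1, 0, 1])

# Two-column (one-hot) layout, the shape of labels from class_mode='categorical'.
onehot = np.eye(2)[class_idx]

# Single-column layout, the shape of labels from class_mode='binary'.
binary = class_idx.astype(float)

# The single column is just the "dog" column of the one-hot layout.
assert np.array_equal(onehot[:, 1], binary)


def to_two_columns(p_dog):
    """Expand single-column P(dog) predictions into [P(cat), P(dog)] rows."""
    p_dog = np.asarray(p_dog, dtype=float).reshape(-1)
    return np.stack([1.0 - p_dog, p_dog], axis=1)
```

So nothing is lost by going to one column: the second column can always be recovered as one minus the first.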

Aren’t there also optimization differences? With two outputs you generally have a softmax, whereas with one output you don’t. How does that affect optimization?
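One way to look at this, as far as the output layer and loss go: a two-unit softmax depends only on the difference between its two logits, so its "dog" probability equals a sigmoid applied to that difference, and categorical cross-entropy on the pair equals binary cross-entropy on the single value. The sketch below checks this numerically with random, made-up logits. It says nothing about the rest of the network: the two-unit version still has twice as many final-layer weights, so training trajectories need not be identical.

```python
import numpy as np


def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 2))   # columns: [cat logit, dog logit]
y = rng.integers(0, 2, size=6)     # assumed labels: 1 = dog, 0 = cat

# P(dog) from the two-unit softmax vs. a sigmoid on the logit difference.
p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

# Per-example losses under each formulation.
cat_ce = -np.log(softmax(logits)[np.arange(6), y])
bin_ce = -(y * np.log(p_sigmoid) + (1 - y) * np.log(1 - p_sigmoid))

# Gradient of the loss w.r.t. the dog logit (softmax) and the single logit (sigmoid).
grad_softmax = p_softmax - y
grad_sigmoid = p_sigmoid - y

assert np.allclose(p_softmax, p_sigmoid)
assert np.allclose(cat_ce, bin_ce)
assert np.allclose(grad_softmax, grad_sigmoid)
```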

As far as I understand, binary encoding represents the data in fewer dimensions than one-hot encoding, but with some distortion of the distances between classes, which I believe is bad for a neural network, no?
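The distance point can be checked directly. The sketch below (an illustration, not code from the course) compares pairwise Euclidean distances between class codes for a hypothetical four-class problem: one-hot codes are all equally far apart, while 2-bit binary codes are not. With only two classes, as in dogs vs. cats, there is a single pair of codes, so a single 0/1 output cannot distort anything.

```python
import numpy as np
from itertools import combinations


def pairwise_distances(codes):
    """Euclidean distance between every pair of class codes."""
    return {
        (i, j): float(np.linalg.norm(codes[i] - codes[j]))
        for i, j in combinations(range(len(codes)), 2)
    }


# Four hypothetical classes encoded two ways.
onehot4 = np.eye(4)
binary4 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

d_onehot = pairwise_distances(onehot4)   # every pair is sqrt(2) apart
d_binary = pairwise_distances(binary4)   # some pairs are 1 apart, others sqrt(2)

# Two classes with a single 0/1 output: only one pair, so only one distance.
d_two_class = pairwise_distances(np.array([[0.0], [1.0]]))

for name, d in [("one-hot", d_onehot), ("binary", d_binary), ("two-class", d_two_class)]:
    print(name, sorted({round(v, 3) for v in d.values()}))
```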

[quote="jeremy, post:2, topic:154, full:true"]and keras supports this, by choosing class_mode='binary'
[/quote]

This recently resurfaced post piqued my interest. However, I couldn’t get a single-column probability from this code:

val_batches, probs = vgg.test(valid_path, batch_size = batch_size)

These are the parameters I’ve changed:
In get_batches function,
class_mode='binary'
In ft function,
model.add(Dense(1, activation='sigmoid'))
In test function,
class_mode=None
In compile function,
```
             loss = 'binary_crossentropy', metrics=['accuracy'])
```


Did I miss anything?