This forum post has a good explanation for this. In short, the predictions coming out of linear1(train_x) are centered around 0, with both negative and positive values, so 0 is a good threshold for the binary classification. Later on, once the predictions are passed through sigmoid, the threshold becomes 0.5, because sigmoid maps 0 to exactly 0.5 and preserves ordering.
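
To make the equivalence concrete, here is a minimal sketch in plain Python. The values in raw_preds are made up for illustration and only stand in for the output of linear1(train_x); the sigmoid helper is written by hand rather than taken from any library.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw model outputs (an assumption, not real data):
# a mix of negative and positive values centered around 0.
raw_preds = [-2.3, -0.4, 0.0, 0.7, 3.1]

# Thresholding the raw outputs at 0 ...
labels_from_raw = [p > 0.0 for p in raw_preds]

# ... gives the same labels as thresholding the sigmoid outputs at 0.5,
# because sigmoid(0) == 0.5 and sigmoid is strictly increasing.
labels_from_sigmoid = [sigmoid(p) > 0.5 for p in raw_preds]

print(labels_from_raw == labels_from_sigmoid)
```

So the two thresholds are the same decision rule, just expressed before or after the sigmoid is applied.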