Output size is the number of neurons in your final layer, and it is determined by the nature of your problem. For a classification problem with just two classes, you could use a single output neuron or do what I did. However, to work with an output size of one you must change two things: the loss function of the learner and the final activation function.
In your case, I’m guessing the problem is that you are not using the right activation function. You need a sigmoid to constrain the output of your network to the range (0, 1), so it can be read as the probability of the positive class.
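To make the two changes concrete, here is a minimal sketch in plain Python of what a single-output setup computes: a sigmoid on the raw score, followed by a binary cross-entropy loss. I don't know which library or learner you are using, so the function names (`sigmoid`, `binary_cross_entropy`) and the example score are my own assumptions for illustration, not your framework's API.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a raw score (logit) into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p: float, target: int) -> float:
    """Loss for one predicted probability p and a 0/1 target."""
    eps = 1e-12  # clamp so log() never sees exactly 0
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

# Hypothetical raw output of the single final neuron.
logit = 2.0

# Final activation: turns the raw score into a probability.
prob = sigmoid(logit)

# Loss function: small when the probability agrees with the label,
# large when it does not.
loss_if_positive = binary_cross_entropy(prob, 1)
loss_if_negative = binary_cross_entropy(prob, 0)

print(f"probability of class 1: {prob:.4f}")
print(f"loss if the true label is 1: {loss_if_positive:.4f}")
print(f"loss if the true label is 0: {loss_if_negative:.4f}")
```

In a real framework you would pick the built-in equivalents rather than writing these by hand. If you happen to be on PyTorch, for example, `nn.BCEWithLogitsLoss` applies the sigmoid inside the loss, so you would feed it the raw score and only apply the sigmoid yourself at prediction time.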
Hope it helps!