I am trying to create a denoising autoencoder that takes word vectors as input. Some of the values in the vectors are negative. Using binary crossentropy as the loss function gives negative losses, perhaps because binary crossentropy expects values between 0 and 1. Should I use MSE as my loss function here? If so, what should the activation function in the final layer be? (I was using a sigmoid activation in the last layer together with binary crossentropy.) Looking for suggestions. Thanks.
nafizh (Nafiz Hamid) #1
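The suspicion about the 0 to 1 range can be checked by writing the loss out by hand. Binary crossentropy, `-(y*log(p) + (1-y)*log(1-p))`, is only guaranteed to be nonnegative when the target `y` lies in [0, 1]. With a negative target, the `-y*log(p)` term becomes negative and keeps falling as `p` shrinks toward 0, so the loss goes negative and is bounded below only by the clipping epsilon. The helpers below are a hand-written sketch of the standard formulas. The clipping to `[eps, 1 - eps]` is an assumption about how libraries guard the logarithm, not any specific library's implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean elementwise BCE: -(y*log(p) + (1-y)*log(1-p)).

    Predictions are clipped to [eps, 1 - eps] so log() never receives 0.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: never negative, zero only at a perfect reconstruction."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# One word-vector component with a negative value, scored against
# sigmoid-range predictions that move toward 0.
target = [-0.5]
for p in (0.5, 0.1, 0.01):
    bce = binary_crossentropy(target, [p])
    sq = mse(target, [p])
    print(f"p={p}: BCE={bce:+.3f}  MSE={sq:.3f}")
```

As `p` shrinks, BCE goes from about +0.69 to about -0.99 and then about -2.29, so the optimizer is rewarded for saturating the sigmoid rather than for reconstructing the input. MSE stays positive and would be minimized at `p = -0.5`, a value a sigmoid output cannot reach.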
jeremy (Jeremy Howard) #2
No activation function (i.e. linear) and MSE loss.
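Here is a minimal sketch of that recipe in plain Python. The toy dataset, layer sizes, noise level, learning rate, and the names `DenoisingAE` and `train_demo` are all hypothetical. The input is corrupted with Gaussian noise, the hidden layer uses tanh, the output layer has no activation, and the loss is MSE against the clean vector. The question doesn't say which framework is in use. In Keras, for example, the same choice would be a final `Dense` layer left at its default `activation=None` (i.e. linear), compiled with `loss='mse'`.

```python
import math
import random


def mse(pred, target):
    """Mean squared error; always >= 0, and 0 only when pred == target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


class DenoisingAE:
    """Tiny denoising autoencoder: tanh hidden layer, LINEAR output layer."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.W1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        # No activation on the output, so reconstructions can be any real
        # number, including the negative components of the word vectors.
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def sgd_step(self, x_noisy, x_clean, lr):
        """One gradient-descent step on MSE(decode(encode(x_noisy)), x_clean)."""
        h, y = self.forward(x_noisy)
        n = len(x_clean)
        dy = [2.0 * (yi - ti) / n for yi, ti in zip(y, x_clean)]  # dMSE/dy
        # Gradient w.r.t. hidden activations, computed before W2 is updated.
        dh = [sum(dy[i] * self.W2[i][j] for i in range(n)) for j in range(len(h))]
        for i in range(n):
            for j in range(len(h)):
                self.W2[i][j] -= lr * dy[i] * h[j]
            self.b2[i] -= lr * dy[i]
        dz = [dh[j] * (1.0 - h[j] ** 2) for j in range(len(h))]  # tanh' = 1 - tanh^2
        for j in range(len(h)):
            for k in range(len(x_noisy)):
                self.W1[j][k] -= lr * dz[j] * x_noisy[k]
            self.b1[j] -= lr * dz[j]


def train_demo(epochs=2000, noise_std=0.1, lr=0.05, seed=0):
    # Hypothetical toy "word vectors" with negative components.
    data = [
        [0.8, -1.2, 0.3, -0.5],
        [-0.9, 0.4, 1.1, -1.5],
        [0.2, 0.9, -1.3, 0.7],
    ]
    rng = random.Random(seed)
    ae = DenoisingAE(n_in=4, n_hidden=8, seed=seed)

    def clean_loss():
        return sum(mse(ae.forward(x)[1], x) for x in data) / len(data)

    before = clean_loss()
    for _ in range(epochs):
        for x in data:
            noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]  # corrupt the input...
            ae.sgd_step(noisy, x, lr)  # ...but reconstruct the clean vector
    after = clean_loss()
    return before, after, ae.forward(data[1])[1]


before, after, recon = train_demo()
print(f"clean MSE before: {before:.4f}  after: {after:.4f}")
print("reconstruction of data[1]:", [round(v, 2) for v in recon])
```

Because the output layer is unconstrained, the component whose target is -1.5 is reconstructed as a negative number. A sigmoid output, whose range is 0 to 1, could never produce that value.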