I have a language model that generates fragments which make sense.
I’m using its encoder to train a binary classification model based on the IMDB example, but only 5% of my dataset belongs to the positive class. Do I need to re-balance this dataset, or make any adjustments to the hyperparameters or loss function for this situation?
The quick answer is to scale the loss depending on which class you are predicting: the weight for each class should be inversely proportional to the amount of data for that class.
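To make the weighting concrete, here is a minimal sketch in plain Python, assuming a 95/5 split like the one in the question. The function names `inverse_frequency_weights` and `weighted_cross_entropy` are made up for illustration and are not part of any library, and the normalisation `n_samples / (n_classes * class_count)` is just one common choice of inverse-frequency weight.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs[i] is the predicted probability that labels[i] == 1.
    Each example's loss is scaled by the weight of its true class.
    """
    eps = 1e-12  # guards against log(0)
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        w = weights[y]
        total += -w * math.log(max(p_true, eps))
        norm += w
    return total / norm


# Assumed toy dataset: 5% positive, 95% negative.
labels = [1] * 5 + [0] * 95
weights = inverse_frequency_weights(labels)

# With these weights both classes contribute equally to the total loss,
# even though negatives outnumber positives 19 to 1.
loss = weighted_cross_entropy([0.5] * len(labels), labels, weights)
```

In PyTorch the same idea is usually expressed by passing a tensor of per-class weights to the `weight` argument of `torch.nn.CrossEntropyLoss`, so the weights computed above could be reused there.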
I have come across a fraud problem where non-fraudulent transactions make up 99% of the data and the remaining 1% are fraudulent. If you have come across any paper that handles this issue, please share it.