NLP challenge project

The code snippet is right here. In words:

  1. I assume that the negative case is less frequent.
  2. Split the DataFrame into postiive and negative cases
  3. Resample the negative cases (with replacement) up to the length of the positive cases.
  4. Concat the two DataFrames together into one dataset with 50% positive/negative split.

There are other ways to approach this including class weights on the loss function. And an oversampling callback described here and implemented in thelibrary here.

Lots of good options! Let us know what works for you and how you progress on your project.

2 Likes