TextClasDataBunch.from_df generating batches with all samples from the same class

Created a databunch using the code below (the dataframe consists of ~1500 rows equally split across two classes in the label column: "positive" and "negative"):

data_clas = TextClasDataBunch.from_df(path=".", train_df=sem_df_trn[["label", "text"]], valid_df=sem_df_val[["label", "text"]], vocab=data_lm.vocab, bs=100)

a = data_clas.one_batch()
print(a[1]) # prints all labels of the same class.
print(a[1].sum())
print(a[1].size())

My dataframe is shuffled: if I do a df.head(100), I see that both classes are represented. But the batch produced by the databunch does not represent positive and negative equally. If I increase the batch size to a very large number, I see that the batch is mostly positives at the beginning followed by mostly negatives towards the end. Any idea what is going on here?
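One way batches can end up single-class even when the dataframe itself is shuffled is if samples are grouped by text length before batching (I believe fastai v1's text classifier uses a sort-by-length sampler to minimize padding). A toy sketch with hypothetical data, where one class happens to have longer texts:

```python
# Hypothetical toy data: label-1 samples happen to be longer than label-0 samples.
samples = [([0] * n, 1) for n in range(20, 30)] + [([0] * n, 0) for n in range(5, 15)]

# Length-sorted batching, similar in spirit to a sort-by-length sampler.
samples.sort(key=lambda s: len(s[0]), reverse=True)
bs = 10
batches = [samples[i:i + bs] for i in range(0, len(samples), bs)]

for i, batch in enumerate(batches):
    labels = [lab for _, lab in batch]
    print(f"batch {i}: {sum(labels)}/{len(labels)} positive")
# batch 0: 10/10 positive
# batch 1: 0/10 positive
```

If the two classes differ systematically in length (or the resampled duplicates cluster at the same length), each batch can end up dominated by one class, even though the labels are balanced overall.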

fastai version: 1.0.60
torch: 1.4.0
OS: Debian Stretch

After much reading, I think this has something to do with the collate function. I have resampled my dataset, which means it contains a number of duplicates. The collate function, which is responsible for packaging the data into batches, is likely what I need to fix. Is there an example of a collate function?
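For reference, a collate function just turns a list of (sample, label) pairs into one batch. A minimal, generic sketch of what one does for text (plain Python lists instead of tensors; `pad_idx=1` is an assumption, not fastai's actual `pad_collate`):

```python
def pad_collate(batch, pad_idx=1):
    """Pad variable-length token lists to the batch's max length
    and return (padded_inputs, labels). A generic sketch only."""
    max_len = max(len(x) for x, _ in batch)
    xs = [x + [pad_idx] * (max_len - len(x)) for x, _ in batch]
    ys = [y for _, y in batch]
    return xs, ys

xs, ys = pad_collate([([5, 6, 7], 1), ([8, 9], 0)])
# xs -> [[5, 6, 7], [8, 9, 1]]; ys -> [1, 0]
```

Note that the collate function only pads and stacks the samples it is handed; which samples end up in the same batch is decided earlier, by the sampler.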

Is resampling your pandas dataframe before passing it to fastai's TextClasDataBunch the wrong way to do it? It seems like the method that packages samples into a batch uses some form of hash. Why else would it put similar samples into the same batch? I have not read of anyone else running into this issue, so it is more likely that the issue is with my approach than a bug in fastai. Anybody?