And I was thinking that a good classifier should be able to classify sentences regardless of the effective class distribution, right?
Yes, you are right, this might be automated in time. But currently no framework does this automatically, as there are different ways to address the problem and it isn't clear which one a framework should pick.
Try a weighted loss or WeightedRandomSampler: two different approaches, but they are more or less equivalent. WeightedRandomSampler might be a bit better, though.
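Here is a minimal PyTorch sketch of both options on a toy imbalanced dataset (the dataset, class counts, and batch size are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy imbalanced dataset: 90 samples of class 0, 10 samples of class 1.
x = torch.randn(100, 4)
y = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
ds = TensorDataset(x, y)

# Option 1: WeightedRandomSampler.
# Per-class weight = 1 / class frequency, then look up one weight per sample,
# so minority-class samples are drawn more often.
class_counts = torch.bincount(y).float()          # tensor([90., 10.])
sample_weights = (1.0 / class_counts)[y]          # one weight per sample
sampler = WeightedRandomSampler(sample_weights, num_samples=len(ds), replacement=True)
dl = DataLoader(ds, batch_size=20, sampler=sampler)

# Option 2: weighted loss.
# Keep the DataLoader unweighted and instead scale each class's loss term.
criterion = torch.nn.CrossEntropyLoss(weight=1.0 / class_counts)
```

With the sampler, batches come out roughly class-balanced even though the underlying dataset is 90/10.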
Re. zip: Jeremy is using a trick here. You are talking about this code, right?
def predict_with_targs_(m, dl):
    m.eval()
    if hasattr(m, 'reset'): m.reset()
    res = []
    for *x,y in iter(dl): res.append([get_prediction(to_np(m(*VV(x)))), to_np(y)])
    return zip(*res)
Think about what is in res; it contains:
[ [Prediction1, Label1], [Prediction2, Label2], [Prediction3, Label3], [Prediction4, Label4], [Prediction5, Label5] ....]
When you pass res to zip with *, each element of the list is passed as a separate argument to zip. So this code:
zip(*res)
is equivalent to:
zip(res[0], res[1], res[2], ..., res[n])
So for res as in the example above it will return 2 tuples:
(Prediction1, Prediction2, Prediction3, Prediction4, Prediction5, ...), (Label1, Label2, Label3, ...)
So if you take the first value from the zip, you get the array/tuple with the predictions and ignore the labels.
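You can see the transpose behaviour in isolation with dummy values standing in for the predictions and labels (not the actual fastai objects):

```python
# res mimics the structure built inside predict_with_targs_:
# a list of [prediction, label] pairs.
res = [["p1", "l1"], ["p2", "l2"], ["p3", "l3"]]

# zip(*res) transposes it: one tuple of predictions, one tuple of labels.
preds, labels = zip(*res)
# preds  -> ('p1', 'p2', 'p3')
# labels -> ('l1', 'l2', 'l3')
```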
But don't take my word for it: run the debugger, step into the code, and see for yourself.