I am currently making a text classifier and I am trying to figure out how to design it so that it doesn’t make wild guesses and instead returns blank when it is too unsure.
Here is my lang_learner:
lang_learner = language_model_learner(dls, arch = AWD_LSTM, pretrained = True, drop_mult=0.3)
and here is the class_learner:
class_learner = text_classifier_learner(class_mod, drop_mult=0.3, arch=AWD_LSTM, pretrained=True)
What params can I add to make it return blank if it is not at all sure what the category is?
Sorry if this is an amateur question; I am just trying to build a prototype application and my objective is primarily to build an MVP, not to become an expert in ML.
As I understand from the lectures, it depends on the size of the input text.
For small texts, Hugging Face transformers tend to perform better than the older fastai models.
If you need more flexibility to change things, you should check the mid-level API section of the fastai documentation.
fastai’s learners generally have two methods to predict: learn.predict() and learn.get_preds(). Both return probabilities, for example:
results = learn.get_preds(dl=learn.dls.valid)
Note that get_preds() returns a tuple of (predictions, targets), so with
probs = results[0]
you get a ‘number of predicted instances’ × ‘number of classes’ matrix where each row holds the probabilities of all the classes for one instance, e.g.
probs[123, 6] would hold the probability that instance 123 is of label 6. Usually the label with the highest value is chosen as the prediction.
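To make the indexing concrete, here is a toy probs tensor (the values are invented for illustration; in practice this tensor comes from get_preds()):

```python
import torch

# two instances, three classes; each row sums to 1
probs = torch.tensor([[0.2, 0.7, 0.1],
                      [0.6, 0.3, 0.1]])

# probability that instance 0 is of label 1
print(probs[0, 1].item())

# label with the highest probability per instance
print(probs.argmax(dim=1).tolist())  # [1, 0]
```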
You could use
probs.max(dim=1), which returns not only the highest probability per instance but also the corresponding label index (the label alone is what the default
.argmax() would give you). You can then take the instances whose top probability is lower than a threshold you pick and set the corresponding labels to “not good enough” or whatever, -1 for example.
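Putting it together, a minimal sketch of the thresholding step (the probabilities and the 0.5 threshold are made up; in your app, probs would come from learn.get_preds() and you would tune the threshold on validation data):

```python
import torch

# hypothetical probabilities for 4 instances over 3 classes
probs = torch.tensor([
    [0.90, 0.05, 0.05],   # confident -> class 0
    [0.40, 0.35, 0.25],   # unsure
    [0.10, 0.10, 0.80],   # confident -> class 2
    [0.34, 0.33, 0.33],   # very unsure
])

# per-row maximum probability and the label it belongs to
confidences, labels = probs.max(dim=1)

THRESHOLD = 0.5  # pick whatever works for your application
labels[confidences < THRESHOLD] = -1  # -1 means "not confident enough"

print(labels.tolist())  # [0, -1, 2, -1]
```

Your application can then map -1 to a blank answer instead of showing a wild guess.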
Make sure to keep asking if something is unclear or I missed the point.
Thanks, I will give that a try.