TextList from df

Thanks, let me try changing it. I thought the function would be converting true and false into booleans.

Hi @StatisticDean,

```python
data_clas = TextClasDataBunch.from_df(".", train_df=csv_df[:-1000], valid_df=csv_df[-1000:], vocab=data_lm.vocab, text_cols='text', label_cols='label', bs=16)
```

I removed the is_valid column and am trying to rebuild the language model again (so I will split randomly while building the classifier). However, if you look at the third attempt I tried, is_valid is not passed to TextClasDataBunch.from_df. Since you insisted on using the data block API, I am trying to do it from the start. Thanks for your inputs.
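For reference, this is roughly what I am trying with the data block API. Just a rough sketch assuming fastai v1, that csv_df has 'text' and 'label' columns, and that data_lm is the language model databunch; the split percentage is a placeholder.

```python
from fastai.text import *

# Rough data block sketch (fastai v1): build the classifier databunch with a
# random split instead of an is_valid column. Column names are assumptions.
data_clas = (TextList.from_df(csv_df, path=".", cols='text', vocab=data_lm.vocab)
             .split_by_rand_pct(0.1)        # random train/valid split
             .label_from_df(cols='label')   # labels are the email ids in my case
             .databunch(bs=16))
```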

Hi @StatisticDean: It worked only after I removed the unique labels, i.e. the email IDs that were not repeated in the dataset. Now I am thinking of duplicating that unique data; I hope this works well for NLP classification.

@sgugger: Is there a limitation in the code? If the labels are unique (label: in my case an email ID; text: the technical problem text), the function label_from_df fails. Whenever I did a split from df or a random split, label_from_df failed with a NoneType error.

Thanks,

Categorical labels that are only present in your validation set are going to raise errors, yes. This is intended as a means to protect the user, since you can't make a model predict classes on the validation set that it has never seen in the training set.
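A quick way to see which labels are causing this (just a sketch, assuming the same 1000-row tail split and the 'label' column used above):

```python
# Labels present in the validation rows but absent from the training rows.
train_labels = set(csv_df[:-1000]['label'])
valid_labels = set(csv_df[-1000:]['label'])
unseen = valid_labels - train_labels
print(f"{len(unseen)} labels appear only in the validation set")
```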


adding @StatisticDean

Thanks for the clarification. In such a case, what would be the better approach (see the pandas sketch below)?

  1. Delete the unique data, or
  2. Create multiple copies of it: will the model be biased towards certain words if this is done?
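This is roughly what I mean; a rough pandas sketch of both options, assuming csv_df has a 'label' column (the variable names are just placeholders):

```python
import pandas as pd

label_counts = csv_df['label'].value_counts()
singleton_mask = csv_df['label'].map(label_counts) == 1

# Option 1: drop the rows whose label occurs only once.
deduped_df = csv_df[~singleton_mask].reset_index(drop=True)

# Option 2: duplicate the singleton rows so every label occurs at least twice.
duplicated_df = pd.concat([csv_df, csv_df[singleton_mask]], ignore_index=True)
```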

I don't have an answer to your question; I don't think there is a general truth about the better way to proceed. You might want to try both ways. If you create multiple copies of your data, keep in mind that the copies in your validation set will already have been seen by your model, so be careful with your evaluation.


Hi,
Can I put this in the fast.ai documentation?

You can always suggest a PR with improvements to the docs :wink:
