TextList from df

Thanks, let me try changing it. I thought the function would be converting true and false into booleans.

Hi @StatisticDean,

```python
data_clas = TextClasDataBunch.from_df(".", train_df=csv_df[:-1000], valid_df=csv_df[-1000:], vocab=data_lm.vocab, text_cols='text', label_cols='label', bs=16)
```

I removed the is_valid column and am trying to rebuild the language model again (so I will split randomly while building the classifier). However, if you look at the third attempt I tried, is_valid is not passed to TextClasDataBunch.from_df. Since you insisted on using the data block API, I am trying to do it from the start. Thanks for your inputs.
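For reference, this is roughly what I am trying with the data block API. Just a rough sketch assuming fastai v1, that csv_df has 'text' and 'label' columns, and that data_lm is the language model databunch; the split percentage is a placeholder.

```python
from fastai.text import *

# Rough data block sketch (fastai v1): build the classifier databunch with a
# random split instead of an is_valid column. Column names are assumptions.
data_clas = (TextList.from_df(csv_df, path=".", cols='text', vocab=data_lm.vocab)
             .split_by_rand_pct(0.1)        # random train/valid split
             .label_from_df(cols='label')   # labels are the email ids in my case
             .databunch(bs=16))
```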

Hi @StatisticDean: It worked only after I removed the unique labels, i.e. the email IDs that were not repeated in the dataset. Now I am thinking of duplicating that unique data; I hope this works well for NLP classification.

@sgugger: Is there a limitation in the code? If the labels are unique (label: in my case an email ID; text: the technical problem text), the function label_from_df fails. Whenever I did a split from df or a random split, label_from_df failed with a NoneType error.

Thanks,

Categorical labels that are only present in your validation set are going to raise errors, yes. This is intended as a means to protect the user, since you can't make a model predict classes on the validation set that it has never seen in the training set.
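A quick way to see which labels are causing this (just a sketch, assuming the same 1000-row tail split and the 'label' column used above):

```python
# Labels present in the validation rows but absent from the training rows.
train_labels = set(csv_df[:-1000]['label'])
valid_labels = set(csv_df[-1000:]['label'])
unseen = valid_labels - train_labels
print(f"{len(unseen)} labels appear only in the validation set")
```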


adding @StatisticDean

Thanks for the clarification. In such a case, what would be the better approach (see the pandas sketch below)?

  1. Delete the unique data, or
  2. Create multiple copies of it: will the model be biased towards certain words if this is done?
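This is roughly what I mean; a rough pandas sketch of both options, assuming csv_df has a 'label' column (the variable names are just placeholders):

```python
import pandas as pd

label_counts = csv_df['label'].value_counts()
singleton_mask = csv_df['label'].map(label_counts) == 1

# Option 1: drop the rows whose label occurs only once.
deduped_df = csv_df[~singleton_mask].reset_index(drop=True)

# Option 2: duplicate the singleton rows so every label occurs at least twice.
duplicated_df = pd.concat([csv_df, csv_df[singleton_mask]], ignore_index=True)
```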

I don't have an answer to your question; I don't think there is a general truth about the better way to proceed. You might want to try both ways. If you create multiple copies of your data, keep in mind that the copies in your validation set will already have been seen by your model, so be careful with your evaluation.


Hi,
Can I put this in the fast.ai documentation?

You can always suggest a PR with improvements to the docs :wink:
