I am working with the Kaggle Amazon mobile phone reviews dataset.
Steps:
- I loaded the CSV file into pandas.
- Created a TextList for the language model:
  data_lm = (TextList.from_df(df, path, cols=4).random_split_by_pct(0.1).label_for_lm().databunch())
  data_lm.save("amazon_data_lm")
- I then checked the vocab size, which was 40405 (seems random though?).
- Created the learner, fine-tuned it, and saved the encoder.
- Created a TextList for classification.
- Created the learner, and when I tried to load the encoder I got a size mismatch error (roughly the code sketched after this list). The vocab size was also different this time.
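For context, here is roughly what the encoder and classification steps looked like. This is paraphrased from memory: `learn_lm`, `data_clas`, `learn_clas`, the label column and the encoder name `'amazon_enc'` are placeholders rather than my exact code, and the exact learner arguments depend on the fastai version.

```python
from fastai.text import *

# LM learner, fine-tune, save the encoder ('amazon_enc' is a placeholder name)
learn_lm = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('amazon_enc')

# classification databunch, built the same way as the LM one but labelled from the df
data_clas = (TextList.from_df(df, path, cols=4)
             .random_split_by_pct(0.1)
             .label_from_df(cols=0)       # placeholder: whichever column holds the rating/label
             .databunch())

# classifier learner, then load the encoder saved above
learn_clas = text_classifier_learner(data_clas, drop_mult=0.5)
learn_clas.load_encoder('amazon_enc')     # <- this is where the size mismatch is raised
```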
RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore: size mismatch for encoder.weight:
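This is how I compared the two vocab sizes (assuming `vocab.itos` on the databunch is the right thing to look at):

```python
len(data_lm.vocab.itos)    # 40405 for the language-model databunch
len(data_clas.vocab.itos)  # comes out different for the classification databunch
```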
So the encoder can't be loaded because the two vocab sizes differ? If so, how can I solve this? I also checked the learners in the IMDB notebook, and in both cases (LM and classification) the vocab size seems to be over 60 thousand.
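My current guess is that I should pass the LM vocab when building the classification TextList, so that both databunches numericalize with the same itos. Something like the sketch below (same placeholder names as above), but I am not sure this is the intended approach:

```python
# pass the LM vocab so the classification data uses the same token ids as the encoder
data_clas = (TextList.from_df(df, path, cols=4, vocab=data_lm.vocab)
             .random_split_by_pct(0.1)
             .label_from_df(cols=0)   # placeholder label column
             .databunch())

learn_clas = text_classifier_learner(data_clas, drop_mult=0.5)
learn_clas.load_encoder('amazon_enc')  # should now match the encoder's embedding size?
```

Is that the right way to do it?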
You can find the notebooks here in this repo
Thanks