This is more or less a skeleton of working code. The first part you can take from the Jupyter notebook.
Of course, any insight if I am not doing something optimally is most welcome!
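To make the snippets below self-contained, here is the preamble I am assuming (fastai v1 text API; the batch size is just a placeholder value to tune for your GPU):

from pathlib import Path
from fastai.text import *  # brings in TextList, TextLMDataBunch, the learners, fbeta, URLs, etc.

batch_size = 48  # hypothetical value; adjust to your memory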
path = Path('...')  # your path to the folder with your unsupervised documents
model_url = URLs.WT103_1  # your pre-trained language model, e.g. fastai's English WT103
Then try to load your data for the language model:
try:
    data = TextLMDataBunch.load(path, 'tmp_lm', bs=batch_size)
except FileNotFoundError:
    print('Data bunch not found, creating one from data source...')
    data = (TextList.from_folder(path)
            .filter_by_folder(include=())  # list the sub-folders that contain your documents, e.g. include=('train', 'test')
            .random_split_by_pct(0.1)      # hold out 10% for validation
            .label_for_lm()                # for a language model, the labels are the texts themselves
            .databunch(bs=batch_size))
    data.save('tmp_lm')
Now you instantiate a language model learner:
learner = language_model_learner(data, pretrained_model=model_url, drop_mult=0.3)
Learning rate finder:
learner.lr_find()
learner.recorder.plot(skip_end=15)
Then you train the last layers of the language model:
try:
    learner.load('fit_head')
except FileNotFoundError:
    print('\nTraining language model (last layers)...')
    learner.fit_one_cycle(1, 5e-2, moms=(0.8, 0.7))
    learner.save('fit_head')
Then you train the whole thing. In the Jupyter example they use 10 cycles here, but in my case one apparently works better; I'm still figuring out these details.
number_rounds = 1  # the notebook uses 10; one cycle has worked better for me so far
try:
    learner.load('fine_tuned')
except FileNotFoundError:
    print('\nFine-tuning learner...')
    learner.unfreeze()
    learner.fit_one_cycle(number_rounds, 5e-3, moms=(0.8, 0.7))
    learner.save('fine_tuned')
To test the language model (which is fun):
text_prompt = 'I wonder what text comes after this'
n_words = 100
n_sentences = 2
print("\n".join(learner.predict(text_prompt, n_words, temperature=0.75)
for _ in range(n_sentences)))
Save the language model encoder (the part that the classifier will use):
learner.save_encoder('fine_tuned_enc')
Next step: we need a classifier. IMPORTANT: we need the vocabulary from the language model!
vocab = data.vocab
Load its dataset:
try:
    classifier_data = TextDataBunch.load(path,
                                         'tmp_multi_label_data',
                                         bs=batch_size)
except (FileNotFoundError, IndexError):
    print('Classifier data bunch not found, creating one from the CSV file...')
    label_cols = [0, 1, 2, 3]  # the columns from which you take the labels in the CSV file
    classifier_data = (TextList.from_csv(path,
                                         relative_path_to_csv_file_from_path,
                                         cols='text',
                                         vocab=vocab)  # reuse the language model vocabulary!
                       .random_split_by_pct(valid_pct=0.2)
                       .label_from_df(cols=label_cols)
                       .databunch(bs=batch_size))
    classifier_data.save('tmp_multi_label_data')
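For reference, this is the kind of CSV layout the snippet above assumes. The column names and label values are entirely hypothetical; the point is that columns 0-3 hold the (multi-)labels and a column named 'text' holds the documents:

label_a,label_b,label_c,label_d,text
1,0,0,1,"First document goes here."
0,0,1,0,"Second document goes here."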
Then you create the classifier learner:
classifier_learner = text_classifier_learner(classifier_data,
drop_mult=0.5,
metrics=[fbeta])
classifier_learner.load_encoder('fine_tuned_enc')
Finally, just train the learner. I have not been completely successful here yet: it trains, learns, and classifies, just not as well as another classifier I have…
classifier_learner.freeze()  # train only the last layer group first
classifier_learner.lr_find()
classifier_learner.recorder.plot(skip_end=15)
classifier_learner.fit_one_cycle(1, 1e-1, moms=(0.8, 0.7))
classifier_learner.save('first_cycle')
classifier_learner.fit_one_cycle(1, 5e-2, moms=(0.8, 0.7))
classifier_learner.save('second_cycle')
classifier_learner.freeze_to(-2)  # unfreeze the last two layer groups
classifier_learner.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2), moms=(0.8, 0.7))  # discriminative learning rates across layer groups
classifier_learner.save('third_cycle')
classifier_learner.freeze_to(-3)  # unfreeze one more layer group
classifier_learner.fit_one_cycle(1, slice(5e-3 / (2.6 ** 4), 5e-3), moms=(0.8, 0.7))
classifier_learner.save('fourth_cycle')
classifier_learner.unfreeze()  # finally train the whole network
classifier_learner.fit_one_cycle(1, slice(1e-3 / (2.6 ** 4), 1e-3), moms=(0.8, 0.7))
classifier_learner.save('fifth_cycle')
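To compare the cycles on the validation set, you can call validate() between loads; it returns the validation loss followed by the metrics (here just fbeta). A quick sketch:

val_loss, f_beta = classifier_learner.validate()
print(f'validation loss: {float(val_loss):.4f}, f-beta: {float(f_beta):.4f}')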
That’s it! Now you may load a saved classifier:
classifier_learner.load('third_cycle')
And classify stuff:
prediction = classifier_learner.predict(string)
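In fastai v1, predict returns a (prediction, index, probabilities) triple. A minimal sketch of how you might inspect it; the input string and the 0.5 threshold are my own assumptions:

string = 'Some text you would like to classify'  # hypothetical input
pred_class, pred_idx, probs = classifier_learner.predict(string)
print(pred_class)  # the predicted label(s); a MultiCategory for multi-label data
# for multi-label output you can threshold the per-class probabilities yourself:
predicted_labels = [classifier_data.classes[i]
                    for i, p in enumerate(probs) if p > 0.5]
print(predicted_labels)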
This is not exactly a working example, but it comes close. I hope it helps!