Hi guys,
I was wondering what the best way is to choose a learning rate when training a language model with the AWD-LSTM architecture from scratch. Does lr_find make sense here?
Thanks
Yup it still does!
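A minimal sketch, assuming `dls` is a language-model DataLoaders you already built (the exact fields returned by lr_find depend on your fastai version):

from fastai.text.all import *

# `dls` is assumed: your DataLoaders built with is_lm=True
learn = language_model_learner(dls, AWD_LSTM, pretrained=False,
                               metrics=[accuracy, Perplexity()])
suggested = learn.lr_find()  # plots loss vs. learning rate over a mock run
print(suggested)             # a suggested lr to start from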
Thanks @muellerzr
I had another question. I trained the model by fitting it with:
learn.fit_one_cycle(10, 3e-3, moms=(0.8, 0.7, 0.8))
and for the first epochs I got:
epoch  train_loss  valid_loss  accuracy  perplexity  time
    0      4.5040      4.6305    0.2475      102.57  1:53:38
    1      4.5258      4.6972    0.2421      109.64  1:53:31
    2      4.5157      4.6825    0.2432      108.04  1:52:58
    3      4.4785      4.6354    0.2467      103.07  1:55:27
    4      4.4098      4.5582    0.2522       95.41  1:54:20
How should I interpret this? My accuracy drops and my valid_loss rises slightly after the first epoch, and they only start to recover gradually afterwards. Does it need more than 10 epochs, or am I doing something wrong?
I mainly followed the nn-vietnamese notebook, but this behavior seemed odd compared to what I saw there.
Thank you so much for your time
So you are pretraining the model on the Vietnamese Wikipedia with the SentencePiece tokenizer?
I checked my logs and got the following metrics for pretraining on Vietnamese Wikipedia with a vocab size of 15k (you can't compare them 1:1 if you use a different vocab size):
epoch  train_loss  valid_loss  accuracy  perplexity  time
    0      3.4870      3.5357    0.3508       34.32  18:15
    1      3.4929      3.5397    0.3489       34.46  18:11
    2      3.4438      3.4575    0.3575       31.74  18:13
    3      3.4071      3.3866    0.3659       29.57  18:15
    4      3.2378      3.3043    0.3760       27.23  18:13
    5      3.2386      3.2203    0.3854       25.03  18:26
    6      3.1658      3.1231    0.3975       22.72  18:38
    7      3.0734      3.0302    0.4095       20.70  18:44
    8      3.0726      2.9566    0.4196       19.23  18:37
    9      3.0339      2.9340    0.4231       18.80  18:40
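One thing worth noting about your curves: fit_one_cycle ramps the learning rate up over the first part of training before annealing it, so a slight rise in valid_loss during the first epochs is not unusual on its own. A minimal sketch to inspect the schedule, assuming a learner set up as in your script (plot_sched is available after a scheduled fit):

# visualize the one-cycle schedule after even a short scheduled fit
learn.fit_one_cycle(1, 3e-3, moms=(0.8, 0.7, 0.8))
learn.recorder.plot_sched()  # plots lr and momentum over the batches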
Can you share your notebook so I can check if something is wrong? You can also check my pretraining code here … I ported the nlp-course notebooks with SentencePiece to fastai2.
Here’s my pretrained Vietnamese model: https://bit.ly/ulmfit-viwiki (see the README in the repo for more information).
If you have questions let me know.
Florian
@florianl Thank you for your response.
Actually, I wanted to pretrain it on Persian using the default fastai tokenizer.
My training script looks like this:
from fastai.text.all import *
import pandas as pd
import pickle
import fastai
import torch

print(f"fastai version: {fastai.__version__}")
print(f"GPU in use: {torch.cuda.get_device_name(0)}")

## define directories and outputs of the models
base = Path(".").absolute()
model_dir = base / "models_out"
model_dir.mkdir(exist_ok=True)
lm_fns = [model_dir / "final_model", model_dir / "vocab.pkl"]

## read the training csv data
df = pd.read_csv("data_ULMFIT.csv")

## define model parameters
bs = 128
lr = 3e-3
lr *= bs / 48  ## scale learning rate by batch size

## create dataloader with batch size of 128 (text_col defaults to the first column)
print("creating dataloader..")
dls = TextDataLoaders.from_df(df, path=base, is_lm=True, valid_pct=0.1, bs=bs, seed=42)
print("saving dataloader")
torch.save(dls, model_dir / "dls.pkl")

## define learner, metrics and CSVLogger to save the metrics
learn = language_model_learner(dls, AWD_LSTM, drop_mult=0.1, wd=0.1, pretrained=False,
                               cbs=[CSVLogger(fname=base / "history.csv")],
                               metrics=[accuracy, Perplexity()]).to_fp16()

## fit the model
learn.unfreeze()
learn.fit_one_cycle(10, lr, moms=(0.8, 0.7, 0.8))

## save the vocab and the pretrained weights
with open(lm_fns[1], 'wb') as f:
    pickle.dump(learn.dls.vocab, f)
learn.save(lm_fns[0], with_opt=False)
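For completeness, a rough sketch of how the artifacts saved above could be loaded back later (paths follow the script; learn.load mirrors the learn.save call):

# reload the dataloaders, vocab, and weights saved above
dls = torch.load(model_dir / "dls.pkl")
with open(lm_fns[1], 'rb') as f:
    vocab = pickle.load(f)           # vocab pickled above
learn = language_model_learner(dls, AWD_LSTM, pretrained=False)
learn = learn.load(lm_fns[0])        # weights saved with with_opt=False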
I think the issue right now is that TextDataLoaders uses SpacyTokenizer, which defaults to lang=en. So you'll have to instantiate the SpacyTokenizer first and pass it to the data block: basically what I am doing in my repo, just with SpacyTokenizer(lang=xx).
tok = SpacyTokenizer(lang=xx)  # xx = your language code, e.g. 'vi'

# this expects the data to be in a folder of text files instead of a data frame ... so you'll have to adjust that
dblock = DataBlock(blocks=TextBlock.from_folder(data_path, is_lm=True, tok=tok, backwards=backwards),
                   get_items=get_files,
                   splitter=RandomSplitter(valid_pct=0.1, seed=42))
dls = dblock.dataloaders(data_path, path=data_path, bs=bs, num_workers=num_workers)

# now check if the tokenization makes sense in your language
dls.show_batch()
Florian
@florianl Yeah, I don't think spaCy supports Persian, but I'm going to use SentencePiece (SP), as I did before when experimenting with some transformer models.
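In case it's useful, a rough sketch of how SentencePiece could be plugged into the df-based loader (vocab_sz=15000 and the tok_tfm hookup are assumptions to check against your fastai version):

# SentencePiece tokenizer for Persian with TextDataLoaders.from_df
tok = SentencePieceTokenizer(lang='fa', vocab_sz=15000)  # vocab_sz is an assumption
dls = TextDataLoaders.from_df(df, path=base, is_lm=True, valid_pct=0.1,
                              bs=bs, seed=42, tok_tfm=tok)
dls.show_batch()  # sanity-check the subword tokenization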
Thank you so much for your time.