Find best learning rate when training model from scratch

Hi guys,
I was wondering what's the best way to choose a learning rate when training a language model with the AWD-LSTM architecture from scratch. Does the learning rate finder (lr_find) make sense here?
Thanks :slightly_smiling_face:

Yup it still does!
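
For reference, here is a minimal sketch of running the learning rate finder on a from-scratch AWD-LSTM language model. It assumes you already have a language-model DataLoaders called dls; the metrics are just an example:

from fastai.text.all import *

# dls is assumed to be a TextDataLoaders built with is_lm=True
learn = language_model_learner(dls, AWD_LSTM, pretrained=False,
                               metrics=[accuracy, Perplexity()])

# plots loss vs. learning rate and returns a suggestion;
# pick a value a bit before the steepest downward slope
suggestion = learn.lr_find()
print(suggestion)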


Thanks @muellerzr!
I had another question about the first few epochs of training. After fitting the model with:

learn.fit_one_cycle(10, 3e-3, moms=(0.8, 0.7, 0.8))

I got the following metrics:

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.503973484039307,4.630512237548828,0.24745343625545502,102.56658935546875,1:53:38
1,4.525827884674072,4.697213172912598,0.2421044558286667,109.64119720458984,1:53:31
2,4.515704154968262,4.68245792388916,0.2432451844215393,108.03528594970703,1:52:58
3,4.478466987609863,4.63539457321167,0.24665173888206482,103.0685806274414,1:55:27
4,4.409770965576172,4.558172702789307,0.25224268436431885,95.40898132324219,1:54:20

How should I interpret this? My accuracy drops slightly and the valid_loss increases after the first epoch, and they only start to improve gradually from epoch 3 onwards. Does it just need more than 10 epochs, or am I doing something wrong?
I mainly followed the nn-vietnamese notebook, but this behavior seemed odd compared to what I saw there.
Thank you so much for your time :slightly_smiling_face:

So you are pretraining the model on the Vietnamese Wikipedia with the SentencePiece tokenizer?

I checked my logs and got the following metrics for the pretraining on the Vietnamese Wikipedia with a vocab size of 15k (you can't compare them 1:1 if you use a different vocab size):

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.486952781677246,3.5356838703155518,0.35083749890327454,34.318477630615234,18:15
1,3.492870569229126,3.539703845977783,0.3489139676094055,34.45671463012695,18:11
2,3.4438414573669434,3.4574646949768066,0.35752221941947937,31.736412048339844,18:13
3,3.4070653915405273,3.3866140842437744,0.3658975064754486,29.565675735473633,18:15
4,3.23783278465271,3.3043224811553955,0.37601184844970703,27.230087280273438,18:13
5,3.2386205196380615,3.220266819000244,0.38542622327804565,25.034799575805664,18:26
6,3.165757417678833,3.123077630996704,0.39748650789260864,22.716184616088867,18:38
7,3.073399305343628,3.030221939086914,0.40953120589256287,20.701826095581055,18:44
8,3.0726006031036377,2.9566073417663574,0.4196123480796814,19.23261070251465,18:37
9,3.0338754653930664,2.934013843536377,0.42309603095054626,18.80295181274414,18:40

Can you share your notebook so we can check if something is wrong? You can also check my pretraining code here … I ported the nlp-course notebooks with SentencePiece to fastai2.
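
As a rough sketch (not the exact repo code), the SentencePiece setup in fastai2 looks like this; vocab_sz matches the 15k run above, and data_path is a placeholder for the folder with the wiki text files:

from fastai.text.all import *

data_path = Path("viwiki_txt")  # placeholder: folder of plain-text wiki files

# SentencePiece tokenizer with a 15k vocab; fastai trains the SP model on the corpus
tok = SentencePieceTokenizer(lang="vi", vocab_sz=15000)

# it is passed to the TextBlock like any other tokenizer
blocks = TextBlock.from_folder(data_path, is_lm=True, tok=tok)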

Here’s my pretrained Vietnamese model: https://bit.ly/ulmfit-viwiki (see the README in the repo for more information).

If you have questions let me know.
Florian


@florianl Thank you for your response.
Actually, I want to pretrain it on Persian using the default fastai tokenizer.
My training script looks like this:

from fastai.text.all import *
import pandas as pd
import pickle

import fastai
import torch
print(f"fastai version: {fastai.__version__}")
print(f"GPU which is used : {torch.cuda.get_device_name(0)}")


## define directories and output of the models
base = Path(".").absolute()
model_dir = base / "models_out"
model_dir.mkdir(exist_ok=True)
lm_fns = [model_dir / "final_model", model_dir / "vocab.pkl"]  ## weights and vocab output files

## reading train csv data
df = pd.read_csv("data_ULMFIT.csv")
## creating dataloader with batch size of 128
print("creating dataloader..")
## define model parameters
bs = 128
lr = 3e-3
lr *= bs/48 ## Scale learning rate by batch size

dls = TextDataLoaders.from_df(df, path=base, is_lm=True, valid_pct=0.1, bs=bs, seed=42)
print("saving dataloader")
torch.save(dls, model_dir / "dls.pkl")

## define learner, metrics and CSVLogger to save the metrics
learn = language_model_learner(dls, AWD_LSTM, drop_mult=0.1, wd=0.1, pretrained=False,
                               metrics=[accuracy, Perplexity()],
                               cbs=[CSVLogger(fname=base / "history.csv")]).to_fp16()


## fitting the model
learn.unfreeze()
learn.fit_one_cycle(10, lr, moms=(0.8, 0.7, 0.8))

## saving pretrained model
with open(lm_fns[1], "wb") as f:
    pickle.dump(learn.dls.vocab, f)

learn.save(lm_fns[0],with_opt=False)

I think the issue right now is that TextDataLoaders uses the SpacyTokenizer, which defaults to lang='en'. So you'll have to instantiate the SpacyTokenizer first and pass it to the data block, which is basically what I am doing in my repo, just with SpacyTokenizer(lang=xx).

tok = SpacyTokenizer(lang="xx")  # replace "xx" with your language code

# this expects the data to be in a folder of text files instead of a data frame ... so you'll have to adjust that
# example values -- in my script data_path, bs, backwards and num_workers are set elsewhere
data_path, bs, backwards, num_workers = Path("data"), 128, False, 8

dblock = DataBlock(blocks=TextBlock.from_folder(data_path, is_lm=True, tok=tok, backwards=backwards),
                   get_items=get_files,
                   splitter=RandomSplitter(valid_pct=0.1, seed=42),
                  )

dls = dblock.dataloaders(data_path, path=data_path, bs=bs, num_workers=num_workers)

# now check if the tokenization makes sense in your language
dls.show_batch()
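
Since your data is in a CSV/DataFrame, a possible adaptation is TextBlock.from_df. This is just a sketch; the column name "text" is an assumption, so adjust it to your CSV:

from fastai.text.all import *
import pandas as pd

df = pd.read_csv("data_ULMFIT.csv")
tok = SpacyTokenizer(lang="xx")  # your language code

dblock = DataBlock(blocks=TextBlock.from_df("text", is_lm=True, tok=tok),
                   get_x=ColReader("text"),  # fastai writes the tokenized text back to a "text" column
                   splitter=RandomSplitter(valid_pct=0.1, seed=42))

dls = dblock.dataloaders(df, bs=128)
dls.show_batch()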

Florian


@florianl Yeah, I don't think spaCy supports Persian, but I'm going to use SentencePiece, as I did before when experimenting with some transformer models. A rough sketch of what I have in mind is below.
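
Something like this, reusing the from_df data block from the sketch above and just swapping the tokenizer (the vocab size is only a placeholder):

# SentencePiece instead of spaCy; fastai trains the SP model on the corpus itself
tok = SentencePieceTokenizer(lang="fa", vocab_sz=15000)

# then pass tok to TextBlock.from_df exactly as in the sketch above
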
Thank you so much for your time. :slightly_smiling_face:
