BERT - Why is validation loss oscillating so much?

I am training a multi-label BERT model, and the per-batch training and validation loss curves are attached as PDFs. I am trying to understand why the validation loss oscillates so much. Could you please give me some pointers on what to explore?

Training Loss Curve for every batch -
Training.pdf (27.8 KB)

Validation Loss Curve for every batch -
Validation.pdf (33.9 KB)

Train vs Val Loss (by Epoch) -
Train vs Val.pdf (30.7 KB)

The training configuration is -

model_checkpoint = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, do_lower_case=True, force_download=True)
bert_config = AutoConfig.from_pretrained(model_checkpoint)
bert_config.num_labels = 20
bert_config.problem_type = "multi_label_classification"
model = AutoModelForSequenceClassification.from_config(bert_config)
model = model.to(device)

epochs: 50
batch_size: 16
lr: 2e-05

optimizer = AdamW(model.parameters(), lr=lr, eps=1e-8)
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)
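
For reference, total_steps is the usual len(train_dataloader) * epochs, and the plotted losses are recorded once per batch, roughly like this (simplified sketch; train_dataloader, val_dataloader, train_losses, and val_losses are placeholder names, not the exact code):

total_steps = len(train_dataloader) * epochs   # computed before building the scheduler above

train_losses, val_losses = [], []
for epoch in range(epochs):
    model.train()
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss    # problem_type="multi_label_classification" -> BCEWithLogitsLoss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        train_losses.append(loss.item())   # points plotted in Training.pdf

    model.eval()
    with torch.no_grad():
        for batch in val_dataloader:
            batch = {k: v.to(device) for k, v in batch.items()}
            val_losses.append(model(**batch).loss.item())   # points plotted in Validation.pdf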

Please let me know if you need any additional details.

One of the top results from a Google search for "validation loss oscillating" looks applicable…
