I'm having trouble with my Hugging Face Trainer

Hello!

I’m using Hugging Face Transformers to build an NLP model, and I’m running into an error during training: it is thrown during the validation stage of the first epoch.

Initially I had an issue with my metric function. After fixing that (I think), a new TypeError appeared:

TypeError: 'float' object does not support item assignment

The full traceback is below.

Traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_24/4032920361.py in <module>
----> 1 trainer.train()

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1411             resume_from_checkpoint=resume_from_checkpoint,
   1412             trial=trial,
-> 1413             ignore_keys_for_eval=ignore_keys_for_eval,
   1414         )
   1415 

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1741 
   1742             self.control = self.callback_handler.on_epoch_end(args, self.state, self.control)
-> 1743             self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1744 
   1745             if DebugOption.TPU_METRICS_DEBUG in self.args.debug:

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1910         metrics = None
   1911         if self.control.should_evaluate:
-> 1912             metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
   1913             self._report_to_hp_search(trial, epoch, metrics)
   1914 

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   2626             prediction_loss_only=True if self.compute_metrics is None else None,
   2627             ignore_keys=ignore_keys,
-> 2628             metric_key_prefix=metric_key_prefix,
   2629         )
   2630 

/opt/conda/lib/python3.7/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   2909 
   2910         if all_losses is not None:
-> 2911             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
   2912 
   2913         # Prefix all keys with metric_key_prefix + '_'

TypeError: 'float' object does not support item assignment

I think the error is occurring after my RMSE metric is computed.

My metric function, which computes RMSE, is shown below.

import torch

def rmse(valid_pred):
    preds = valid_pred.predictions
    targs = valid_pred.label_ids
    # Root-mean-squared error between predictions and labels
    return torch.nn.functional.mse_loss(torch.from_numpy(preds), torch.from_numpy(targs)).sqrt()

valid_pred is the EvalPrediction object in which the trainer’s evaluation predictions are stored.

The metric function above returns a 0-dimensional PyTorch tensor (e.g., an example output was tensor(0.5261)).
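To illustrate, here is the same computation on dummy data (the dummy tensors below just stand in for the real predictions and labels):

import torch
import torch.nn.functional as F

# Dummy tensors standing in for valid_pred.predictions / valid_pred.label_ids
preds = torch.randn(8)
targs = torch.randn(8)

out = F.mse_loss(preds, targs).sqrt()
print(out)        # e.g. tensor(0.5261)
print(out.dim())  # 0, i.e. a scalar tensor rather than a dict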

I would appreciate any input on figuring out why this error is being thrown! If you need any more information, please do let me know.

I don’t have a specific solution here, but what I’d try first is to check what type metrics is. Is it a dictionary, or is it a scalar variable? And if metrics is a dictionary, what type of value is returned by all_losses.mean().item()?

Since the error is in that particular line, you should generally be able to figure out what is going on by breaking the line down and replicating the same scenario in code or in a Jupyter notebook. Hope that helps 🙂
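For instance, something like this reproduces the exact error, assuming your compute_metrics somehow returned a bare number that ended up in metrics (just a guess at the scenario, not verified against the trainer internals):

metrics = 0.5261              # a float instead of a dict
metrics["eval_loss"] = 1.234  # TypeError: 'float' object does not support item assignment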

Thank you for the response!

metrics is internal to the Hugging Face Trainer class, so I’m not quite sure how I could go about inspecting that variable.

The relevant file is at /opt/conda/lib/python3.7/site-packages/transformers/trainer.py on your machine. You should be able to modify it to log whatever information you need, but do make sure to create a backup first, of course 🙂
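For example, you could temporarily add a print just above the line from your traceback (a hypothetical edit; adapt it to whatever you want to inspect):

# Inside evaluation_loop() in trainer.py, just above the failing line:
print(f"metrics is a {type(metrics).__name__}: {metrics!r}")  # temporary debug log

if all_losses is not None:
    metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()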

Ooo, yes, didn’t think of that! Will keep that in mind for next time!

I reread Jeremy’s NLP tutorial on Kaggle and figured out from there that the metric function should return a dictionary containing the metric. So now my trainer is working.
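For anyone who lands on this thread with the same error, the working version of my function looks like this (the key name "rmse" is my own choice; the Trainer prefixes metric keys, so it is logged as eval_rmse):

import torch
import torch.nn.functional as F

def rmse(valid_pred):
    preds = valid_pred.predictions
    targs = valid_pred.label_ids
    loss = F.mse_loss(torch.from_numpy(preds), torch.from_numpy(targs)).sqrt()
    # Return a dictionary, not a bare tensor: this is what the Trainer expects
    return {"rmse": loss.item()}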

Thank you for your input, though! And I will keep in mind that I can try directly editing libraries in the future.