Troubleshooting my Hardware

To use multiple GPUs, we first need to wrap the model in DataParallel. That leads to another problem, though: one of the GPUs overheats.
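As a rough illustration of the wrapping step, here is a minimal sketch using PyTorch's `nn.DataParallel`. The `nn.Linear` model, the batch shape, and the helper name `wrap_for_multi_gpu` are placeholders made up for this example, not the setup used in this project.

```python
import torch
import torch.nn as nn


def wrap_for_multi_gpu(model: nn.Module) -> nn.Module:
    """Wrap the model in DataParallel when more than one GPU is visible.

    Hypothetical helper for illustration only.
    """
    if torch.cuda.device_count() > 1:
        # Replicates the model on every visible GPU and splits each batch
        # along dimension 0 across those replicas.
        model = nn.DataParallel(model)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return model.to(device)


# Placeholder model and batch, just to show the call pattern.
model = wrap_for_multi_gpu(nn.Linear(10, 2))
device = next(model.parameters()).device

batch = torch.randn(8, 10).to(device)
output = model(batch)
print(output.shape)
```

On a machine with a single GPU or no GPU, the sketch falls back to running the plain model. By default DataParallel gathers the outputs on the first device, so that GPU tends to carry more of the load than the others.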

You can read more about how DataParallel behaves in PyTorch in this blog post:

https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255

Also take a look at the reply from professor @jeremy about multi-GPU training.