If a model is running on multiple GPUs, what parameters can I tune to reduce training time?
- Is it ok to increase the batch size?
- Is it ok to clip the gradient?
- Is it ok to increase the number of epochs?
- Is it ok to increase the learning rate?
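For context on the batch size and learning rate questions: a common heuristic when the global batch size grows across GPUs is the linear learning-rate scaling rule (scale the learning rate by the same factor as the batch size). A minimal sketch, with the function name and example values assumed for illustration:

```python
def scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear scaling rule: if the global batch size grows by a factor k,
    multiply the base learning rate by the same factor k."""
    return base_lr * (global_batch / base_batch)

# Hypothetical example: base_lr=0.1 tuned for batch size 256 on one GPU;
# 8 GPUs each running batch 256 gives a global batch of 2048.
print(scaled_lr(0.1, 256, 2048))  # → 0.8
```

Whether this rule holds depends on the optimizer and model; a short learning-rate warmup is often used alongside it when the scaling factor is large.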