Optimizing CPU-only training with 300GB RAM

I am learning neural nets and fastai for work, so at the moment I have no choice but to run on a server without a GPU. It does, however, have 300GB of RAM and an Intel Xeon E5-2699 v4. How can I optimize training time for a ULMFiT model? I am essentially following the examples from the course, such as this link.

I am currently training on a sample of around 800,000 individuals; the complete set is much larger. The response variable is continuous rather than categorical. The language model takes about 15 hours for one epoch (the total vocab is around 70,000 words, though this may be reduced). The language model was trained with bs, bptt = 512, 80, using AWD_LSTM. The response model is trained with the following:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5,
                                pretrained=False, metrics=mae)
The current estimate is about 18 hours total. What can I do to decrease training time?
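Roughly, the language-model stage looks like this (a sketch only; the data objects and learning rates below are placeholders, not my exact code):

```python
# Sketch of the language-model stage with the settings mentioned above
# (bs=512, bptt=80, AWD_LSTM). fastai v1 API; data objects are placeholders.
bs, bptt = 512, 80

# from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM
# data_lm = TextLMDataBunch.from_df(path, train_df, valid_df, bs=bs, bptt=bptt)
# learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
# learn_lm.fit_one_cycle(1, 1e-2)  # one epoch currently takes ~15 hours on CPU
```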


Are all CPU cores at 100% utilization? If yes, there is not much you can do without shrinking the model or otherwise modifying its hyperparameters.
Or carefully going through all the code and optimizing every bit of it :slight_smile:

P.S. 18 hours is actually not that bad considering it is running on CPUs
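For concreteness, "shrinking the model" with fastai's AWD_LSTM usually means passing a smaller `config` dict to the learner. A minimal sketch (the default values below are what I believe fastai v1's `awd_lstm_lm_config` uses; verify against your installed version):

```python
# Sketch: shrink AWD_LSTM by overriding its config before building the learner.
# Defaults shown are fastai v1's awd_lstm_lm_config values (verify locally).
default_config = {'emb_sz': 400, 'n_hid': 1152, 'n_layers': 3,
                  'pad_token': 1, 'output_p': 0.1, 'hidden_p': 0.15,
                  'input_p': 0.25, 'embed_p': 0.02, 'weight_p': 0.2,
                  'tie_weights': True, 'out_bias': True}

# Halving the embedding/hidden sizes and dropping a layer roughly quarters
# the per-step compute, at some cost in accuracy.
small_config = dict(default_config, emb_sz=200, n_hid=575, n_layers=2)

# The shapes no longer match the pretrained weights, so pretrained must be
# False (which matches your current setup anyway):
# learn = language_model_learner(data_lm, AWD_LSTM, config=small_config,
#                                pretrained=False)
```

Cutting the 70,000-word vocab (e.g. via a lower `max_vocab` at tokenization time) also shrinks the embedding and output layers, which are a large share of the compute at that vocab size.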

Yes, 100% utilization. What are some hyperparameters that I should try changing? I am just learning neural nets so I’m not sure where to begin.