What to do while waiting for models to train?


(James Player) #1

I notice there’s a lot of waiting around while doing deep learning.

I’m just curious what others do while they’re waiting for their training tasks to run. I often find myself just staring at the accuracy and waiting for the seconds to tick down to zero. What’s the best way to optimize this time?


(Matthijs) #2

Doing stretches. :smiley:

It gets worse when you’re training on very big datasets and the wait is hours (or days) instead of minutes.


(James Dietle) #3

I go back over the code to put in additional comments for my own understanding, clean something in the house, or, as @machinethink said, do some quick calisthenics.

I view it almost like a race: if there are 4 minutes of training, move the dry clothes to the bed, move the wash to the dryer, and get back to my seat. Next time, fold some clothes. Next time, put them away.

My house is actually looking cleaner these days!


(Bhabani) #4

I feel like Sheldon Cooper from The Big Bang Theory. :stuck_out_tongue:


(Pavel Surmenok) #5

If you have extra hardware, you can design and run another experiment in parallel. You can also analyze the results of previous experiments and design the next few things to try, or do error analysis. Of course, it's easier to find things to do if your experiment runs for 30 minutes or a few hours/days. If the experiment takes just 4 minutes to run, there is not much you can do :slight_smile:


(dsteele) #6

+1 for laundry/cleaning up around the house.


(marc) #7

The trick is to run more than one experiment at once. But then keeping track of the code changes and hyperparameters requires extra attention.
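
One way to sketch the "more than one experiment at once" idea is to launch each run as a separate process and wait on all of them. The commands below are illustrative stand-ins; in practice each entry would be your actual training script with different hyperparameters (e.g. `["python", "train.py", "--lr", "1e-3"]`):

```python
import subprocess
import sys

# Illustrative stand-in commands; replace each with a real training invocation.
experiments = [
    [sys.executable, "-c", "print('run A: lr=1e-3 done')"],
    [sys.executable, "-c", "print('run B: lr=1e-4 done')"],
]

# Launch all runs concurrently, then collect each run's output as it finishes.
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) for cmd in experiments]
outputs = [p.communicate()[0].strip() for p in procs]
for out in outputs:
    print(out)
```

Since the processes are independent, you do need some record (log files, a spreadsheet) of which hyperparameters went with which run, or the results quickly become uninterpretable.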


(Pavel Surmenok) #8

To track experiment results, I maintain a spreadsheet with the experiment name, all hyperparameter values, and the results (validation loss/accuracy). It can also be useful to keep some kind of lab journal recording what insights you got after evaluating the results and why you ran the experiment in the first place (e.g. you noticed that dropout probability strongly affects validation loss, so you decided to test a few different values for that parameter).
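
A spreadsheet like that can also be kept as a plain CSV file that each training run appends to. This is only a minimal sketch of that idea; the filename, column names, and values here are all illustrative, not from any particular tool:

```python
import csv
from pathlib import Path

LOG_PATH = Path("experiments.csv")  # illustrative filename
FIELDS = ["name", "lr", "batch_size", "dropout", "val_loss", "val_acc", "notes"]

def log_experiment(row):
    """Append one experiment's hyperparameters and results to the CSV log."""
    is_new = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow(row)

# Hypothetical run: record hyperparameters, results, and a lab-journal note.
log_experiment({
    "name": "baseline",
    "lr": 1e-3,
    "batch_size": 64,
    "dropout": 0.5,
    "val_loss": 0.412,
    "val_acc": 0.861,
    "notes": "dropout seems to matter; try 0.3 and 0.7 next",
})
```

The resulting file opens directly in any spreadsheet program, so the journal notes and the numbers stay in one place.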


(Xu Zhang) #9

Would you like to share your spreadsheet for reference? Thanks a lot!



(RobG) #11

Sleep …


(jakubczakon) #12

Hi @jamesplayer,

Jakub from neptune.ml here.
If you keep track of metrics, hyperparameters, and so on, you can do meta-analysis while waiting and figure out what to do next. For example, you could:

  • compare hyperparameters with tools like skopt.plots. I have written a helper that converts a simple dataframe with metric and parameter values as columns into the scipy.optimize.OptimizeResult object that skopt.plots expects. You can check it out here.
  • do meta-analysis of your results and project activity over time, like this project progress visualization.

On a side note, I have just added a simple callback that lets you monitor fastai training in Neptune to our neptune-contrib library. I explain how it works in this blog post but basically, with no change to your workflow, you can track code, hyperparameters, metrics and more.

Before you ask: Neptune is now open and free for individual (non-organization) use.
Read more on the docs page to get a fuller picture.