What to do while waiting for models to train?


(James Player) #1

I notice there’s a lot of waiting around while doing deep learning.

I’m just curious what others do while they’re waiting for their training tasks to run. I often find myself just staring at the accuracy and waiting for the seconds to tick down to zero. What’s the best way to optimize this time?


(Matthijs) #2

Doing stretches. :smiley:

It gets worse when you’re training on very big datasets and the wait is hours (or days) instead of minutes.


(James Dietle) #3

I go back over the code to put in additional comments for my own understanding, clean something in the house, or, as @machinethink said, do some quick calisthenics.

I view it almost like a race: if there are 4 minutes of training, move the dry clothes to the bed, move the wash to the dryer, and get back to my seat. Next time, fold some clothes. Next time, put them away.

My house is actually looking cleaner these days!


(Bhabani) #4

I feel like Sheldon Cooper from The Big Bang Theory. :stuck_out_tongue:


(Pavel Surmenok) #5

If you have extra hardware, you can design and run another experiment in parallel. You can also analyze the results of previous experiments and design the next few things to try, or do error analysis. Of course, it's easier to find things to do if your experiment runs for 30 minutes or a few hours/days. If the experiment takes just 4 minutes to run, there is not much you can do :slight_smile:


(dsteele) #6

+1 for laundry/cleaning up around the house.


(marc) #7

The trick is to run more than one experiment at once. But then keeping track of the code changes and hyperparameters requires extra attention.
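
One way to sketch the "more than one experiment at once" idea is to launch each run as a separate process and wait on all of them. The commands below are illustrative stand-ins; in practice each entry would be your actual training script with different hyperparameters (e.g. `["python", "train.py", "--lr", "1e-3"]`):

```python
import subprocess
import sys

# Illustrative stand-in commands; replace each with a real training invocation.
experiments = [
    [sys.executable, "-c", "print('run A: lr=1e-3 done')"],
    [sys.executable, "-c", "print('run B: lr=1e-4 done')"],
]

# Launch all runs concurrently, then collect each run's output as it finishes.
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) for cmd in experiments]
outputs = [p.communicate()[0].strip() for p in procs]
for out in outputs:
    print(out)
```

Since the processes are independent, you do need some record (log files, a spreadsheet) of which hyperparameters went with which run, or the results quickly become uninterpretable.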


(Pavel Surmenok) #8

To track experiment results, I maintain a spreadsheet with the experiment name, all hyperparameter values, and the results (validation loss/accuracy). It can also be useful to keep some kind of lab journal recording what insights you got after evaluating the results and why you ran the experiment in the first place (e.g. you noticed that dropout probability strongly affects validation loss, so you decided to test a few different values for that parameter).
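
A spreadsheet like that can also be kept as a plain CSV file that each training run appends to. This is only a minimal sketch of that idea; the filename, column names, and values here are all illustrative, not from any particular tool:

```python
import csv
from pathlib import Path

LOG_PATH = Path("experiments.csv")  # illustrative filename
FIELDS = ["name", "lr", "batch_size", "dropout", "val_loss", "val_acc", "notes"]

def log_experiment(row):
    """Append one experiment's hyperparameters and results to the CSV log."""
    is_new = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow(row)

# Hypothetical run: record hyperparameters, results, and a lab-journal note.
log_experiment({
    "name": "baseline",
    "lr": 1e-3,
    "batch_size": 64,
    "dropout": 0.5,
    "val_loss": 0.412,
    "val_acc": 0.861,
    "notes": "dropout seems to matter; try 0.3 and 0.7 next",
})
```

The resulting file opens directly in any spreadsheet program, so the journal notes and the numbers stay in one place.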


(Xu Zhang) #9

Would you like to share your spreadsheet for reference? Thanks a lot!



(RobG) #11

Sleep …


(jakubczakon) #12

Hi @jamesplayer,

Jakub from neptune.ml here.
If you keep track of metrics, hyperparameters, and so on, you can do meta-analysis while waiting and figure out what to do next. For example, you could:

  • compare hyperparameters with tools like skopt.plots. I have written a helper that converts a simple dataframe with metric and parameter values as columns into the scipy.optimize.OptimizeResult object that skopt.plots expects. You can check it out here.
  • do meta-analysis of your results and project activity over time, like this project progress visualization.

On a side note, I have just added a simple callback that lets you monitor fastai training in Neptune to our neptune-contrib library. I explain how it works in this blog post but basically, with no change to your workflow, you can track code, hyperparameters, metrics and more.

Before you ask: Neptune is now open and free for individual (non-organization) use.
Read more on the docs page to get a fuller picture.