Workflow to compare & monitor models using WandbCallback

I guess the main issue is that it didn’t show prediction samples (the loss should still be logged, as well as the hyper-parameters).

I’m currently working on supporting bounding boxes and can take a look at tabular right after. I definitely want to support most of the common types.
We can actually log tables, so it’s just a matter of preparing the DataLoader correctly, getting prediction samples, and deciding what we want to log and how to present it.
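
A minimal sketch of what logging such a table could look like, assuming we already have decoded prediction samples in a list called samples (the column names and the decoding step are illustrative, not the callback’s actual code):

import wandb

table = wandb.Table(columns=['input', 'prediction', 'target'])
for x, pred, y in samples:  # samples assumed: (input, prediction, target) tuples
    table.add_data(wandb.Image(x), str(pred), str(y))
wandb.log({'prediction_samples': table})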

1 Like

Awesome! That’d be great! And thanks for the quick answer :blush:

Wandb seems like a great tool! I’ve been testing it with a few simple examples, such as the demo notebook mentioned above. I’m able to track the hyper-parameters and metrics (lr, loss, etc.), but I can’t seem to track the parameters or the gradients. I’ve been testing various options for the log parameter in the callback, but no results so far. Any idea what I’m doing wrong?

D’oh! There is a default minimum of 100 batches to be run before parameters and gradients are stored (see wandb.watch??). Datasets such as MNIST tiny have fewer batches than that (with normal batch sizes, at least).
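
A quick way to check this on your own data, assuming dls is the DataLoaders in question (the factor of 3 is just an example epoch count):

import wandb

print(wandb.watch.__doc__)      # shows the log_freq default referred to above (same as wandb.watch??)
n_batches = len(dls.train) * 3  # training batches per epoch times number of epochs
print(f'{n_batches} training batches run; histograms only appear once log_freq batches have been seen')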

Also, if I run two experiments in a row like this:

# First experiment:
wandb.init(project='imagnette-128')
learn = cnn_learner(dls, resnet18, pretrained=False)
learn.fit(3, 1e-3, cbs=WandbCallback(log='all'))

# Second experiment:
wandb.init(project='imagnette-128')
learn = cnn_learner(dls, resnet34, pretrained=False)
learn.fit(3, 1e-3, cbs=WandbCallback(log='all'))

Parameters and gradients are stored only for the first one. Any ideas why that is?

1 Like

It is because WandbCallback._wandb_watch_called is then True, as watch can be called only once per run and once per model.
Technically it could be reset to False when doing a new wandb.init; if it becomes a common issue we could monkey-patch it.
You can set it back to False manually for now, as in the sketch below. Let me know if that works.
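
A minimal sketch for the second experiment above (imports and dls are assumed from the earlier snippets):

import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

# Second experiment: reset the class-level flag so wandb.watch is called again
WandbCallback._wandb_watch_called = False
wandb.init(project='imagnette-128')
learn = cnn_learner(dls, resnet34, pretrained=False)
learn.fit(3, 1e-3, cbs=WandbCallback(log='all'))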

I’m currently working on it.

The issue is related to creating a DataLoader for sample predictions:

  • we choose 36 items from the validation set -> this seems to work fine
  • the issue happens at this line when trying to create a DataLoader from those items

I think the problem is that the items fed to TabularDataLoaders.test_dl are a list of pandas.core.series.Series, while they should probably be a DataFrame.

When I convert them to a DataFrame with test_items = pd.DataFrame(test_items) prior to feeding them to test_dl, I get an error while fetching prediction samples: see the stack trace.

Not sure where the problem is…
I created an experimental notebook if anyone wants to try to tackle this issue.

Why not just do:

test_items = self.dls.valid_ds.items.iloc[idxs]
self.valid_dl = self.dls.test_dl(test_items)

?

Otherwise this worked too:

test_df = getattr(dls.valid_ds.items, 'iloc')[idxs]
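
Putting it together, a minimal sketch of the flow, assuming a tabular learn whose dls.valid_ds.items is the underlying DataFrame:

idxs = list(range(36))                             # indices of the chosen sample items
test_items = learn.dls.valid_ds.items.iloc[idxs]   # a DataFrame slice, not a list of Series
valid_dl = learn.dls.test_dl(test_items)           # builds fine from a DataFrame
preds, _ = learn.get_preds(dl=valid_dl)            # fetch the prediction samples
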
1 Like

Resetting WandbCallback._wandb_watch_called = False worked just fine!

But I’m thinking that maybe this is kind of a stupid way of organizing experiments? Previously I’ve often used the notebook itself to “store” experiments - say, testing a few different lrs in sequential cells, but this quickly becomes a mess. So I guess it’s much cleaner to keep the notebook concise, and rather store the results after running the notebook several times over? And if you want to systematically test various parameters, just go for a parameter sweep. Curious to hear other thoughts on workflow!

Wandb sweeps are great, but still, I find a little bit of friction in going from a single experiment in a notebook to defining a training function for the wandb agent. Maybe calling the notebook itself as a function with papermill and putting that into the sweep could do the trick, but I haven’t tried that out yet.
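
For context, the kind of training function the agent needs looks roughly like this; just a sketch, assuming dls from the earlier Imagenette snippets and example sweep parameters:

import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

sweep_config = {
    'method': 'random',
    'metric': {'name': 'valid_loss', 'goal': 'minimize'},
    'parameters': {
        'lr': {'min': 1e-4, 'max': 1e-2},
        'epochs': {'values': [3, 5]},
    },
}
sweep_id = wandb.sweep(sweep_config, project='imagnette-128')

def train():
    with wandb.init():
        cfg = wandb.config
        learn = cnn_learner(dls, resnet18, pretrained=False)
        learn.fit(cfg.epochs, cfg.lr, cbs=WandbCallback(log='all'))

wandb.agent(sweep_id, function=train, count=10)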

Wandb versus TensorBoard: what are the pros and cons? Please ignore this question if it doesn’t make sense.

I don’t think there is any stupid way. Whatever works well for you is good and you’ll probably change how you do it several times!
I personally like to keep my notebook as short and concise as possible. Whenever you change any parameter (learning rate, batch size, new callback…), it should automatically be tracked with this integration so you can easily see the difference between your experiments on your project run page.

I’ve not looked into it too much, but we could patch Learner to do sweeps. The only issue is that it would probably cover a limited number of parameters (batch size, learning rate, epochs…) and it may be hard to make it as flexible as traditional sweeps. It may be sufficient based on how people use it, though. Let me know your feedback!

It is a great question. W&B includes the TensorBoard dashboard (when it is used for logging) and additional features.
My favorite feature is its ability to centralize experiments and quickly compare them.
When I have a complex project, I typically try a few different ideas and pull my comparisons into W&B reports to write my reasoning along the way. Helps me think more clearly.
They have a comparison section in their documentation, but I would recommend just testing it on a few runs, as that will be easier to understand.

1 Like

That’s indeed a good idea! Hadn’t thought about it.

Just an update: Jeremy merged the PR for handling tabular data.
Many thanks to @muellerzr for the help in that one!

The prediction table is now automatically logged along with losses, metrics, etc.

4 Likes

With the recent update from @muellerzr, we now automatically get great details on the config parameters logged!

These are saved as strings, so if you play a lot with them, it may not be completely straightforward to organize your runs. As an alternative we could also extract each value, such as dls.after_batch.IntToFloatTensor.div (a float), but it may not completely help (Normalize.mean is a tuple of 3 floats here). I’m thinking of leaving this explicit description as is for now.
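
If you do need the raw values, you can always add them on top of what the callback logs; a minimal sketch, using the transform names and attributes mentioned above (they will differ depending on your pipeline):

import wandb

extras = {}
for tfm in dls.after_batch.fs:  # iterate over the batch transforms
    name = type(tfm).__name__
    if name == 'IntToFloatTensor':
        extras['dls.after_batch.IntToFloatTensor.div'] = float(tfm.div)
    if name == 'Normalize':
        extras['dls.after_batch.Normalize.mean'] = [float(m) for m in tfm.mean.flatten()]
wandb.config.update(extras)  # log the extracted values as individual config entries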

Let me know if you have any suggestions.

5 Likes

Thank you! I was missing this information!

1 Like

Just a quick update I’m really excited about: the integration of artifacts in the callback.

I’ll add more docs about it, but basically this is how it works (a quick sketch follows the list):

  • the callback can log & track your datasets with the log_dataset arg (set it to True or to your custom path)
  • you can manually log datasets with the log_dataset(path…) function (for example to log the train/valid split separately)
  • models are now logged as artifacts (through the log_model arg)
  • you can also manually log models with the log_model(…) function
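
A minimal sketch of both paths, assuming an existing learn; the project name and paths are placeholders, and the keyword names follow the description above:

import wandb
from fastai.callback.wandb import WandbCallback, log_dataset, log_model

wandb.init(project='artifact-demo')

# automatic logging through the callback
# (model logging may also require SaveModelCallback depending on the fastai version)
learn.fit(3, 1e-3, cbs=WandbCallback(log_dataset=True, log_model=True))

# manual logging, e.g. to track the train/valid split or a saved model separately
log_dataset(path='data/train')       # placeholder path to a dataset folder
log_model(path='models/model.pth')   # placeholder path to a saved model file
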
3 Likes

I think WandbCallback is now ready for the big time with the new release of fastai :wink:

I added some new documentation and quick examples (also added to the top post).

And here is a quick summary of the features included in the callback:

  • Log and compare runs and hyperparameters
  • Keep track of code, models and datasets
  • Automatically log prediction samples to visualize during training
  • Monitor system resources
  • Make custom graphs and reports with data from your runs
  • Launch and scale hyperparameter search on your own compute, orchestrated by W&B
  • Collaborate in a transparent way, with traceability and reproducibility

Pretty excited about it! Feel free to share your feedback!

12 Likes

@boris thank you so much for all your work. This callback is really cool and a delight to use.

I think even a limited use of the sweeps functionality as a patch to Learner would be 100% worth it. I personally wouldn’t write a custom script (yet) to use sweeps, so trying it out without additional effort is a huge win for users like me.


I think it makes sense for the log_model and log_dataset functions to expose description as a parameter rather than fixing it to 'trained_model' and 'raw dataset' respectively.

1 Like

Thanks for the feedback @rsomani95.
I made a PR to allow a custom description.

I still have the sweeps functionality on my todo list!

2 Likes