Workflow to compare & monitor models using WandbCallback

wandb versus TensorBoard: what are the pros and cons? Please ignore this if the question doesn't make sense.

I don't think there is any stupid way. Whatever works well for you is good, and you'll probably change how you do it several times!
I personally like to keep my notebook as short and concise as possible. Whenever you change any parameter (learning rate, batch size, new callback…), it is automatically tracked by this integration, so you can easily see the difference between your experiments on your project's run page.
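To make that concrete, here is a minimal sketch of the setup I have in mind, assuming a standard vision pipeline; 'my-project' and the model choice are placeholders:

import wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

wandb.init(project='my-project')   # placeholder project name

# assumes `dls` (a DataLoaders) was built earlier in the notebook
learn = cnn_learner(dls, resnet34, metrics=accuracy, cbs=WandbCallback())
learn.fit_one_cycle(3, 1e-3)       # lr, batch size, callbacks, etc. are logged
wandb.finish()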

I've not looked into it too much, but we could patch Learner to do sweeps. The only issue is that it would probably cover a limited number of parameters (batch size, learning rate, epochs…), and it may be hard to make it as flexible as traditional sweeps. It may be sufficient for how most people would use it, though. Let me know your thoughts.
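For comparison, a traditional sweep drives a plain training function through the wandb sweep API. This is only a hedged sketch: the method, parameter grid, and project name are all made up.

import wandb

sweep_config = {
    'method': 'grid',
    'parameters': {
        'lr':     {'values': [1e-4, 1e-3, 1e-2]},
        'epochs': {'values': [3, 5]},
    },
}

def train():
    # each agent call gets its own run with a config drawn from the sweep
    with wandb.init() as run:
        cfg = run.config
        # build your dls/learn here, then e.g.:
        # learn.fit_one_cycle(cfg.epochs, cfg.lr, cbs=WandbCallback())

sweep_id = wandb.sweep(sweep_config, project='my-project')
wandb.agent(sweep_id, function=train)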

It is a great question. W&B includes the TensorBoard dashboard (when you use TensorBoard for logging) plus additional features.
My favorite feature is the ability to centralize experiments and quickly compare them.
When I have a complex project, I typically try a few different ideas and pull my comparisons into W&B reports to write up my reasoning along the way. It helps me think more clearly.
There is a comparison section in their documentation, but I would recommend just testing a few runs yourself, as that will be easier to understand.


That’s indeed a good idea! Hadn’t thought about it.

Just an update: Jeremy merged the PR for handling tabular data.
Many thanks to @muellerzr for the help on that one!

The prediction table is now automatically logged along with losses, metrics, etc.


With the recent update from @muellerzr, detailed config parameters are now logged automatically!

These are saved as strings, so if you experiment with them a lot it may not be completely straightforward to organize your runs. As an alternative we could extract each value individually, such as dls.after_batch.IntToFloatTensor.div (a float), but that may not help in every case (Normalize.mean is a tuple of 3 floats here). I'm planning to leave the explicit string description as is for now; a sketch of the alternative is below.
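To illustrate that alternative, extracting individual values could look something like the sketch below; the flatten_tfms helper is hypothetical and not part of the callback:

import wandb

def flatten_tfms(tfms, prefix):
    # hypothetical helper: flatten simple transform attributes into
    # individual wandb.config keys instead of one long string
    config = {}
    for tfm in tfms:
        for k, v in getattr(tfm, '__dict__', {}).items():
            if isinstance(v, (bool, int, float, str)):
                config[f'{prefix}.{type(tfm).__name__}.{k}'] = v
    return config

# e.g. dls.after_batch.IntToFloatTensor.div would become a float config entry
wandb.config.update(flatten_tfms(dls.after_batch, 'dls.after_batch'))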

Let me know if you have any suggestions.


Thank you! I was missing this information!


Just a quick update I’m really excited about: the integration of artifacts in the callback.

I'll add more documentation about it, but basically this is how it works (a short sketch follows the list):

  • the callback can log & track your datasets through the log_dataset argument (set it to True or to a custom path)
  • you can manually log datasets with the log_dataset(path…) function (for example to log train/valid splits separately)
  • models are now logged as artifacts (through log_model)
  • you can also manually log models with the log_model(…) function
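Here is a hedged sketch of both styles; the paths and names are placeholders, and the exact argument lists may differ slightly in the released version:

from fastai.callback.wandb import WandbCallback, log_dataset, log_model

# automatic: let the callback version the dataset and the trained model
cbs = WandbCallback(log_dataset=True, log_model=True)

# manual: log a dataset split or a saved model file yourself
log_dataset('data/train', name='train-split')   # placeholder path/name
log_model('model.pth', name='my-model')         # placeholder path/name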

I think WandbCallback is now ready for prime time with the new release of fastai :wink:

I added some new documentation and quick examples (also added to top post):

And here is a quick summary of the features included in the callback:

  • Log and compare runs and hyperparameters
  • Keep track of code, models and datasets
  • Automatically log prediction samples to visualize during training
  • Monitor computer resources
  • Make custom graphs and reports with data from your runs
  • Launch and scale hyperparameter search on your own compute, orchestrated by W&B
  • Collaborate in a transparent way, with traceability and reproducibility

Pretty excited about it! Feel free to share your feedback!


@boris thank you so much for all your work. This callback is really cool and a delight to use.

I think even a limited use of the sweeps functionality as a patch to Learner would be 100% worth it. I personally wouldn’t write a custom script (yet) to use sweeps, so trying it out without additional effort is a huge win for users like me.


I think it makes sense for the log_model and log_dataset functions to expose description as a parameter rather than fixing it to 'trained_model' and 'raw dataset' respectively.


Thanks for the feedback, @rsomani95.
I made a PR for custom description.

I still have the sweeps functionality in my todo list!


Just an update: the callback no longer uses log_args, which was removed for efficiency reasons.

Arguments are still captured automatically through store_attr.
You may notice some changes in the config parameters logged in the upcoming version.

Please feel free to give any feedback, and let me know if we are missing any important parameters so we can add them!

I'm now getting this error when calling learn.fit with the WandbCallback:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-37-1c1a865cde40> in <module>
----> 1 learn.fit_one_cycle(10, lr_max=1e-3, cbs=WandbCallback(log_preds=False))
      2 learn.recorder.plot_loss()

/usr/local/lib/python3.6/dist-packages/fastai/callback/wandb.py in __init__(self, log, log_preds, log_model, log_dataset, dataset_name, valid_dl, n_preds, seed, reorder)
     26         # W&B log step
     27         self._wandb_step = wandb.run.step - 1  # -1 except if the run has previously logged data (incremented at each batch)
---> 28         self._wandb_epoch = 0 if not(wandb.run.step) else math.ceil(wandb.run.summary['epoch']) # continue to next epoch
     29         store_attr('log,log_preds,log_model,log_dataset,dataset_name,valid_dl,n_preds,seed,reorder')
     30 

/usr/local/lib/python3.6/dist-packages/wandb/sdk/wandb_summary.py in __getitem__(self, key)
     38 
     39     def __getitem__(self, key):
---> 40         item = self._as_dict()[key]
     41 
     42         if isinstance(item, dict):


KeyError: 'epoch'

Any ideas?

@vrodriguezf Did you log anything else in the same run?


Yes, and actually removing that log solved the issue, thanks!! Why is that happening?

The callback also logs the epoch.
When you have items previously logged (maybe from a previous loop), it wants to make sure it continues at the next epoch, so it tries to read the last epoch logged.
Maybe I could change the logic and not assume that an epoch has always been logged (see the sketch below).

There should be no issue if you do your manual logging after at least one point has been logged by the callback.
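For illustration, the safer lookup could be as simple as this sketch (just the idea, not the actual patch):

import math
import wandb

def resume_epoch(run) -> int:
    # fall back to epoch 0 instead of raising a KeyError when nothing
    # has been logged under 'epoch' yet (cf. the traceback above)
    try:
        return math.ceil(run.summary['epoch'])
    except KeyError:
        return 0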


(I realize this is a year later, but I see no other related posts on the forum.)

When trying to call learn.fit with the WandbCallback on a GANLearner.wgan, I get the error WandbCallback was not able to prepare a DataLoader for logging prediction samples -> list index out of range.

What exactly is out of range? I'm able to run learn.show_results() with no problem.

Update: I started writing my own callback for wandb, but I'm confused about how to get the output of the generator for "preds" instead of just the output of the critic. Here's where I'm at so far – it doesn't work – I'd welcome suggestions!

import torch
import wandb
from PIL import Image
from fastai.vision.all import *  # Callback, ProgressCallback, store_attr, tuplify

class WandB_WGAN_Images(Callback):
    "Progress-like callback: log WGAN predictions to WandB"
    order = ProgressCallback.order+1
    def __init__(self, n_preds=6):
        store_attr()

    def after_epoch(self):
        if not self.learn.training:
            with torch.no_grad():
                self.learn.switch(gen_mode=True)
                inp,preds,targs,out = self.learn.pred  # fails: pred is not a 4-tuple
                b = tuplify(inp) + tuplify(targs)
                self.dl.show_results(b, out, show=False, max_n=self.n_preds)
                preds = preds.detach().permute(1, 2, 0).cpu().squeeze().numpy()
            images = [Image.fromarray(image) for image in preds]
            wandb.log({"examples": [wandb.Image(image) for image in images]})
            self.learn.switch(gen_mode=False)

Currently fails at the inp,preds,targs... line with ValueError: too many values to unpack (expected 4)

I see that show_results() uses “samples” and “outs” – but I can’t figure out how to obtain samples & outs while inside a callback.


Update: Got it:

import wandb
from fastai.vision.all import *  # Callback, ProgressCallback, store_attr
from torchvision.utils import make_grid

class WandB_WGAN_Images(Callback):
    "Progress-like callback: log WGAN predictions to WandB"
    order = ProgressCallback.order+1
    def __init__(self, n_preds=10):
        store_attr()

    def after_epoch(self):
        if self.gen_mode:
            # last_gen holds the most recent batch produced by the generator
            preds = self.learn.gan_trainer.last_gen.cpu()
            img_grid = make_grid(preds[:self.n_preds], nrow=5)
            img_grid = img_grid.permute(1, 2, 0).squeeze()  # CHW -> HWC
            wandb.log({"examples": wandb.Image(img_grid)})

NB: This callback should be passed to fit() rather than included in the learner definition. Otherwise you'll get an error if you call learn.show_results() after wandb.finish().
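For example, usage would look something like this; the learner construction is a placeholder:

# placeholder usage: pass the logging callbacks to fit(), not to the learner
learn = GANLearner.wgan(dls, generator, critic)
learn.fit(10, 2e-4, cbs=[WandbCallback(log_preds=False), WandB_WGAN_Images()])
wandb.finish()
learn.show_results()  # safe, since the callbacks were not stored on the learner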

Example: Anime Faces GAN results on WandB:


Hi there

I am unable to get wandb to log metrics from my fastai learner, no matter what I try.
I'm currently running it like so:

import wandb
from fastai.callback.wandb import *

wandb.login()
wandb.init(project_name)
learn = cnn_learner_3d(dls, resnet18_3d, metrics=accuracy,
                       cbs=[WandbCallback(log='all', log_preds_every_epoch=True)])

Metrics are calculated in my progress bar, but they just won't appear in wandb.

Any thoughts?

Thanks in advance!

Try passing project="project_name" to wandb.init. I think the first param is job_type.
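For example (keeping your placeholder name):

wandb.init(project="project_name")  # pass the project as a keyword argument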

I am unable to reproduce your issue; this notebook is logging fine for me:

Thanks, you are right, it works for the simple example. Not sure why it doesn't work with the add-on library I was using. I'll do more investigation and report back if I figure it out.