Workflow to compare & monitor models using WandbCallback

Just an update: the callback no longer uses log_args, which was removed for efficiency reasons.

Arguments are still captured automatically through store_attr.
You may notice some changes in the config parameters logged in the upcoming version.

Please feel free to give feedback and let me know if we are missing any important parameters so we can add them!
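If you want to see what ends up captured, you can inspect the run’s config after training. A quick hypothetical check (the project name is a placeholder):

import wandb
from fastai.callback.wandb import WandbCallback

run = wandb.init(project="my-project")  # hypothetical project name
# ... train with cbs=WandbCallback() ...
print(run.config)  # the automatically captured parameters appear here
wandb.finish()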

I’m now getting this error when calling learn.fit with the WandbCallback:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-37-1c1a865cde40> in <module>
----> 1 learn.fit_one_cycle(10, lr_max=1e-3, cbs=WandbCallback(log_preds=False))
      2 learn.recorder.plot_loss()

/usr/local/lib/python3.6/dist-packages/fastai/callback/wandb.py in __init__(self, log, log_preds, log_model, log_dataset, dataset_name, valid_dl, n_preds, seed, reorder)
     26         # W&B log step
     27         self._wandb_step = wandb.run.step - 1  # -1 except if the run has previously logged data (incremented at each batch)
---> 28         self._wandb_epoch = 0 if not(wandb.run.step) else math.ceil(wandb.run.summary['epoch']) # continue to next epoch
     29         store_attr('log,log_preds,log_model,log_dataset,dataset_name,valid_dl,n_preds,seed,reorder')
     30 

/usr/local/lib/python3.6/dist-packages/wandb/sdk/wandb_summary.py in __getitem__(self, key)
     38 
     39     def __getitem__(self, key):
---> 40         item = self._as_dict()[key]
     41 
     42         if isinstance(item, dict):


KeyError: 'epoch'

Any ideas?

@vrodriguezf Did you log anything else in the same run?


Yes, and actually, removing that log solved the issue, thanks!! Why is that happening?

The callback also logs the epoch.
When items have been logged previously (maybe from an earlier training loop), it wants to make sure it continues from the next epoch, so it tries to read the last epoch logged.
Maybe I could change the logic and not assume that an epoch will always have been logged.

There should be no issue if you do some manual logging after at least one point has been logged.
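For illustration, here is a minimal sketch of how that lookup could be made defensive, based on the snippet in the traceback above (just an idea for the fix, not the actual fastai source):

import math
import wandb

# Continue from the next epoch when possible, but fall back to 0
# if the run has logged data without ever logging an 'epoch' key.
if not wandb.run.step:
    _wandb_epoch = 0
else:
    try:
        _wandb_epoch = math.ceil(wandb.run.summary['epoch'])
    except KeyError:
        _wandb_epoch = 0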


(I realize this is a year later, but I see no other related posts on the forum.)

When trying to call learn.fit with the WandbCallback for GANLearner.wgan, I get the error WandbCallback was not able to prepare a DataLoader for logging prediction samples -> list index out of range.

What exactly is it that’s out of range? I’m able to run learn.show_results() with no problem.

Update: I started writing my own callback for wandb, but I’m confused about how to get the output of the generator for “preds” instead of just the output of the critic. Here’s where I’m at so far – it doesn’t work – I’d welcome suggestions!

import torch
import wandb
from PIL import Image
from fastai.vision.all import *

class WandB_WGAN_Images(Callback):
    "Progress-like callback: log WGAN predictions to WandB"
    order = ProgressCallback.order+1
    def __init__(self, n_preds=6):
        store_attr()

    def after_epoch(self):
        if not self.learn.training:
            with torch.no_grad():
                self.learn.switch(gen_mode=True)
                inp,preds,targs,out = self.learn.pred  # <- fails here, see below
                b = tuplify(inp) + tuplify(targs)
                self.dl.show_results(b, out, show=False, max_n=self.n_preds)
                preds = preds.detach().permute(1, 2, 0).cpu().squeeze().numpy()
            images = [Image.fromarray(image) for image in preds]
            wandb.log({"examples": [wandb.Image(image) for image in images]})
            self.learn.switch(gen_mode=False)
Currently fails at the inp,preds,targs... line with ValueError: too many values to unpack (expected 4)

I see that show_results() uses “samples” and “outs” – but I can’t figure out how to obtain samples & outs while inside a callback.


Update: Got it:

import wandb
from fastai.vision.all import *
from torchvision.utils import make_grid

class WandB_WGAN_Images(Callback):
    "Progress-like callback: log WGAN predictions to WandB"
    order = ProgressCallback.order+1
    def __init__(self, n_preds=10):
        store_attr()

    def after_epoch(self):
        if self.gen_mode:
            # grab the generator's latest output (not the critic's)
            preds = self.learn.gan_trainer.last_gen.cpu()
            img_grid = make_grid(preds[:self.n_preds], nrow=5)
            img_grid = img_grid.permute(1, 2, 0).squeeze()
            wandb.log({"examples": wandb.Image(img_grid)})

NB: This callback should be passed to fit(), not included in the definition of the learner; otherwise you’ll get an error if you call learn.show_results() after wandb.finish().
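For example, a hypothetical training call (the learner setup, epoch count, and learning rate are placeholders):

learn = GANLearner.wgan(dls, generator, critic)  # hypothetical setup
learn.fit(10, 2e-4, cbs=[WandB_WGAN_Images(n_preds=10)])
wandb.finish()
learn.show_results()  # safe, since the callback wasn't attached to the learner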

Example: Anime Faces GAN results on WandB:


Hi there

I am unable to get wandb to log metrics from my fastai learner, no matter what I try.
I’m currently running it like so:

import wandb
from fastai.callback.wandb import *
wandb.login()
wandb.init(project_name)
learn = cnn_learner_3d(dls, resnet18_3d, metrics=accuracy, cbs=[WandbCallback(log='all', log_preds_every_epoch=True)])

Metrics are calculated in my progress bar, but they just won’t appear in wandb.

Any thoughts?

Thanks in advance!

Try passing project="project_name" to wandb.init. I think the first param is job_type.
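That is, something like:

# pass the project name as a keyword argument so it doesn't land on
# wandb.init's first positional parameter
wandb.init(project="project_name")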

I am unable to reproduce your issue; this notebook is logging fine for me:

Thanks, you are right! It works for a simple example. I’m not sure why it doesn’t work with the add-on library I was using; I’ll do more investigation and report back if I figure it out.