Tensorboard Callback for Fastai

(Bryan Heffernan) #1

I created a fast.ai callback that logs model and training information that can be viewed in tensorboard.

Tensorboard is a visualization tool that can help debug and explore your model. Read more about it here. Tensorboard is made for Tensorflow, but thanks to TensorboardX it also works with Pytorch.

Download the callback and an example notebook similar to lesson 5 at https://github.com/Pendar2/fastai-tensorboard-callback.

Currently the callback plots training loss, validation loss, and metrics, which appear in Tensorboard’s scalars tab. More could be added in the future, such as learning rate and momentum. Every X iterations a snapshot of the model’s weights is logged and can be viewed in the histogram and distribution tabs. Every epoch, the embedding layers are saved and can be viewed in 3D, with dimensionality reduction, in the projector tab. Lastly, the model’s dataflow graph can be viewed in the graph tab (this can be buggy with RNNs). Below are screenshots of each.

To use it, you must have Tensorboard and TensorboardX installed:
pip install tensorflow
pip install git+https://github.com/lanpa/tensorboard-pytorch
Graph visualization requires Pytorch >= 0.4. Fastai currently uses 0.3. I have only tested with Pytorch 0.4.

Launch the Tensorboard server with tensorboard --logdir=PATH/logs, replacing PATH/logs (the default location) with your own log directory if you changed it.
Then navigate your browser to http://localhost:6006
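
For anyone scripting this, here is a minimal sketch of how the default log location and the launch command fit together. The helper names and the dataset path are made up for illustration and are not part of the callback; the only things taken from above are the PATH/logs default and port 6006.

```python
from pathlib import Path
import shlex


def default_log_dir(data_path):
    """Return the default log location described above: PATH/logs."""
    return Path(data_path) / "logs"


def tensorboard_command(log_dir, port=6006):
    """Build the shell command that serves the given log directory."""
    return f"tensorboard --logdir={shlex.quote(str(log_dir))} --port={port}"


log_dir = default_log_dir("data/ml-latest-small")  # hypothetical dataset path
print(tensorboard_command(log_dir))
```

Quoting the directory with shlex.quote keeps the command valid when the path contains spaces.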

I made an example notebook on how to use the callback. The logs are stored in the logs directory under the ModelData path. The constructor requires an nn.Module instance, a ModelData instance, and a name for the log. The metrics_names parameter is a list of names for the fit function’s metrics. If this callback ever gets merged into fastai, these parameters (except for the log name) wouldn’t be required. To change the save path or the histogram save frequency, use the path=None and histogram_freq=100 parameters.
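
To illustrate what histogram_freq controls, here is a rough sketch of frequency-gated histogram logging. It is not the callback's actual code: RecordingWriter is a stand-in I made up so the snippet runs without TensorboardX, and log_weight_histograms is a hypothetical helper. With the real library you would pass a tensorboardX SummaryWriter, whose add_histogram(tag, values, global_step) has the same shape.

```python
class RecordingWriter:
    """Stand-in for tensorboardX.SummaryWriter that only records calls."""

    def __init__(self):
        self.calls = []

    def add_histogram(self, tag, values, global_step=None):
        self.calls.append((tag, global_step))


def log_weight_histograms(writer, named_weights, iteration, histogram_freq=100):
    """Log one histogram per parameter every `histogram_freq` iterations."""
    if iteration % histogram_freq != 0:
        return False
    for name, values in named_weights:
        writer.add_histogram(name, values, iteration)
    return True


writer = RecordingWriter()
# Toy "weights"; with a real model these would come from its parameters.
weights = [("fc.weight", [0.1, -0.2, 0.3]), ("fc.bias", [0.0])]
for iteration in range(1, 301):
    log_weight_histograms(writer, weights, iteration)

print(len(writer.calls))  # 3 logged iterations x 2 parameters = 6
```

A lower histogram_freq gives finer-grained snapshots at the cost of larger log files.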

(Arka Sadhu) #2

This is so cool. Tensorboard does have many interesting visualizations. Thanks for this. I made a similar thing i.e. using callbacks for visdom here https://github.com/TheShadow29/FAI-notes/blob/master/notebooks/Visdom-With-FastAI.ipynb.


(Fred Guth) #3

@Pendar, thanks for this code!

I am new to Tensorboard, and although it seems to be working fine, I wasn’t able to see the training loss and the validation loss in the same graph. How can I do that?

(Bryan Heffernan) #4

Each line on the graph is a different run where the run name is defined when creating the callback object. I made it this way so it can be used to evaluate and compare the performance of multiple models.
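
If you do want both losses on one chart, a common Tensorboard pattern (not what this callback does) is to log them under the same tag from two separate runs, since Tensorboard overlays same-tag scalars from different runs. A minimal sketch, assuming a writer with an add_scalar(tag, value, step) method: RecordingWriter below is a made-up stand-in so the snippet runs on its own, and the logs/run-1/train and logs/run-1/valid layout is just an illustrative choice.

```python
class RecordingWriter:
    """Stand-in for tensorboardX.SummaryWriter; stores (tag, value, step)."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        self.scalars = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.scalars.append((tag, scalar_value, global_step))


def make_writers(run_name, writer_cls=RecordingWriter):
    """One writer per curve; both live under the same parent run directory."""
    return {
        "train": writer_cls(f"logs/{run_name}/train"),
        "valid": writer_cls(f"logs/{run_name}/valid"),
    }


def log_losses(writers, epoch, trn_loss, val_loss):
    # Same tag in two runs: Tensorboard draws them as two lines on one chart.
    writers["train"].add_scalar("loss", trn_loss, epoch)
    writers["valid"].add_scalar("loss", val_loss, epoch)


writers = make_writers("run-1")
log_losses(writers, epoch=0, trn_loss=0.92, val_loss=0.88)  # made-up values
```

Passing writer_cls=SummaryWriter (from tensorboardX) should produce real event files, since the log directory is given as the first positional argument.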

The new fastai_v1 progress bar could do this for you: https://twitter.com/GuggerSylvain/status/1031109930353352705

(Mark Worrall) #5


This is wonderful and worked out of the box - you hero.


Out of curiosity, does this still work with the v1 fastai library?

(Bryan Heffernan) #7

Probably not, as v1 had major callback changes. Will update this to work with v1 soon.

(Arka Sadhu) #8

It works pretty much the same with a few changes. @jamesp @Pendar
Here is my current code (checked on fastai 1.0.5)

from tensorboardX import SummaryWriter
from fastai.callback import Callback
from pathlib import Path
import shutil

class TensorboardLogger(Callback):
    """
    A general Purpose Logger for TensorboardX
    Also save a .txt file for the important parts
    """

    def __init__(self, learner, log_name, cfgtxt, del_existing=False, histogram_freq=100):
        """
        Learner is the ConvLearner
        log_name: name of the log directory to be formed. Will be input
        for each run
        cfgtxt: HyperParams
        del_existing: To run the experiment from scratch and remove previous logs
        """
        self.learn = learner
        self.model = learner.model
        self.md = learner.data

        self.metrics_names = ["validation_loss"]
        self.metrics_names += [m.__name__ for m in learner.metrics]

        self.best_met = 0

        self.histogram_freq = histogram_freq
        self.cfgtxt = cfgtxt

        path = Path(self.md.path) / "logs"
        self.log_name = log_name
        self.log_dir = path / log_name

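        self.init_logs(self.log_dir, del_existing)
        # Create the SummaryWriter before any hook uses self.writer;
        # this also creates the log directory the .txt file lives next to.
        self.init_tb_writer()
        self.init_txt_writer(path, log_name)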

    def init_logs(self, log_dir, del_existing):
        if log_dir.exists():
            if del_existing:
                print(f'removing existing log with same name {log_dir.stem}')
                shutil.rmtree(log_dir)  # delete the previous run's logs

    def init_tb_writer(self):
        self.writer = SummaryWriter(
            comment='main_mdl', log_dir=str(self.log_dir))
        self.writer.add_text('HyperParams', self.cfgtxt)

    def init_txt_writer(self, path, log_name):
        self.fw_ = path / f'{log_name}.txt'
        self.str_form = '{} \t {} \t '
        for m in self.metrics_names:
            self.str_form += '{} \t '
        self.str_form += '\n'
        self.out_str = self.str_form.format(
            'epoch', 'trn_loss', *self.metrics_names)

        with open(self.fw_, 'w') as f:
            f.write(self.out_str)  # write the header row

    def on_batch_end(self, **kwargs):
        self.trn_loss = kwargs['last_loss']
        num_batch = kwargs['num_batch']
        self.writer.add_scalar(
            'trn_loss_batch', self.trn_loss, num_batch)

    def on_epoch_end(self, **kwargs):
        metrics = kwargs['last_metrics']
        epoch = kwargs['epoch']
        trn_loss = kwargs['smooth_loss']
        self.writer.add_scalar('trn_loss', trn_loss, epoch)

        for val, name in zip(metrics, self.metrics_names):
            self.writer.add_scalar(name, val, epoch)

        self.file_write(self.str_form.format(epoch,
                                             self.trn_loss, *metrics))

        m = metrics[1]
        if m > self.best_met:
            self.best_met = m

    def on_train_end(self, **kwargs):
        self.writer.add_text('Total Epochs', str(kwargs['epoch']))
        self.file_write(f'Epochs done, {kwargs["epoch"]}')

    def file_write(self, outstr):
        with open(self.fw_, 'a') as f:
            f.write(outstr)
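
To make the .txt log format concrete, here is a small standalone sketch of the same tab-separated template that init_txt_writer builds. The function name build_row_format, the "accuracy" metric, and the numbers are made up for illustration.

```python
def build_row_format(metrics_names):
    """Tab-separated template: epoch, training loss, then one slot per metric."""
    row_format = '{} \t {} \t '
    for _ in metrics_names:
        row_format += '{} \t '
    return row_format + '\n'


metrics_names = ["validation_loss", "accuracy"]  # "accuracy" is an assumed metric
row_format = build_row_format(metrics_names)

header = row_format.format('epoch', 'trn_loss', *metrics_names)
row = row_format.format(0, 0.92, 0.88, 0.71)  # made-up numbers for illustration
print(header, row, sep='')
```

Each epoch appends one such row, so the file can be read back later by splitting on tabs.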

And you use it with your learner function like this:

import json

tb_callback = TensorboardLogger(
        learn, uid, json.dumps(cfg), del_existing=del_existing)
learn.callbacks = [tb_callback]

Here uid is just a unique identifier (the name of the log), del_existing=True deletes any previous log with the same name, and cfg is a dictionary with all the hyper-parameters.

(Bryan Heffernan) #9

Updated to support fastai v1. Added lr and mom logging. Also simplified params:
learn.fit(1, 1e-3, callbacks=[TensorboardLogger(learn, "run-1")])

(Uttam) #10

@TheShadow29, I want to plot the training and validation losses, as well as accuracy, in Tensorboard for the ULMFiT model. Can you help me out with the implementation? I am not sure how to add the hyperparameters.

(Arka Sadhu) #11

I added hyper-params in a config dict. So my config dict is like cfg = {'bs': 64, 'lr': 1e-3}, then I do json.dumps(cfg) which converts it into a string, and then in tb_callback use writer.add_text('Hyp-Param', cfgtxt).
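
A minimal sketch of that conversion step on its own. The cfg_to_text helper is a hypothetical name, and the default=str fallback is my own addition so that values such as Path objects don't make json.dumps fail.

```python
import json
from pathlib import Path


def cfg_to_text(cfg):
    """Serialize a hyper-parameter dict; non-JSON values fall back to str()."""
    return json.dumps(cfg, sort_keys=True, default=str)


cfg = {'bs': 64, 'lr': 1e-3}
cfgtxt = cfg_to_text(cfg)
print(cfgtxt)  # {"bs": 64, "lr": 0.001}
```

The resulting string is what gets passed as cfgtxt to the logger and on to writer.add_text.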