Tensorboard Integration

(Jason Antic) #1

Hello everyone-

I’ve already discussed this with @jeremy and @sgugger. The gist is that I’m planning to submit a pull request that adds built-in TensorboardX functionality. There’s still a lot more that could be built on top of this, but here’s what I intend to put up:

  • Image generation visualization (for both GAN and non GAN learners)
  • Model histograms/distributions for each of the layers
  • Various gradient stats
  • All available metrics/losses (losses are reported as a ‘metric’ currently).

The single file I plan on putting up is here:


The usage is as follows (example directly from DeOldify):

from functools import partial
from pathlib import Path

proj_id = 'Colorize'
tboard_path = Path('data/tensorboard/' + proj_id)
learn.callback_fns.append(partial(GANTensorboardWriter, base_dir=tboard_path, name='GanLearner'))

The approach Jeremy suggested was that I’d just submit the single tensorboard.py file in callbacks, without adding the dependencies (tensorboard and tensorboardx) to the install files. He’d add the logic from there to handle the case where the (still optional) prerequisites aren’t installed: logic that would basically inform users that if they want to use these callbacks, they’ll have to install these additional dependencies.
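As a rough illustration of that kind of optional-dependency check, here is a minimal sketch. The helper names `require_module` and `load_tensorboardx` and the message wording are hypothetical; this is not the logic that was actually added to fastai.

```python
import importlib


def require_module(name, install_hint):
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"The Tensorboard callbacks need the optional package '{name}'. "
            f"Install it with: {install_hint}"
        ) from err


def load_tensorboardx():
    # Hypothetical entry point a callback module could call on first use,
    # so that importing the library itself never requires tensorboardX.
    return require_module("tensorboardX", "pip install tensorboardX tensorboard")
```

Deferring the import to first use keeps the dependency optional: users who never touch the callbacks never see the error.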

I tried to tackle the performance issues, which basically amounted to moving blocking I/O operations into a simple request/queue writer backed by a daemon thread (AsyncTBWriter). This shouldn’t actually be necessary on our part, though; it should be handled on the TensorboardX end. So I plan on digging further into that and raising the issue in that project. Anyway, my Python isn’t all that great yet, so there’s a good chance there’s a better way than what I did there. Just let me know; it won’t hurt my feelings :slight_smile:
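For readers unfamiliar with the pattern, a request/queue writer on a daemon thread can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the AsyncTBWriter from the file; the class name `AsyncWriter` and its methods are assumptions.

```python
import queue
import threading


class AsyncWriter:
    """Run blocking write requests on a single background daemon thread."""

    def __init__(self):
        self._requests = queue.Queue()
        # daemon=True so the thread never keeps the process alive on exit.
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def request_write(self, write_fn, *args, **kwargs):
        # Returns immediately; the caller never waits on disk I/O.
        self._requests.put((write_fn, args, kwargs))

    def _drain(self):
        while True:
            write_fn, args, kwargs = self._requests.get()
            try:
                write_fn(*args, **kwargs)
            except Exception as err:
                # Keep the worker alive if a single write fails.
                print(f"async write failed: {err}")
            finally:
                self._requests.task_done()

    def flush(self):
        # Block until every queued request has been processed.
        self._requests.join()
```

With a real summary writer, a training loop could call something like `writer.request_write(summary_writer.add_scalar, 'metrics/valid_loss', 0.5, 10)` and call `flush()` once at the end of training. A single worker thread preserves the order of the requests.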

A few other things to scrutinize would be GPU to CPU logic on the tensors, the way I’m handling getting and caching batches from one_batch calls (which is slow with ImageNet at least), and the defaults I set for how often these things get written (stats_iters, hist_iters, etc). I basically did what worked for me but this isn’t battle tested for everything. I’ve been running this for a while as is and I haven’t had any noticeable issues.

Anyway, I really do recommend using Tensorboard, in particular for image generation and the model histograms. It’s enormously helpful to see the transitions with the image sliders to see the subtle changes in the images.

Some puzzling bugs in the model can also be readily exposed by the histogram graphs. This happened a few weeks ago, when I found a bug in the new fastai SelfAttention module where it wasn’t actually learning. This was obvious in the graphs: gamma remained 0.

I’m new to the whole pull request process, so just let me know if I screwed anything up. I’m certainly willing to put up documentation. Also, the pull request instructions for new features suggest adding tests, but I’m honestly not sure what those would consist of in this case (and I’m a guy who is really into testing). Any pointers?

I’ll formally submit the pull request once I get the green-light here.


Lesson 9 Discussion & Wiki (2019)
(Michael) #2

This would be great!

I also played around with it some time ago.

Maybe the new tensorboard(X) notebook and slides/video from the CMU DL course could be interesting for you.

Keep up the great work! :slight_smile:


(Michael) #3

Hello @jsa169,

I just saw the first parts of the Tensorboard callback on GitHub.
If I can help (with my limited skills), just tell me. I would be happy to learn and contribute! :slight_smile:

Kind regards

(Jason Antic) #4

Thanks @MicPie! I can tell you this much: there’s probably still a lot of functionality that could be added. For one, I wasn’t sure which stats would be relevant or useful, so I just did what worked for me at the time. It should be easy to add more from here, so simply having a second pair of eyes look at it from that angle would help.

There’s also other types of models not covered yet- audio and text generation, for example. But I know Tensorboard has support for these.

Also- I’m just not sure if I did everything 100% legit. So that definitely needs to be scrutinized. I must have screwed something up. That’s a given.

(Stefan) #5

@jsa169 great work. I wasn’t aware of the performance bottlenecks, but I mostly used the scalar logger of tensorboard(X).

I think that self.metrics_root = '/metrics/' in LearnerTensorboardWriter should be self.metrics_root = 'metrics/', otherwise you get warnings like this:
Summary name /metrics/valid_loss is illegal; using metrics/valid_loss instead.
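One way to avoid the warning regardless of how the root is configured is to build tags without a leading slash. A small sketch, where `metric_tag` is a hypothetical helper and not part of the callback:

```python
def metric_tag(root, name):
    """Join a tag root and a metric name, dropping stray slashes."""
    parts = (part.strip("/") for part in (root, name))
    return "/".join(part for part in parts if part)


# Both spellings of the root produce the same tag.
tag_from_old_root = metric_tag("/metrics/", "valid_loss")
tag_from_new_root = metric_tag("metrics/", "valid_loss")
```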

(Jason Antic) #6

Thanks for pointing that out! I’m surprised I didn’t notice that. I’ll put it on my todo list.


(Etienne Tremblay) #7

Really great work! Love the visualizations. Do you think it would be possible to merge some metrics for training and validation in the same graph with different colors like so:

(Jason Antic) #8

Thanks! I’d definitely like to have this functionality too. Unfortunately, I personally don’t have the time to look into it yet. Others (you included!) are certainly encouraged to do that.
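For anyone who wants to pick this up, one possible starting point is the `add_scalars` method that both tensorboardX and `torch.utils.tensorboard` expose on `SummaryWriter`, which writes several named values under one main tag so they are drawn in the same chart. The sketch below is untested against the callback itself; `log_train_valid` and `RecordingWriter` are hypothetical names, and the stand-in writer only exists so the snippet runs without Tensorboard installed.

```python
def log_train_valid(writer, name, train_value, valid_value, iteration):
    """Write both values under one main tag so they share a chart."""
    writer.add_scalars(
        f"metrics/{name}",
        {"train": train_value, "valid": valid_value},
        iteration,
    )


class RecordingWriter:
    """Stand-in with the same add_scalars call shape, for a dry run."""

    def __init__(self):
        self.calls = []

    def add_scalars(self, main_tag, tag_scalar_dict, global_step=None):
        self.calls.append((main_tag, tag_scalar_dict, global_step))


dry_run = RecordingWriter()
log_train_valid(dry_run, "loss", 0.9, 1.1, 3)
```

Passing a real `SummaryWriter` instead of `RecordingWriter` should give one "metrics/loss" chart with a separately colored line for each key.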


(Thomas Chambon) #9

Tensorboard is natively supported as of PyTorch 1.1 (torch.utils.tensorboard).
So there’s no need to use tensorboardX anymore :slight_smile:!

(Michael) #10

Indeed: https://pytorch.org/docs/stable/tensorboard.html
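For code that has to run on both older and newer setups, a minimal sketch of picking whichever writer is available could look like this. The function name `get_summary_writer_class` is an assumption made for illustration; only the two import paths come from the libraries themselves.

```python
def get_summary_writer_class():
    """Prefer PyTorch's built-in writer, fall back to tensorboardX."""
    try:
        # Available in PyTorch 1.1 and later (needs the tensorboard package).
        from torch.utils.tensorboard import SummaryWriter
        return SummaryWriter
    except ImportError:
        pass
    try:
        from tensorboardX import SummaryWriter
        return SummaryWriter
    except ImportError:
        # Neither backend is installed; the caller decides what to do.
        return None


writer_class = get_summary_writer_class()
```

If `writer_class` is not `None`, it can be instantiated with the log directory as the first positional argument, and the usual `add_scalar(tag, value, step)` calls apply to either backend.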
Very nice!


(Thomas Chambon) #11

I have created a repository to show how to use Tensorboard in fastai:

With the great callback system it’s very easy!