An older thread here discusses how to manage a compound loss function (in particular, how to separately record the parts of the loss in the Recorder). The solution described there makes sense to me, but I don’t think it quite meets one need I have: I want to send results not just to the Recorder but to another manager as well. (In my case, I’m using Neptune.ml, but it could just as easily be Tensorboard, etc.)
To make this concrete, suppose I have a metric callback called `SomeInterestingStats` that will produce multiple metrics, and a metric-consuming callback called `SendToNeptune`. Under the normal way of doing things, `SendToNeptune` would pick up the list of metric names it can expect to see from the `metrics_names` argument to `Callback.on_train_begin`:
    def on_train_begin(self, metrics_names, **kwargs):
        self.metrics_names = metrics_names
        ...
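For context, here is a minimal sketch of what I mean by a metric-consuming callback. This is framework-free and only mimics the fastai v1 callback signatures; `SendToNeptune` is my own name, and `experiment.log_metric(...)` stands in for whatever client call the tracking service actually uses:

```python
class SendToNeptune:
    """Sketch of a metric-consuming callback (not real fastai/Neptune API)."""

    def __init__(self, experiment):
        # `experiment` stands in for a Neptune (or Tensorboard) handle.
        self.experiment = experiment
        self.metrics_names = []

    def on_train_begin(self, metrics_names, **kwargs):
        # Cache the names announced at the start of training.
        self.metrics_names = list(metrics_names)

    def on_epoch_end(self, epoch, last_metrics, **kwargs):
        # Pair each announced name with the value the framework hands us.
        for name, value in zip(self.metrics_names, last_metrics):
            self.experiment.log_metric(name, value)
```

The whole question below is whether that `zip` can be trusted: it is only correct if `metrics_names` and `last_metrics` stay in sync.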
But `SomeInterestingStats` is going to create multiple metrics. To do this, it updates the Recorder’s list of metric names with a line of code that looks something like this:

    self.learn.recorder.add_metric_names(self.names)

and then appends multiple results to the end of `last_metrics` when it computes them.
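Concretely, a multi-metric callback along these lines might look like the following. This is again a simplified, framework-free sketch: `add_metric_names` is a real method on fastai v1’s Recorder, but the class here is my own, the stat names are made up, and the `{'last_metrics': ...}` return is my rendering of how v1 callbacks pass values back:

```python
class SomeInterestingStats:
    """Sketch of a callback that publishes several metrics at once."""

    names = ['stat_a', 'stat_b']  # hypothetical metric names

    def __init__(self, learn):
        self.learn = learn

    def on_train_begin(self, **kwargs):
        # Announce the extra columns so the Recorder prints them.
        self.learn.recorder.add_metric_names(self.names)

    def on_epoch_end(self, last_metrics, **kwargs):
        stat_a, stat_b = self.compute_stats()
        # Append our results to the end of last_metrics, matching the
        # order of the names we registered above.
        return {'last_metrics': last_metrics + [stat_a, stat_b]}

    def compute_stats(self):
        # Placeholder for the real computation.
        return 0.0, 0.0
```

Note that only the Recorder learns about the new names; nothing updates the `metrics_names` that other callbacks cached at `on_train_begin`.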
`SendToNeptune` is going to see the wrong list of metric names. At best it will miss the extra metrics; at worst it will get them mixed up (if `SomeInterestingStats` isn’t the last metric callback in the list).
I haven’t implemented it yet, but I think there is a workaround: `SendToNeptune` can get its list of metrics from the Recorder and ignore what is passed in `on_train_begin`. But that is kind of … ugly.
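The workaround would look roughly like this: wait until epoch end and read the names straight off the Recorder, which by then reflects any names other callbacks have appended. A sketch only, with the caveat that `recorder.names` is my assumption about where the Recorder keeps its merged header row:

```python
class SendToNeptuneWorkaround:
    """Sketch: trust the Recorder's name list, not the on_train_begin argument."""

    def __init__(self, learn, experiment):
        self.learn = learn
        self.experiment = experiment  # stand-in for the tracking client

    def on_epoch_end(self, last_metrics, **kwargs):
        # Re-read the Recorder's list every epoch; by now it includes any
        # names appended by other callbacks.  (`recorder.names` is an
        # assumption about the attribute name.)
        names = self.learn.recorder.names
        for name, value in zip(names, last_metrics):
            self.experiment.log_metric(name, value)
```

The ugliness is that a consumer callback now reaches into another callback’s internals instead of relying on the documented `on_train_begin` contract.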
So first of all, I want to make sure I’ve got this right, and haven’t misunderstood the complex information flow through callbacks.
If I am right, then this isn’t a problem just for me, but arguably for any `Callback` that thinks it can use the `metrics_names` argument to `on_train_begin`. Given the (published and used) technique of expanding the list of metrics, the `metrics_names` argument is simply unreliable.
I think we should either:

- Add a method to allow callbacks to declare the name(s) they will publish (and have `CallbackHandler` or `Learner`, rather than the callback, take responsibility for synchronizing with the `Recorder`), or
- Deprecate the `metrics_names` argument to `on_train_begin` and make the workaround above the ‘official’ way to do things.
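To illustrate the first option, the declaration could be as simple as a method the handler polls before training starts. None of these names exist in fastai today; this is purely a sketch of the information flow I have in mind:

```python
class CallbackHandlerSketch:
    """Sketch: the handler, not each callback, builds the unified name list."""

    def __init__(self, callbacks, recorder):
        self.callbacks = callbacks
        self.recorder = recorder

    def on_train_begin(self):
        # Poll every callback for the metric names it promises to publish
        # (via a hypothetical `published_metric_names` method)...
        all_names = []
        for cb in self.callbacks:
            declare = getattr(cb, 'published_metric_names', None)
            if declare is not None:
                all_names.extend(declare())
        # ...then synchronize the Recorder and the consumers in one place,
        # so every callback sees the same complete list.
        self.recorder.add_metric_names(all_names)
        for cb in self.callbacks:
            if hasattr(cb, 'on_train_begin'):
                cb.on_train_begin(metrics_names=all_names)
```

With something like this, `metrics_names` would be trustworthy again, because no callback could grow the list behind the others’ backs.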