Obtain `learn.fit` results as a dataframe

It would be nice if `learn.fit` or `learn.fit_one_cycle` returned a dataframe with the results, for logging and TensorBoard-like visualizations.

Currently this can be achieved in a hacky way by redirecting stdout:

import io
from contextlib import redirect_stdout

import pandas as pd

# Capture everything fit_one_cycle prints to stdout
with io.StringIO() as buf, redirect_stdout(buf):
    learn.fit_one_cycle(3, slice(lr))
    results = buf.getvalue()

# Parse the whitespace-separated metrics table into a dataframe
df = pd.read_csv(io.StringIO(results), sep=r"\s+")

but we lose the progress output completely.

If there is something special you want to do with fastai, a good idea is to check whether there's a callback for it over here. Indeed there is: CSVLogger.

I have used this often. The only problem with it is that if training is interrupted, it will not save. Otherwise, it works perfectly.
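For reference, a minimal sketch of the round trip: CSVLogger writes a plain CSV of the per-epoch metrics to disk, so reading it back into a dataframe is one pandas call. The file contents and column names below are illustrative stand-ins, not a real training run; in an actual run you would `pd.read_csv` the file CSVLogger wrote.

```python
import io

import pandas as pd

# Simulated contents of the CSV that CSVLogger writes
# (column names are illustrative; real ones follow your metrics).
csv_text = """epoch,train_loss,valid_loss,accuracy
0,0.52,0.41,0.85
1,0.38,0.33,0.89
2,0.30,0.29,0.91
"""

# In a real run, replace io.StringIO(csv_text) with the path to the logged file.
df = pd.read_csv(io.StringIO(csv_text))

# Once it's a dataframe, the usual analysis is one-liners:
best_epoch = df.loc[df["valid_loss"].idxmin(), "epoch"]
print(df.shape)   # (3, 4)
print(best_epoch) # 2
```

The point being debated below is not whether this works, but whether going through the filesystem at all is necessary.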


For interrupted training, CSVLogger probably won't work, as @ilovescience mentioned above.
Instead, one can use the notebook magic %%capture together with the console-logging version of fastprogress.

These are great ideas. But I think saving and restoring stats from a file isn't optimal due to unnecessary I/O. Something like Keras's fit() history is quite helpful in distributed training, as we need not worry about unique history file names.

There is also the TensorBoard callback, which uses file I/O like the TensorFlow version.
Avoiding files is possible, just like what you or %%capture do. In my opinion this is more a question of whether to do it internally as callbacks or externally as decorators.

I am still unsure why CSVLogger is not good enough.

The fit() history is available in the recorder attribute of the learner, IIRC.

Yup, it has all the pieces needed: train loss per batch, valid loss per epoch, accuracy per epoch. Is there any method that stitches all this data together and returns a dictionary / dataframe, especially for training loss? Something like the table that gets printed during fit(): https://docs.fast.ai/basic_train.html#Recorder.plot_losses
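As a sketch of that stitching step: assuming the recorder exposes per-batch training losses, per-epoch validation losses and metrics, and the number of batches per epoch (these attribute shapes are assumptions for illustration, with dummy numbers standing in for a real recorder), the per-batch losses can be averaged within each epoch and joined with the epoch-level stats:

```python
import pandas as pd

# Dummy stand-ins for what a recorder tracks (shapes assumed for illustration):
losses = [0.9, 0.7, 0.6, 0.5, 0.45, 0.4]  # per-batch training losses
val_losses = [0.65, 0.42]                 # per-epoch validation losses
metrics = [[0.80], [0.88]]                # per-epoch metric values (e.g. accuracy)
nb_batches = [3, 3]                       # batches in each epoch

# Average the per-batch training losses within each epoch
train_loss, start = [], 0
for n in nb_batches:
    chunk = losses[start:start + n]
    train_loss.append(sum(chunk) / len(chunk))
    start += n

# Stitch everything into one table, like the one printed during fit()
df = pd.DataFrame({
    "epoch": range(len(nb_batches)),
    "train_loss": train_loss,
    "valid_loss": val_losses,
    "accuracy": [m[0] for m in metrics],
})
print(df)
```

This is the whole transformation a history-returning fit() would need; everything is already in memory.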

Still not sure why the CSV callback will not work. Even if it does not, adapting the callback's source code for your own use case could be helpful.

It incurs additional I/O writing to / reading from disk, and puts the burden on the user to collect the right metrics in distributed training environments where multiple learners run in parallel sharing the same disk. My thought is that it would be nice if this complexity were handled by the library itself, as Keras does, simply returning the metrics and losses from the learn.fit() call.
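A sketch of what such an in-memory history could look like, modeled loosely on Keras's History callback. The class and hook names here are hypothetical stand-ins, not fastai API; the point is only that each worker accumulates its own dict, with no shared files or filename bookkeeping:

```python
from collections import defaultdict

class History:
    """Accumulates per-epoch metrics in memory, Keras-style (hypothetical sketch)."""
    def __init__(self):
        self.history = defaultdict(list)

    def on_epoch_end(self, **metrics):
        # Each worker appends to its own in-memory dict: no disk I/O,
        # no unique-filename coordination in distributed setups.
        for name, value in metrics.items():
            self.history[name].append(value)

# A fit() loop would call the hook each epoch and return the history object:
hist = History()
for train_loss, valid_loss in [(0.6, 0.5), (0.4, 0.35)]:
    hist.on_epoch_end(train_loss=train_loss, valid_loss=valid_loss)

print(hist.history["valid_loss"])  # [0.5, 0.35]
```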

Hmm, I am not sure that is true. CSVLogger simply takes the information from the Learner.recorder object and saves it into a CSV, IIRC.