It would be nice if learn.fit_one_cycle returned a dataframe with the results, for logging and TensorBoard-like visualizations.
Currently this can be achieved in a hacky way by redirecting stdout:

```python
import io
from contextlib import redirect_stdout

with io.StringIO() as buf, redirect_stdout(buf):
    learn.fit_one_cycle(3, slice(lr))
    results = buf.getvalue()

df = pd.read_csv(io.StringIO(results), sep=r"\s+")
```
but we miss the progress completely.
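One way to keep the live progress while still capturing the printed table is to tee the stream: write everything to the real stdout and to a buffer at the same time. A minimal sketch, where the `print` calls stand in for the output `fit_one_cycle` would produce:

```python
import io
import sys
from contextlib import redirect_stdout

class Tee(io.StringIO):
    """Capture writes while still echoing them to the real stdout."""
    def __init__(self, echo_to):
        super().__init__()
        self.echo_to = echo_to

    def write(self, s):
        self.echo_to.write(s)    # keep the live progress output visible
        return super().write(s)  # and record it for later parsing

buf = Tee(sys.stdout)  # bind the real stdout before redirecting
with redirect_stdout(buf):
    # stand-ins for the metrics table fit_one_cycle prints
    print("epoch  train_loss  valid_loss")
    print("0      0.52        0.48")
captured = buf.getvalue()
```

`captured` can then be fed to `pd.read_csv` as in the snippet above, while the user still sees the output as it arrives.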
I have used this often. The only problem with it is if training is interrupted, it will not save. Otherwise, it works perfectly.
For interrupted training, CSVLogger probably won’t work, as @ilovescience mentioned above.
Instead, one may use the notebook magic `%%capture` together with the console-logging version of fastprogress.
These are great ideas, but I think saving and restoring stats from a file isn’t optimal due to the unnecessary I/O. Something like Keras’s fit() history is quite helpful in distributed training, as we need not worry about unique history file names.
There is also the TensorBoard callback, which uses file I/O like the TensorFlow version.
Avoiding files is possible, just like what you or `%%capture` do. In my opinion, this is more a question of doing it internally as callbacks or externally as decorators.
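The "internally as callbacks" option can be sketched with a hypothetical in-memory history callback; the names (`History`, `on_epoch_end`) are illustrative, not fastai's actual callback API:

```python
# Hypothetical in-memory history callback: instead of writing a CSV,
# it accumulates one dict per epoch, ready to hand back to the caller.
class History:
    def __init__(self):
        self.records = []

    def on_epoch_end(self, epoch, train_loss, valid_loss, **metrics):
        self.records.append({"epoch": epoch,
                             "train_loss": train_loss,
                             "valid_loss": valid_loss,
                             **metrics})

hist = History()
# stand-in for a training loop firing the callback each epoch
for epoch, (tl, vl, acc) in enumerate([(0.9, 0.8, 0.6), (0.5, 0.45, 0.8)]):
    hist.on_epoch_end(epoch, tl, vl, accuracy=acc)

# hist.records can be passed straight to pd.DataFrame(...) or a
# TensorBoard writer, with no temp files involved.
```

Since the records never touch disk, each worker in a distributed setup would keep its own history in memory, which is the property being argued for above.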
I am still unsure why CSVLogger is not good enough.
The fit() history is available in the recorder attribute of the learner, IIRC.
Yup, it has all the pieces needed: train loss per batch, valid loss per epoch, accuracy per epoch. Is there any method that stitches all this data together and returns a dictionary / data frame, especially for the training loss? Something like the table that gets printed during fit(): https://docs.fast.ai/basic_train.html#Recorder.plot_losses
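If there is no built-in method, stitching it yourself is straightforward. A sketch, assuming the recorder exposes per-batch `losses`, per-epoch `val_losses` and `metrics`, and `nb_batches` (attribute names as in fastai v1's Recorder; the stand-in object below is hypothetical):

```python
# Stand-in for a fastai v1 Recorder after a 2-epoch run (hypothetical data).
class FakeRecorder:
    losses = [0.9, 0.7, 0.6, 0.5]  # per-batch train losses
    val_losses = [0.8, 0.55]       # per-epoch valid losses
    metrics = [[0.6], [0.8]]       # per-epoch metric values (e.g. accuracy)
    nb_batches = [2, 2]            # batches in each epoch

def recorder_to_rows(rec):
    """Stitch per-batch and per-epoch stats into one row per epoch."""
    rows, start = [], 0
    for epoch, n in enumerate(rec.nb_batches):
        # average the per-batch train losses that fall inside this epoch
        train = sum(rec.losses[start:start + n]) / n
        rows.append({"epoch": epoch,
                     "train_loss": train,
                     "valid_loss": rec.val_losses[epoch],
                     "accuracy": rec.metrics[epoch][0]})
        start += n
    return rows

rows = recorder_to_rows(FakeRecorder())
# rows is DataFrame-ready: pd.DataFrame(rows)
```

This reproduces the table fit() prints, one dict per epoch, without any stdout capture.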
Still not sure why the CSV callback will not work. And even if it does not, taking the source code for the callback and adapting it to your own use case could be helpful.
It incurs additional I/O, writing to and reading from disk, and it puts the burden on the user to collect the right metrics in distributed training environments where multiple learners work in parallel sharing the same disk. My thought is that it would be nice if this complexity could be handled by the library itself, like Keras does, simply returning the metrics and losses from fit().
Hmm, I am not sure that is true. CSVLogger simply takes the information from the Learner’s Recorder object and saves it into a CSV, IIRC.