CSV Logger Callback?

Does the fast.ai library have something similar to CSVLogger from Keras, i.e. a callback that saves the metrics history into a text file during training? Or is such a callback perhaps already on the roadmap?

1 Like

There isn’t, but we’d be happy to have such a contribution! :slight_smile: See CONTRIBUTING.md if you’re interested.

Ok, got it! Sure, I’ll try to come up with something =)
I’ll post a link to the repo as soon as I’ve read CONTRIBUTING.md and have a basic prototype.

I’ve started working on a PR to support CSV logs:
https://github.com/devforfu/fastai_csvlogger/blob/master/main.py

It is not yet compliant with the contribution guide, but I’ll prepare everything required as soon as the logging callback is ready.

1 Like

I’m not sure if I’ve done everything correctly, but I think the basic version is ready for review.

Here is a link to the repo:

Here is a link to the Gist:

Could you please tell me what should be done next? Do I need to write a documentation entry, more tests, etc.? I wrote the tests in a separate file because that allows using the pytest test runner and its fixtures; I’m not sure whether that is possible from the notebook.

My apologies if the PR is in the wrong format, or if something is missing. I’ve never committed changes via notebooks before.

4 Likes

Wow, that’s great work! The only tiny thing I can think of is to somehow save the stats in one of your tests and then check them against what was written in the CSV file.

Oh, and another little thing: the file should be saved at self.learn.path/f’{filename}.csv’, and then we can change the default to just ‘history’.

In terms of refactoring, is your write_stats function exactly the same as the Recorder’s? If so, subclass Recorder instead of LearnerCallback to avoid duplicating code.
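
For concreteness, a rough sketch of what I mean (just an illustration, not a final implementation):

from fastai.basic_train import Learner, Recorder

class CSVLogger(Recorder):
    "Sketch: subclass Recorder to reuse its stats handling; default the log to learn.path/'history.csv'."
    def __init__(self, learn:Learner, filename:str='history'):
        super().__init__(learn)
        self.path = self.learn.path/f'{filename}.csv'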

But thanks a lot for such a complete proposal! And you were right to put the tests in the notebook like this; it makes them very easy to read.

1 Like

This really is great work! :slight_smile: Thanks @devforfu. I’ve added a section on writing tests:

https://docs-dev.fast.ai/test.html#writing-tests

I’m highly allergic to mocks. Could you try to make something that uses the real classes and functions? You can use simple_cnn to create a minimal model (no need to use rn18). It might take a bit of effort to build up all the pieces you need for this test, but once you do, we can factor them out into a test-helpers class that everyone can use.
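
For instance, something along these lines should give you a minimal real Learner to test against (a sketch only, assuming MNIST_TINY as the toy dataset and importing the callback from the linked main.py):

import pytest
from fastai.vision import *   # untar_data, URLs, ImageDataBunch, Learner, simple_cnn, accuracy
from main import CSVLogger    # the callback under review, from the linked repo's main.py

@pytest.fixture
def classifier_and_logger():
    # tiny dataset + tiny model so the test stays fast on CPU
    path = untar_data(URLs.MNIST_TINY)
    data = ImageDataBunch.from_folder(path)
    learn = Learner(data, simple_cnn((3, 16, 16, 2)), metrics=accuracy)
    return learn, CSVLogger(learn)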

1 Like

@sgugger @jeremy Thank you for your appreciation!

I am glad to address the issues you’ve mentioned as soon as possible. Not a problem, I’ll adjust the class as required and remove the fit method mock =) Totally agree: I believe the test codebase should be maintained and developed no less carefully than the functional code, so the more reusable tools, the better.

2 Likes

I’ve made an attempt to address the issues you mentioned. Now there are no mocks, and the filename property is fixed. I am using simple_cnn instead of resnet18. Also, I’ve removed one of the tests because I think it didn’t really bring anything to the table, since I haven’t modified any training methods, and I’ve created a new one that compares the file’s output with stdout.

Here is a new version of the notebook:

However, there are a couple of questions:

  1. As I can see, the Recorder class has a format_stats method, but it doesn’t return anything and sends the formatted strings straight to the progress bar. Would you like me to inherit from Recorder and override that method?
  2. I am using the PyCharm IDE, and somehow it raises a KeyboardInterrupt exception when I run the pytest test suites with the debugger enabled. The exception is raised on the fit method call. Have you ever seen a similar problem? I guess it is related to PyCharm’s debugger, though I’ve never run into this issue before. Could you advise anything here? (Except picking another IDE :smile:)

Please let me know if you think that anything else should be improved/changed.

Thanks!

I guess I’ll refactor that format_stats method to isolate the writing once we move this into master. The tests sound fine, but why the no_bar?

As for the exception, I have no idea since I don’t use PyCharm. The way I do tests is just to type pytest in the fastai repo.

1 Like

Sure, not a problem! Ok, great!

The reason I am using no_bar is that I am capturing the reporter’s stdout output to compare it with the content of the CSV file, and the progress bar clutters the output with a bunch of symbols like \r and the elapsed time. Is there perhaps a simpler way to access the recorded metrics, like using properties of Recorder? I am just not yet familiar enough with the framework’s codebase to use it efficiently :smile:

Yes, agreed: from the CLI everything works fine.

Recorded metrics are in recorder.metrics, validation losses in recorder.val_losses, and training losses in recorder.losses (though the latter has one value per batch, so you have to pick the ones at the end of each epoch). :wink:

2 Likes

Finally, here is a new version of the tests:


I’ve added an additional test that uses Recorder to get measured metrics:

def test_callback_written_metrics_are_equal_to_values_stored_in_reporter(classifier_and_logger):
    n_epochs = 3
    classifier, cb = classifier_and_logger

    classifier.fit(n_epochs, callbacks=[cb])

    csv_df = cb.read_logged_file()
    recorder_df = create_metrics_dataframe(classifier)
    pd.testing.assert_frame_equal(csv_df, recorder_df)
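
The create_metrics_dataframe helper isn’t shown in the snippet; roughly, it does something along these lines (a sketch based on the Recorder attributes mentioned above; the exact columns depend on the metrics used, here just accuracy):

import pandas as pd

def create_metrics_dataframe(learn):
    "Sketch: collect per-epoch stats from learn.recorder into a DataFrame."
    rec = learn.recorder
    n_epochs = len(rec.val_losses)
    per_epoch = len(rec.losses) // n_epochs
    # recorder.losses holds one (smoothed) value per batch; keep the last one of each epoch
    train_losses = [float(rec.losses[(i + 1) * per_epoch - 1]) for i in range(n_epochs)]
    records = [[i, tl, float(vl)] + [float(m) for m in metrics]
               for i, (tl, vl, metrics)
               in enumerate(zip(train_losses, rec.val_losses, rec.metrics))]
    return pd.DataFrame(records, columns=['epoch', 'train_loss', 'valid_loss', 'accuracy'])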

I am also keeping the old test that captures stdout; however, you could drop it if you like, if there is no point in checking that the stdout content equals the CSV content.

Please let me know if any other improvement is required.


As a side note, am I right that I need to re-generate the Gist from the notebook each time I modify its content?

1 Like

For your last question: no. You can pass the id of your current gist to update it.

This is all looking pretty good and almost ready to be merged into the library. For the tests requiring training, I might pick just one and couple it with our integration test, because we don’t want too many training tests (those take a long time on CPU). The other would be marked as slow and only run locally before a release or an important PR.
Thanks a lot for all your work on this!

Sure, not a problem :smile:

Ok, got it! Could you please share a reference to the folder with the integration tests? (I see the folder fastai/tests, but I’m not sure which file is appropriate.)

Would you like me to mark all the training tests except one with pytest.mark.slow? Or would you prefer to finalize things as appropriate yourself? (I guess that’s how PRs with new features usually get integrated into the main codebase.)
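
For reference, the standard pytest recipe for an opt-in slow marker looks roughly like this (a sketch; the fastai test suite may already have its own hook for this):

# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--runslow", action="store_true", default=False,
                     help="run tests marked as slow")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"): return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords: item.add_marker(skip_slow)

# in the test module
@pytest.mark.slow
def test_callback_written_metrics_are_equal_to_values_stored_in_reporter(classifier_and_logger):
    ...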

This is now incorporated into the library; thanks for your contribution!
I wanted to refactor a bit, so I integrated this myself. In particular, I put your tests in our integration test of training on vision, so that it doesn’t slow anything down.

2 Likes

That’s great, thank you for finalizing this into something ready for use within the library!

5 Likes

I’m trying to use CSVLogger as in the docs (version 1.028):
callback_fns=[ShowGraph, CSVLogger]
but I get the error ‘CSVLogger’ is not defined.
Am I missing something?

PS Sorry, found the answer: you have to import the callback:
from fastai.callbacks import CSVLogger
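
For the record, a minimal end-to-end sketch (fastai v1; the tiny dataset and model here are just placeholders):

from fastai.vision import *              # untar_data, URLs, ImageDataBunch, Learner, accuracy, ShowGraph, simple_cnn
from fastai.callbacks import CSVLogger   # not re-exported by the star import, hence the NameError above

path = untar_data(URLs.MNIST_TINY)
data = ImageDataBunch.from_folder(path)
learn = Learner(data, simple_cnn((3, 16, 16, 2)), metrics=accuracy,
                callback_fns=[ShowGraph, CSVLogger])
learn.fit(1)
# with the default filename, the metrics end up in learn.path/'history.csv'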

3 Likes

Might it be a good idea to open the history file with buffering=0, or to call flush() at the end of each epoch?
That way the data can be accessed from another process before fitting finishes, and it also survives a crash.
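
For instance, something along these lines (a sketch only; the self.file attribute is an assumption about the callback’s internals):

from fastai.callbacks import CSVLogger

class FlushingCSVLogger(CSVLogger):
    "Sketch: same as CSVLogger, but pushes buffered rows to disk after every epoch."
    def on_epoch_end(self, **kwargs):
        res = super().on_epoch_end(**kwargs)
        # self.file is assumed to be the open handle the callback writes to
        if hasattr(self, 'file') and not self.file.closed: self.file.flush()
        return res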

1 Like

Feel free to send a PR with that change - sounds reasonable to me.

2 Likes