The fbeta function in fastai sums along dim=1, which collapses the statistics per example and then averages over the batch, rather than computing them per class and averaging over classes. Is this correct? I compared with scikit-learn and TensorFlow, and it looks like it might not be. I will add further comments to show the code and the output of scikit-learn and fastai.
Here is an example showing the difference between scikit-learn and fastai: https://github.com/shivarajugowda/misc/blob/master/F1Score.ipynb
Please let me know if you want me to test any other variation, or if my understanding is incorrect.
Looking at the history of the metrics.py file, the fbeta implementation has been through a few iterations of bug fixes. One comment in particular seems to acknowledge that, for some reason, the averaging was done over batches instead of labels (contrary to the definition of the macro F1 score). The averaging dimension went back and forth between 0 and 1, and between the fbeta function and the Fbeta class. As it stands now, only the fbeta function remains, and it averages along dimension 1, which is over the batch. So I believe the current fbeta implementation incorrectly averages over the batch instead of over the labels. If there is a reason for averaging over batches, some documentation would help.
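To make the discrepancy concrete, here is a small standalone sketch (using numpy and scikit-learn rather than fastai/PyTorch, so the two averaging orders are easy to see side by side). The f1_along helper is my own illustrative function, not fastai code: with a (batch, classes) multi-label matrix, reducing along axis=0 gives per-class F1 averaged over classes (scikit-learn's average='macro'), while reducing along axis=1 gives per-example F1 averaged over the batch (scikit-learn's average='samples'), which is what summing along dim=1 effectively produces.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label data: 4 examples (rows) x 3 labels (columns).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

def f1_along(y_true, y_pred, axis, eps=1e-9):
    """F1 computed per slice along `axis`, then averaged.

    axis=0: per-class F1, averaged over classes (macro F1).
    axis=1: per-example F1, averaged over the batch (what
            reducing along dim=1 effectively yields).
    """
    tp = (y_true * y_pred).sum(axis=axis)
    prec = tp / (y_pred.sum(axis=axis) + eps)
    rec = tp / (y_true.sum(axis=axis) + eps)
    return (2 * prec * rec / (prec + rec + eps)).mean()

macro = f1_along(y_true, y_pred, axis=0)      # averages over labels
per_batch = f1_along(y_true, y_pred, axis=1)  # averages over the batch

print(macro, f1_score(y_true, y_pred, average='macro'))        # ~0.7222 both
print(per_batch, f1_score(y_true, y_pred, average='samples'))  # 0.75 both
```

On this toy input the two reductions disagree (roughly 0.7222 vs 0.75), and each matches the corresponding scikit-learn average mode, which is why averaging along dimension 1 does not reproduce scikit-learn's macro F1.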