Fastai_v1, adding features

PR sent. It’s a quick fix in case you want to patch it yourself locally for now:

Per class metrics and multi-label metrics

Often, in a classification problem, papers provide a table of metrics by class. It's useful for comparing results to industry benchmarks and can also help identify which classes are underperforming.
Is there a way to do this already? If not, I'm willing to contribute a PR.

Similarly, for multi-label problems, it would be great to be able to compute the traditional metrics (Precision, Recall, FBeta, etc.) directly; most metrics in the library currently do not work for multi-label scenarios.
Eventually I would like to combine these two features (multi-label and per-class metrics).
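Below is a minimal plain-Python sketch of per-class precision, recall and F-beta for the multi-label case. It assumes predictions arrive as per-class probabilities and that each class is thresholded independently at 0.5 (an assumption; a good threshold is problem-specific). The function name and signature are hypothetical and not part of fastai. For single-label problems, the same counting works if you one-hot encode the argmax predictions before passing them in.

```python
from typing import Dict, Sequence


def per_class_fbeta(
    probs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[int]],
    class_names: Sequence[str],
    thresh: float = 0.5,
    beta: float = 1.0,
    eps: float = 1e-9,
) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F-beta for a multi-label problem.

    probs[i][c] is the predicted probability that sample i has class c;
    targets[i][c] is 1 if sample i truly has class c, else 0.
    """
    n_classes = len(class_names)
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for p_row, t_row in zip(probs, targets):
        for c in range(n_classes):
            pred = p_row[c] >= thresh  # each class is thresholded independently
            true = t_row[c] == 1
            tp[c] += int(pred and true)
            fp[c] += int(pred and not true)
            fn[c] += int((not pred) and true)

    b2 = beta ** 2
    report: Dict[str, Dict[str, float]] = {}
    for c, name in enumerate(class_names):
        precision = tp[c] / (tp[c] + fp[c] + eps)  # eps avoids division by zero
        recall = tp[c] / (tp[c] + fn[c] + eps)
        fbeta = (1 + b2) * precision * recall / (b2 * precision + recall + eps)
        report[name] = {"precision": precision, "recall": recall, "fbeta": fbeta}
    return report


# Usage: three samples, three classes.
probs = [[0.9, 0.2, 0.7], [0.1, 0.8, 0.6], [0.6, 0.4, 0.1]]
targets = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
report = per_class_fbeta(probs, targets, ["a", "b", "c"])
```

Returning a dict keyed by class name makes it easy to print as a per-class table or to average afterwards (macro averaging) if a single number is needed.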

@herrmann Have you found any particular impact on accuracy when encoding the cyclical (modular) components of dates/times as sine and cosine features?

What exactly is the underlying motivation?

I haven't seen anything like this before, so I'm really curious to learn why it might be a good idea.
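The usual motivation can be sketched as follows: if hour of day is fed in as a raw number, 23:00 and 00:00 look maximally far apart even though they are adjacent. Mapping a value v with period P to (sin(2πv/P), cos(2πv/P)) places it on a circle, so distances respect the wrap-around. Both components are needed, since sine alone maps, e.g., hours 3 and 9 to the same value. This illustrates the idea only and says nothing about whether it improves accuracy in practice; the helper names are hypothetical.

```python
import math
from typing import Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a cyclical value (e.g. an hour in [0, 24)) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two values after cyclical encoding."""
    sa, ca = cyclical_encode(a, period)
    sb, cb = cyclical_encode(b, period)
    return math.hypot(sa - sb, ca - cb)


# As raw numbers, 23 and 0 are 23 units apart; on the circle they are neighbours.
print(round(circle_distance(23, 0, 24), 3))  # 0.261
print(round(circle_distance(0, 12, 24), 3))  # 2.0 (opposite sides of the circle)
```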

Hi, can I submit a PR to make the show_results functions return their results instead of just displaying them?
The reason is that I want to write a callback that logs losses, metrics and results at each epoch.

Option 1 (raw data):

  • return (xs, ys, zs), whether it is text or images

Option 2 (formatted data):

  • text: return pd.DataFrame
  • images: return plt

I'm slightly in favor of option 2, since we would benefit from the formatting done by the show_results functions.

It sounds like you want something that is different from show_results, which, as its name indicates, shows things. Just copy-paste the code that is in show_results and remove the lines you don't need.

Actually, I just want to keep a handle on the displayed results (it could even be just the HTML).

I could write a log_results function, but I'd need to keep it in sync with all the show_results functions (if they get updated). I would also need to copy the code from show_xyzs, since it is used when the data is tabular.

Right now, it is the best workaround I have found for creating a logging callback that saves sample predictions during training while still benefiting from existing functions and keeping the amount of code minimal (it's just a matter of adding a return statement to show_results and show_xyzs).

At the moment, show_results does not return anything, so adding a return value would not affect any existing code.

Here are the changes I am suggesting: GitHub commit

This will allow us to build upon callbacks such as CSVLogger and add sample predictions to see how they evolve during training, whether logged locally or online (which is what I am working on).
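A minimal sketch of the proposal, assuming a simplified show_results that takes lists of inputs, targets and predictions (the real fastai functions have different signatures and render images or HTML). The only change from a display-only version is the final return statement. SamplePredictionLogger and its get_samples argument are hypothetical; the method name follows fastai v1's on_epoch_end hook, but the class does not subclass fastai's Callback.

```python
from typing import Any, Callable, Dict, List, Tuple


def show_results(xs: List[Any], ys: List[Any], preds: List[Any]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: display results as before, but also return them."""
    rows = [{"input": x, "target": y, "prediction": p} for x, y, p in zip(xs, ys, preds)]
    for row in rows:
        print(row)  # the existing "show" behaviour is unchanged
    return rows  # the only change: hand the formatted data back to the caller


class SamplePredictionLogger:
    """Hypothetical callback that keeps the sample predictions from each epoch."""

    def __init__(self, get_samples: Callable[[], Tuple[List[Any], List[Any], List[Any]]]):
        self.get_samples = get_samples  # returns (inputs, targets, predictions)
        self.history: List[Dict[str, Any]] = []

    def on_epoch_end(self, epoch: int, **kwargs: Any) -> None:
        xs, ys, preds = self.get_samples()
        self.history.append({"epoch": epoch, "results": show_results(xs, ys, preds)})


# Usage: pretend two epochs of training ran.
logger = SamplePredictionLogger(lambda: (["a", "b"], [0, 1], [0, 0]))
for epoch in range(2):
    logger.on_epoch_end(epoch)
```

Because the logged rows are plain data, they can be written next to the losses in a CSV file or sent to an online dashboard without re-implementing the formatting.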

QRNN float16 support - see bfarzin’s post.

Does anyone have an idea how to change the relevant CUDA code to make it work?

QRNNs have worked in FP16 since v1.0.56.


Thank you so much for a wonderful course and a great package.

A colleague and I are working on tabular data and would like to specify a particular activation function for each layer in the model. Currently, there doesn't seem to be an easy way to do this.

Would it be interesting to add this and would you accept a pull request, if I created it?
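A framework-free sketch of what a per-layer activation specification could look like: each layer is given as (weights, bias, activation), so every layer can use a different function. The names are hypothetical and this is not fastai's TabularModel; in PyTorch, one way to get the same effect would be to accept a list of activation modules alongside the layer sizes when building the model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Activation = Callable[[float], float]
Layer = Tuple[Sequence[Sequence[float]], Sequence[float], Activation]


def relu(x: float) -> float:
    return max(0.0, x)


def tanh(x: float) -> float:
    return math.tanh(x)


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Fully connected layer: weights[j] holds the input weights of output unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Run x through the layers, applying each layer's own activation."""
    out = list(x)
    for weights, bias, act in layers:
        out = [act(v) for v in dense(out, weights, bias)]
    return out


# Usage: a ReLU hidden layer followed by a tanh output layer.
layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], tanh),
]
out = mlp_forward([2.0, 1.0], layers)
```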

Can we make the prediction tasks thread-safe? For example, this one:
If I run that function too many times concurrently, it throws internal errors.
I don't know enough about how it works to tell whether it could be made static or whether there is another way to make it thread-safe.
The use case: using the classifiers on a server that handles multiple requests.
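One common workaround, sketched below under the assumption that the underlying predict call simply isn't safe to run concurrently, is to serialize calls with a threading.Lock. This does not address the root cause of the internal errors. LockedPredictor and fake_predict are hypothetical names; fake_predict stands in for a real model call and records how many calls overlap.

```python
import threading
import time
from typing import Any, Callable, List


class LockedPredictor:
    """Serialize calls to a predict function that is not safe to run concurrently."""

    def __init__(self, predict_fn: Callable[[Any], Any]):
        self._predict_fn = predict_fn
        self._lock = threading.Lock()

    def predict(self, item: Any) -> Any:
        with self._lock:  # only one thread runs the model at a time
            return self._predict_fn(item)


# Usage: a fake model that tracks the maximum number of overlapping calls.
_in_flight = 0
max_in_flight = 0
_counter_lock = threading.Lock()


def fake_predict(item: int) -> int:
    global _in_flight, max_in_flight
    with _counter_lock:
        _in_flight += 1
        max_in_flight = max(max_in_flight, _in_flight)
    time.sleep(0.001)  # pretend to do some work
    with _counter_lock:
        _in_flight -= 1
    return item * 2


predictor = LockedPredictor(fake_predict)
results: List[int] = [0] * 20


def worker(i: int) -> None:
    results[i] = predictor.predict(i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A lock trades throughput for safety, since requests are handled one at a time. If that becomes a bottleneck, running several worker processes, each with its own copy of the model, is a common alternative.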

Add a categorical feature numericalizer, for feature parity with fastai 0.7.

In fastai 0.7 there is a convenient function for processing dataframes so that the output can be fed directly to a model for training/testing. In v1.0, the missing part is converting categorical features to numeric representations, i.e. either encoding categories as category codes or using one-hot encoding.

This would be a valuable feature to add for a better end-to-end user experience. The usage could look like this:

procs = [FillMissing, Categorify, Numericalize, Normalize]
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names)
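To illustrate what encoding categories as category codes means, here is a minimal plain-Python sketch, independent of fastai's own procs. It assumes codes are learned from the training column only and that code 0 is reserved for missing or unseen values, so n training categories map to codes 1..n. fit_categories and encode are hypothetical names.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def fit_categories(train_values: Sequence[Optional[Hashable]]) -> Dict[Hashable, int]:
    """Learn a category -> code mapping from the training column.

    Code 0 is reserved for missing (None) or unseen values.
    """
    cats = sorted({v for v in train_values if v is not None}, key=str)
    return {cat: i + 1 for i, cat in enumerate(cats)}


def encode(values: Sequence[Optional[Hashable]], mapping: Dict[Hashable, int]) -> List[int]:
    """Replace each value by its code; anything not seen at fit time becomes 0."""
    return [mapping.get(v, 0) for v in values]


# Usage: fit on the training column, then apply the same mapping to new data.
mapping = fit_categories(["red", "green", "red", None])
codes = encode(["red", "blue", None, "green"], mapping)
```

Fitting on the training set and reusing the mapping for validation/test data keeps the codes consistent across splits, which is what a model's embedding or one-hot layer relies on.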

This is already done in v1; it wasn't left out.

Could you point me to the code/doc please?

Look at process_one under TabularLine (line 47), along with line 72.

👍 Thanks. Is it possible to do one-hot encoding as well?

One-hot encoding is done automatically as well. You end up with n+1 categories (the extra one for values that don't fall into any of your categories).

I might be very late on this, but I don't see why you couldn't simply apply the same transformation to the y as well, and thus not have to invert the transforms at all.

Add input_mask to the Transformer.

It seems that the current Transformer implementation does not support input_mask.
When constructing the input for a Transformer encoder or BERT, we always pad the input, e.g.:
batch[0]: A B C [pad] [pad] --> input_mask 1 1 1 0 0
batch[1]: D E [pad] [pad] [pad] --> input_mask 1 1 0 0 0
The input_mask is then applied in MultiHeadAttention so that the padding positions are not attended to.
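A plain-Python sketch of this masking, using the same padded batch: build the input mask from the [pad] tokens, then give masked key positions a score of -inf before the softmax so they receive zero attention weight. This is a generic illustration, not fastai's MultiHeadAttention; in PyTorch the same step is typically done on the attention-score tensor with masked_fill before the softmax.

```python
import math
from typing import List, Sequence

PAD = "[pad]"


def input_mask(tokens: Sequence[str]) -> List[int]:
    """1 for real tokens, 0 for padding."""
    return [0 if t == PAD else 1 for t in tokens]


def masked_softmax(scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over one query's attention scores, ignoring padding keys."""
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    top = max(masked)  # assumes at least one position is unmasked
    exps = [math.exp(s - top) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Usage with the padded batch from the example.
batch = [["A", "B", "C", PAD, PAD], ["D", "E", PAD, PAD, PAD]]
masks = [input_mask(seq) for seq in batch]  # [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
# Even though the padding positions have the highest raw scores, they get zero weight.
weights = masked_softmax([0.5, 1.0, 0.2, 3.0, 3.0], masks[0])
```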

Am I wrong, or is it indeed not implemented in fastai.text.models.Transformer?

Also, the notebook about the Transformer here does not consider the input mask either.

Feature request: rework TabularProcessor to make it easier to define custom TabularProcs

I'm currently experimenting with integrating different models into fastai, and I always end up using TabularDataBunch as my entry point since most of the datasets I'm using are .csv files.

Here's the catch: some of the models I'd like to train are unsupervised, so the embeddings used in TabularModel won't work. I tried implementing a OneHotEncode subclass of TabularProc to leverage the tabular data block API, but no matter what I do, I get errors from the existing transforms due to the modified cat/cont names.

I’d like the ability to define custom processes (e.g. OneHotEncode or unsupervised embeddings) without having to implement a custom TabularProcessor, if possible.
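One way to avoid mismatched cat/cont names is for the proc itself to update its name lists after expanding the columns. Below is a minimal plain-Python sketch, assuming a dict of column lists instead of a pandas DataFrame. It mirrors the apply_train/apply_test split used by fastai v1's TabularProc but is not a fastai subclass, and whether TabularProcessor would pick up names changed this way is not something the sketch establishes. OneHotEncode is the name proposed above; its internals here are hypothetical.

```python
from typing import Any, Dict, List


class OneHotEncode:
    """Hypothetical proc that expands categorical columns into 0/1 columns."""

    def __init__(self, cat_names: List[str], cont_names: List[str]):
        self.cat_names = list(cat_names)
        self.cont_names = list(cont_names)
        self.levels: Dict[str, List[Any]] = {}

    def apply_train(self, df: Dict[str, List[Any]]) -> None:
        # Learn the levels from the training set only.
        self.levels = {c: sorted(set(df[c]), key=str) for c in self.cat_names}
        self._expand(df)
        # Bookkeeping: the new 0/1 columns are continuous inputs now, so later
        # steps must see updated name lists instead of the original ones.
        self.cont_names = self.cont_names + [
            f"{c}_{lvl}" for c, lvls in self.levels.items() for lvl in lvls
        ]
        self.cat_names = []

    def apply_test(self, df: Dict[str, List[Any]]) -> None:
        # Reuse the training levels; values unseen at train time become all-zero rows.
        self._expand(df)

    def _expand(self, df: Dict[str, List[Any]]) -> None:
        for c, lvls in self.levels.items():
            values = df.pop(c)
            for lvl in lvls:
                df[f"{c}_{lvl}"] = [1.0 if v == lvl else 0.0 for v in values]


# Usage: fit on the training split, then transform the test split.
train = {"color": ["red", "green", "red"], "size": [1.0, 2.0, 3.0]}
test = {"color": ["blue", "red"], "size": [4.0, 5.0]}
proc = OneHotEncode(cat_names=["color"], cont_names=["size"])
proc.apply_train(train)
proc.apply_test(test)
```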