Fastai_v1, adding features

It sounds like you want something different from show_results, which, as its name indicates, shows things. Just copy-paste the code that is in show_results and remove the lines you don’t need.

Actually, I just want to keep a handle on the displayed results (it could even be just the HTML).

I could write a log_results function, but I would need to keep it in sync with all the show_results implementations (if they get updated). I would also need to copy the code from show_xyzs, since it is used when the data is tabular.

Right now, this is the best workaround I have found for creating a logging callback that saves sample predictions during training while still benefiting from the existing functions, with a minimal amount of code (it just requires adding a return statement in show_results and show_xyzs).

At the moment, show_results does not return anything, so this would not impact any existing code that doesn’t use the return value.

Here are the changes I am suggesting: github commit

This would allow us to build on callbacks such as CSVLogger and add sample predictions to see how they evolve during training, whether logged locally or online (which is what I am working on).
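
For concreteness, here is a minimal sketch of the kind of callback this change would enable. It assumes show_results has been patched to return what it renders (that return value is exactly what the suggested commit adds; the callback name and logging logic here are just illustrative):

from fastai.basic_train import LearnerCallback

class SamplePredictionLogger(LearnerCallback):
    "Illustrative callback that stores sample predictions each epoch."
    def __init__(self, learn, n_samples=4):
        super().__init__(learn)
        self.n_samples,self.logged = n_samples,[]

    def on_epoch_end(self, **kwargs):
        # Only works once show_results returns its output instead of None
        out = self.learn.show_results(rows=self.n_samples)
        self.logged.append(out)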

QRNN float16 support - see bfarzin’s post.

Does anyone have an idea how to change the relevant CUDA code to make it work?

QRNNs work in FP16 since v1.0.56.

Thank you so much for a wonderful course and a great package.

A colleague and I are working on tabular data, and we would like to specify a particular activation function for each layer in the model. Currently, it does not seem possible to do this in an easy way.

Would it be interesting to add this, and would you accept a pull request if I created it?
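
In the meantime, a possible workaround is to swap out the activations after the learner is built. This is only a sketch: it assumes the v1 TabularModel layout, where learn.model.layers is an nn.Sequential containing the hard-coded nn.ReLU modules between the linear layers.

import torch.nn as nn
from fastai.tabular import tabular_learner

learn = tabular_learner(data, layers=[200, 100])  # `data` is your TabularDataBunch
custom_acts = iter([nn.SELU(), nn.Tanh()])        # one activation per hidden layer

for name, module in learn.model.layers.named_children():
    if isinstance(module, nn.ReLU):
        # Replace the hard-coded ReLU with the desired activation
        setattr(learn.model.layers, name, next(custom_acts))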

Can we make the prediction tasks thread-safe? For example this one:
https://docs.fast.ai/text.learner.html#LanguageLearner.predict
If I run that function too many times concurrently, it throws internal errors.
I don’t know enough about how it works to say whether it could be made static, or whether there is another way to make it thread-safe.
The use case: running the classifiers on a server that handles multiple requests.
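
Until the underlying method is made thread-safe, one common workaround is to serialize the calls yourself. A minimal sketch (standard Python, not a fastai API):

import threading

predict_lock = threading.Lock()

def safe_predict(learn, text, n_words=40):
    # Only one request thread may enter predict at a time
    with predict_lock:
        return learn.predict(text, n_words)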

Add a categorical feature numericalizer, for feature parity with fastai 0.7.

In fastai 0.7 there is a convenient function to process dataframes so that the output can be fed directly into a model for train/test. In v1.0, the missing part is converting categorical features to numeric representations, i.e. encoding categories either with category codes or with one-hot encoding (https://github.com/fastai/fastai/blob/0.7.0/fastai/structured.py#L438).

This functionality would be a valuable addition for a better end-to-end user experience. The usage could look like:

procs = [FillMissing, Categorify, Numericalize, Normalize]
data = TabularDataBunch.from_df(path, df, dep_var, valid_idx=valid_idx, procs=procs, cat_names=cat_names)

This is already done in v1; it wasn’t left out.

Could you point me to the code/docs, please?

Look at process_one under TabularLine (line 47), along with line 72.

:+1: thanks. Is it possible to do one-hot encoding as well?

One-hot is done automatically as well. You wind up with n+1 categories (the extra one covers values that didn’t fall into your known categories).
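
For intuition, here is roughly what the category encoding amounts to in pandas terms (illustrative; see fastai.tabular.transform for the real implementation):

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', None]})
df['color'] = df['color'].astype('category')
# cat.codes returns -1 for missing/unknown values; shifting by 1 reserves
# index 0 for "not a known category", which is where the n+1 comes from.
codes = df['color'].cat.codes + 1
print(codes.tolist())  # [2, 1, 2, 0]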

I might be way late on this, but I don’t see why you couldn’t simply apply the very same transformation to the y values, and thus forget about inverse transforms.

Add input_mask to the Transformer.

It seems that the current Transformer implementation cannot support an input_mask.
When constructing the input for a Transformer encoder or BERT, we always pad the input, e.g.:

batch[0]: A B C [pad] [pad] --> input_mask 1 1 1 0 0
batch[1]: D E [pad] [pad] [pad] --> input_mask 1 1 0 0 0

where the input_mask is applied in MultiHeadAttention to avoid attending to the padding positions.

Am I wrong, or is this indeed not implemented in fastai.text.models.Transformer?

Also, the notebook about the transformer here does not consider the input mask either.
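
For reference, this is the standard way a padding mask is applied inside scaled dot-product attention (generic PyTorch, not fastai’s code):

import torch
import torch.nn.functional as F

def masked_attention(q, k, v, input_mask):
    # q, k, v: (bs, heads, seq_len, d_head); input_mask: (bs, seq_len), 1=token, 0=pad
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    mask = input_mask[:, None, None, :].bool()         # broadcast over heads and queries
    scores = scores.masked_fill(~mask, float('-inf'))  # block attention to pad positions
    return F.softmax(scores, dim=-1) @ v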

Feature request: rework TabularProcessor for easier definition of custom TabularProcs

Description
I’m currently experimenting with integrating different models into fastai, and I always end up using TabularDataBunch as my entry point, since most of the datasets I use are .csv files.

Here’s the catch: some of the models I’d like to train are unsupervised, so the embeddings used in TabularModel won’t work. I tried implementing a OneHotEncode subclass of TabularProc to leverage the tabular data block API, but no matter what I do, I get errors in the existing transforms due to the modified cat/cont names.

I’d like the ability to define custom procs (e.g. OneHotEncode or unsupervised embeddings) without having to implement a custom TabularProcessor, if possible.
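
For reference, here is the shape of the custom proc I have in mind, following the apply_train/apply_test interface of the existing TabularProc (the OneHotEncode behavior below is just illustrative):

import pandas as pd
from fastai.tabular import TabularProc

class OneHotEncode(TabularProc):
    "Illustrative proc: one-hot encode the categorical columns."
    def apply_train(self, df):
        self.dummy_cols = {}
        for name in self.cat_names:
            dummies = pd.get_dummies(df[name], prefix=name)
            self.dummy_cols[name] = dummies.columns
            df[dummies.columns] = dummies
        # Dropping the originals changes cat_names, which is exactly where
        # the current TabularProcessor starts throwing errors.
        df.drop(columns=list(self.cat_names), inplace=True)

    def apply_test(self, df):
        for name in self.cat_names:
            dummies = pd.get_dummies(df[name], prefix=name)
            # Align test columns with those seen at train time
            df[self.dummy_cols[name]] = dummies.reindex(columns=self.dummy_cols[name], fill_value=0)
        df.drop(columns=list(self.cat_names), inplace=True)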