TabularLearner export.pkl from learn.export() is Very Large

I’m not sure if this is intended, but the export.pkl is about 471 MB, which is somewhat prohibitive for deployment in certain applications.

The model itself from SaveModelCallback is only 131 KB, and I’m only looking to use the Learner to apply the same transforms/processing (Normalize, FillMissing, Categorify).

Is there a reason this is so large? I’ve also confirmed that the Learner’s batch references are empty:

learn.xb
(None, )

learn.yb
(None, )

Hey Jason,

When you export a model (without the optimizer state), you basically need to save all the weights to disk. You can get a quick ballpark estimate of the expected file size from the number of model parameters (assuming float32, i.e. 4 bytes per parameter), but it’s likely to be several hundred megabytes.
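For example, a quick sketch of that estimate (learn here is any fastai Learner):

n_params = sum(p.numel() for p in learn.model.parameters())  # total parameter count
approx_mb = n_params * 4 / 2**20                             # float32 = 4 bytes per param
print(f'{n_params:,} params ≈ {approx_mb:.1f} MB of raw weights')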

Saying that it’s “prohibitive” for deployment in certain applications may be true for your use case, but that would mean you likely cannot use neural networks at all (or you have to use specific architectures designed to be as lightweight as possible, which usually also hurts accuracy). Another option is to work out what exactly prevents you from deploying this model and try to solve that problem.

There are options available right now, @orendar and @jasonho28 :slight_smile: What I would recommend is doing a torch.save() of the weights and exporting the TabularPandas object instead. I would expect this could reduce the size. As a result, during inference you’d go TabularPandas -> DataLoader rather than just a plain DataLoader:

See the very bottom for a usage example; the library is wwf :slight_smile:


Hi.

This topic looks like this earlier one, which was never solved:

The size of the pkl file created by learn.export() depends on the batch size (at least in my tests, it grew with larger batch sizes).
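If you want to test that yourself, a rough sketch (my own, not from the original post; assumes an existing fitted TabularPandas named to):

import os
from fastai.tabular.all import *

for bs in (64, 4096):
    dls = to.dataloaders(bs=bs)                      # build loaders at each batch size
    learn = tabular_learner(dls, layers=[200, 100])
    learn.export(f'export_bs{bs}.pkl')
    print(bs, os.path.getsize(f'export_bs{bs}.pkl'), 'bytes')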

@muellerzr

The solution worked great. For reference:

We manually save the model from the Learner:
torch.save(learn.model, f'{model_dir}/2_{REF}_LEARNER_MODEL.pt')

We export the Tabular Object as well:
to.export(f'{model_dir}/3_{REF}_TABULAR_OBJECT.pkl')

We load the Tabular Object:

to_new = load_pandas(f'{model_dir}/3_{REF}_TABULAR_OBJECT.pkl')
to_new = to_new.train.new(df[:20])   # new TabularPandas over fresh rows
to_new.process()                     # apply the stored procs (Normalize, FillMissing, Categorify)

We load the Model:

model_2 = torch.load(f'{model_dir}/2_{REF}_LEARNER_MODEL.pt')
learn_new = TabularLearner(dls_new, model_2)   # dls_new: DataLoaders built from to_new

We do inference:

row, clas, probs = learn_new.predict(df.iloc[0])
row.show()
probs

The savings are substantial:

Model: 135 KB
Tabular Object: 6 KB

vs.

learn.export(): 417 MB


@orendar @jasonho28 @pierreguillou Over the weekend Jeremy and I solved this issue; it was due to log_args plus some extraneous references inside of ReadTabBatch. Happy to report that my export.pkl is a calm, cool 142.5 KB :slight_smile:


I’m curious as to how this is even possible, as I don’t use tabular much: how many parameters are in the model? I don’t think I’ve ever seen a model weigh less than 100 MB in any of the libraries I’ve used.

The tabular model is only 3(ish) fully connected layers and some embeddings. It’s a very, very tiny model :slight_smile:

The param count is almost 30k.

The exact model from the adult sample example is 128 KB :slight_smile: as you can see here:

ResNets aren’t bad either: the 34 is still under 100 MB (84 MB) and the 50 is 94 MB.
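As a quick sanity check on those figures: roughly 30,000 params × 4 bytes (float32) ≈ 120 KB, which lines up with the 128 KB file above, and the ResNet sizes follow from their parameter counts by the same arithmetic.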


That’s amazing indeed!

How can I use to.export()?

!pip install wwf

gave me

Successfully installed wwf-0.0.5.

But then,

from wwf.tabular.export import *

resulted in

ModuleNotFoundError: No module named 'wwf.tabular'

The cited page shows

Site last generated: Oct 22, 2020

and

wwf: 0.0.4.

That’s a typo. Should be from wwf.tab.export import *
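Putting the corrected pieces together (a sketch, assuming wwf’s tab.export module provides both the to.export() patch and load_pandas(), as used earlier in the thread):

from wwf.tab.export import *   # patches TabularPandas with export() and provides load_pandas()
to.export('to.pkl')            # to: a fitted TabularPandas
to_new = load_pandas('to.pkl')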

Thanks, @muellerzr. Now I can use to.export().

What is dls_new in this example?

More specifically, how can dls_new be obtained from to_new to be used as a test dataset?
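One way that should work (my assumption, not confirmed in the thread, using the standard TabularPandas.dataloaders()):

dls_new = to_new.dataloaders(bs=64)            # bs is arbitrary for inference
learn_new = TabularLearner(dls_new, model_2)   # as in the example above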

Hi @muellerzr

fastai 2.2.0 still has the data leakage issue.
My naive screening showed patterns similar to the ones you previously mentioned in ReadTabBatch (https://github.com/fastai/fastai/pull/2948). Many tabular transforms store intermediate data that is accidentally exported.

I reproduced the issue in:

You can see that we can access the data by doing

learn_loaded.dls.loaders[0].procs.categorify.to.items

or

learn_loaded.dls.loaders[0].procs.normalize.to.items
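A quick way to check whether an export still carries the data (my own sketch; 'export.pkl' stands in for whatever filename was passed to learn.export()):

import os
print(os.path.getsize('export.pkl') / 1e6, 'MB')  # leaked items inflate this
# after a fix, the transform should no longer hold a `to` reference:
print(hasattr(learn_loaded.dls.loaders[0].procs.categorify, 'to'))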

Thanks! Looks like this is a very recent bug, so thank you for flagging! I’ll look into this :slight_smile:


A fix has now been pushed to master. Thanks again :slight_smile:


Thanks for the fix @muellerzr! The “to” attribute no longer appears in the exported learner.

Sorry, I forgot to point out that the “dsets” attribute of the FillMissing transform also stores a copy of the dataset:

learn_loaded.dls.loaders[0].procs.fill_missing.dsets.items

Can you look into that also?

Ah, I see why that’s a thing. I have an interim PR that actually fixes it; that’s an inconsistency oversight on my part. Will let you know when that gets merged.

In the interim, this can fix it for folks:

from fastai.tabular.all import *   # brings in patch, store_attr, FillMissing, pd

@patch
def setups(self: FillMissing, dsets):
    # store only the computed na_dict; deliberately skip keeping a reference to dsets
    missing = pd.isnull(dsets.conts).any()
    store_attr(but='dsets', na_dict={n: self.fill_strategy(dsets[n], self.fill_vals[n])
                                     for n in missing[missing].keys()})
    self.fill_strategy = self.fill_strategy.__name__  # keep the strategy's name, not the function itself
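With this applied, a TabularPandas whose procs are set up after patching shouldn’t hold the dataset anymore; a quick check (my own sketch, using the same accessor style as above):

assert not hasattr(to_new.procs.fill_missing, 'dsets')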

(cc @Haotong)


Thanks for the solution!