A Walk with fastai2 - fastinference (mini-series)

Here’s the link for today’s stream: https://youtu.be/myKgF-d9-N4

Considering last week’s theme, before we cover dendrogram plots and feature importance, we’ll go into the MixedDL that I made for combining datasets of different types.

See you all in an hour :slight_smile:

(also I finally figured out what happened with lesson 1, I’ll re-upload it soon)


The fix for permutation importance is in there now as well

Hi Zach, love the thinking behind this!

I was just looking at the text section; perhaps you’re still working on this. I was wondering why you don’t call tokenize_df when creating the test_dl.

I remember talking to Sylvain about it here https://github.com/fastai/fastai2/issues/302#issuecomment-614173101
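
For reference, here’s the pattern I’m using at the moment (a minimal sketch; the learn/test_df names and the 'Message' column are assumptions from my own setup):

from fastai2.text.all import *

# current pattern: tokenize the raw dataframe first...
test_df_tok, _ = tokenize_df(test_df, text_cols='Message')
# ...then wrap test_dl around the tokenized result
dl = learn.dls.test_dl(test_df_tok)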


I wasn’t made aware of this issue actually :slight_smile:

I’ll probably make that adjustment internally, see here:

That being said though, I don’t see why test_dl couldn’t simply do this itself… (I’ll ask Sylvain about this)

@HenryDashwood I discussed this with Sylvain; test_dl will be a one-stop shop for both in a moment.

Oh cool. Do you mean tokenize_df will be called internally by test_dl, as opposed to the user wrapping the latter round the former as is done now?

Yes, it’ll be a user-set option to tokenize the incoming dataframe or not

Ah that’s very nice

Hey @muellerzr, I keep running into this issue with ONNX conversion for multi-label text models. I don’t think the error is on the part of fastai2, but I can’t seem to index into the dataloader. Any thoughts?

from fastai2.text.all import *  # for load_learner

learner = load_learner("text_classifier_version_3")
dl = learner.dls.test_dl(df['Message'][:100])
x = learner
orig_bs = x.dls[0].bs
x.dls[0].bs = 1
dummy_inp = next(iter(x.dls[0]))  # this line raises the AttributeError below
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-e93addb1c4d7> in <module>
      2 orig_bs = x.dls[0].bs
      3 x.dls[0].bs=1
----> 4 dummy_inp = next(iter(x.dls[0]))
      5 # x.dls[0].bs = orig_bs
      6 # x.dls[0].bs=1

/media/training/fastai2/fastai2/fastai2/data/load.py in __iter__(self)
     96         self.randomize()
     97         self.before_iter()
---> 98         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     99             if self.device is not None: b = to_device(b, self.device)
    100             yield self.after_batch(b)

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    854             else:
    855                 del self._task_info[idx]
--> 856                 return self._process_data(data)
    857 
    858     def _try_put_index(self):

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
    879         self._try_put_index()
    880         if isinstance(data, ExceptionWrapper):
--> 881             data.reraise()
    882         return data
    883 

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/_utils.py in reraise(self)
    392             # (https://bugs.python.org/issue2651), so we work around it.
    393             msg = KeyErrorMessage(msg)
--> 394         raise self.exc_type(msg)

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 137, in _worker_loop
    fetcher = _DatasetKind.create_fetcher(dataset_kind, dataset, auto_collation, collate_fn, drop_last)
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 41, in create_fetcher
    return _utils.fetch._IterableDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 21, in __init__
    self.dataset_iter = iter(dataset)
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 27, in __iter__
    def __iter__(self): return iter(self.d.create_batches(self.d.sample()))
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 92, in sample
    idxs = self.get_idxs()
  File "/media/training/fastai2/fastai2/fastai2/text/data.py", line 161, in get_idxs
    idxs = super().get_idxs()
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 88, in get_idxs
    if self.shuffle: idxs = self.shuffle_fn(idxs)
  File "/media/training/fastai2/fastai2/fastai2/text/data.py", line 167, in shuffle_fn
    idx_max = np.where(idxs==self.idx_max)[0][0]
  File "/media/training/fastai2/fastcore/fastcore/foundation.py", line 234, in __getattr__
    if attr is not None: return getattr(attr,k)
  File "/media/training/fastai2/fastai2/fastai2/data/core.py", line 292, in __getattr__
    def __getattr__(self,k): return gather_attrs(self, k, 'tls')
  File "/media/training/fastai2/fastcore/fastcore/transform.py", line 155, in gather_attrs
    if not res: raise AttributeError(k)
AttributeError: idx_max

Your DataLoaders’ data is never saved when you do learn.export/load_learner, so there’s nothing to iterate over, only the blueprint of how to make it. If you want to do it this way, do next(iter(dl)) on the test DataLoader you built instead.
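
In other words (a minimal sketch based on your snippet; passing bs to test_dl is my assumption for getting batch size 1):

learner = load_learner("text_classifier_version_3")

# build the test DataLoader explicitly; it holds the actual data,
# unlike learner.dls, which is only an empty blueprint after export
dl = learner.dls.test_dl(df['Message'][:100], bs=1)
dummy_inp = next(iter(dl))  # iterate the test_dl, not learner.dls[0]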

Yup. That did it. Sorry, I just couldn’t grasp that dl was an iterable. Seems pretty simple looking back at the answer though :slight_smile:


For those wondering why there was no announcement: that covers most of the big changes that have been made on this project, and I want to iron out a few more bits before the next video :slight_smile:

For those wanting to explore, however, one thing that wasn’t covered is that Intrinsic Attention is available for text classifier learners via learn.intrinsic_attention(str)
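
Usage looks roughly like this (a sketch; the import path and the sample text are my assumptions, and it assumes learn is a trained text classifier):

from fastinference.inference import *  # import path assumed; applies the Learner patches

res = learn.intrinsic_attention("What a genuinely wonderful film")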


Thanks for this!

I’ve been using the editable install of fastai2, so I’m ahead of the fastai2 version that gets installed when I run:

pip install fastinference

Can I safely copy/paste a file or two and get the increased speed, or do you recommend reverting to the pip version of fastai2 to avoid issues?

It should work on the most recent versions, even the dev install, so that shouldn’t be an issue.

The only time that could ever happen is if a humongous change occurred in the library, and I keep a close eye out :wink:


Neat package, @muellerzr! Thanks.
I found a subtle bug when running inference on massive test sets; I’ve raised an issue on GitHub here.


With the release of my two mini-libraries (fastinference_onnx and fastinference_pytorch), I’m working to remove all the fastai code possible for easier deployment on smaller systems. Once the library is fully fleshed out (including vision and NLP, as currently it just supports tabular), we may do a few walkthrough videos on the code and how I went about it. In the meantime, here is the documentation, and if you want to use it on tabular models only, the newest version of fastinference includes a to_fastinference function:


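Roughly, usage should look like this (a sketch only; the import path and the no-argument call are assumptions about the signature, so check the documentation above):

import fastinference  # import path assumed; patches Learner with to_fastinference

# `learn` is a trained tabular Learner
learn.to_fastinference()  # exports artifacts for fastinference_onnx / fastinference_pytorch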

Hi Zachary, thank you for your awesome work on improving inference speed for fastai models. I have been able to successfully export a fastai2 vision model to ONNX and use it for inference on CPU, but to do so I have to use the fastai2 DataLoader. I tried a custom PyTorch DataLoader, but the performance dropped. I also tried various techniques mentioned in this post: Speeding Up fastai2 Inference - And A Few Things Learned, but I’m not sure which one is best for using the model on CPU. Please guide me towards the best way to use a fastai2 ONNX model for inference on CPU.

I’m working on a PyTorch-only (and even a no-PyTorch) version right now, so please stay tuned. For now, when going through the item transforms, make sure every parameter is exactly how PIL would expect it, and that you’re normalizing the same way.
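
Something along these lines (a sketch only; the 224 size and the ImageNet stats are assumptions, so use whatever your Learner was actually trained with):

import numpy as np
from PIL import Image

# replicate the item transforms by hand: resize exactly as fastai did
img = Image.open("test.jpg").convert("RGB").resize((224, 224))

# scale to [0, 1], then normalize with the same stats used in training
x = np.asarray(img, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet stats
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
x = (x - mean) / std

# HWC -> NCHW with a batch dimension, as an ONNX session expects
x = x.transpose(2, 0, 1)[None]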


Will wait for the PyTorch-only version. Thank you for all the inputs.

Hi Zach,

Would I be able to use SHAP for a text learner?

I was gonna try, and I’m embarrassed to report, I can’t even get off the blocks in Colab.

!pip install fastinference[all]
from fastinference import *

no module named fastinference...

SHAP is not available for text through fastinference. Hmmm. I’ll look into that after tomorrow.

I can’t recreate this. Installing the [all] works just fine:

!pip install fastinference[all]
import fastinference