A Walk with fastai2 - fastinference (mini-series)

Hi everyone! I’m going to make a little mini-series walking through fastinference, where I’ll specifically talk about how and why I built it this way. We may have a few other lessons mixed in too (such as mixing tabular + text DLs).

There is no specific time/date set for these. I will announce each one roughly 48 hours in advance, but expect weekends.

Lessons:

Introduction to fastinference and get_preds:

MixedDL, feature importance, and dendrograms:

Hi @muellerzr,
I saw the video posted on YouTube but it was only at 480p.

By any chance, do you have an HD version?

Thanks

Ah shoot, no I do not. I’ll take a look at it on my end and if it seems too bad, I may do something (reshoot, etc). I got a new laptop so my WWF2 settings all went away.

Edit: okay definitely looks too bad. Keep an eye here for an update

So, we’ll set the time/date for the next part to be Sunday at 2pm Central Time. I’ll be re-recording the first video today, and this lesson will cover ClassConfusion

Here’s the link for today’s stream: https://youtu.be/myKgF-d9-N4

Considering the theme last week, before we cover dendrogram plots and feature importance we’ll go into the MixedDL that I made for combining datasets of different types

See you all in an hour :slight_smile:

(also I finally figured out what happened with lesson 1, I’ll re-upload it soon)


The fix for permutation importance is in there now as well

Hi Zach, love the thinking behind this!

I was just looking at the text section; perhaps you’re still working on this. I was wondering why you don’t call tokenize_df when creating the test_dl.

I remember talking to Sylvain about it here: https://github.com/fastai/fastai2/issues/302#issuecomment-614173101

I wasn’t aware of this issue actually :slight_smile:

I’ll make that adjustment internally probably, see here:

That being said though, I don’t see why test_dl couldn’t simply do this itself… (I’ll ask Sylvain about this)

@HenryDashwood I discussed this with Sylvain; test_dl will soon be a one-stop shop for both

Oh cool. Do you mean tokenize_df will be called internally by test_dl, as opposed to the user wrapping the latter around the former as is done now?

Yes, it’ll be a user-set option to tokenize the incoming dataframe or not

Ah that’s very nice

Hey @muellerzr, I keep running into this issue with ONNX conversion for multi-label text models. I don’t think the error is on fastai2’s side, but I can’t seem to index into the dataloader. Any thoughts?

learner = load_learner("text_classifier_version_3")
dl= learner.dls.test_dl(df['Message'][:100])
x=learner
orig_bs = x.dls[0].bs
x.dls[0].bs=1
dummy_inp = next(iter(x.dls[0]))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-e93addb1c4d7> in <module>
      2 orig_bs = x.dls[0].bs
      3 x.dls[0].bs=1
----> 4 dummy_inp = next(iter(x.dls[0]))
      5 # x.dls[0].bs = orig_bs
      6 # x.dls[0].bs=1

/media/training/fastai2/fastai2/fastai2/data/load.py in __iter__(self)
     96         self.randomize()
     97         self.before_iter()
---> 98         for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
     99             if self.device is not None: b = to_device(b, self.device)
    100             yield self.after_batch(b)

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    854             else:
    855                 del self._task_info[idx]
--> 856                 return self._process_data(data)
    857 
    858     def _try_put_index(self):

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
    879         self._try_put_index()
    880         if isinstance(data, ExceptionWrapper):
--> 881             data.reraise()
    882         return data
    883 

~/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/_utils.py in reraise(self)
    392             # (https://bugs.python.org/issue2651), so we work around it.
    393             msg = KeyErrorMessage(msg)
--> 394         raise self.exc_type(msg)

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 137, in _worker_loop
    fetcher = _DatasetKind.create_fetcher(dataset_kind, dataset, auto_collation, collate_fn, drop_last)
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 41, in create_fetcher
    return _utils.fetch._IterableDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)
  File "/home/mlbetty1/anaconda3/envs/fastai2_lm/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 21, in __init__
    self.dataset_iter = iter(dataset)
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 27, in __iter__
    def __iter__(self): return iter(self.d.create_batches(self.d.sample()))
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 92, in sample
    idxs = self.get_idxs()
  File "/media/training/fastai2/fastai2/fastai2/text/data.py", line 161, in get_idxs
    idxs = super().get_idxs()
  File "/media/training/fastai2/fastai2/fastai2/data/load.py", line 88, in get_idxs
    if self.shuffle: idxs = self.shuffle_fn(idxs)
  File "/media/training/fastai2/fastai2/fastai2/text/data.py", line 167, in shuffle_fn
    idx_max = np.where(idxs==self.idx_max)[0][0]
  File "/media/training/fastai2/fastcore/fastcore/foundation.py", line 234, in __getattr__
    if attr is not None: return getattr(attr,k)
  File "/media/training/fastai2/fastai2/fastai2/data/core.py", line 292, in __getattr__
    def __getattr__(self,k): return gather_attrs(self, k, 'tls')
  File "/media/training/fastai2/fastcore/fastcore/transform.py", line 155, in gather_attrs
    if not res: raise AttributeError(k)
AttributeError: idx_max

Your DataLoaders’ data is never saved when you do learn.export/load_learner, so there’s nothing to iterate over in learner.dls; only the blueprint for how to build them is kept. If you want to do it this way, call next(iter(dl)) on the test_dl you created instead.

Yup. That did it. Sorry, I just couldn’t grasp that dl was an iterable. Seems pretty simple looking back at the answer though :slight_smile:

For those wondering why there has been no announcement: what’s been covered so far is most of the big changes made in this project, and I want to iron out a few more bits before the next video :slight_smile:

For those wanting to explore, however, one thing that wasn’t covered is that Intrinsic Attention is available for text classifier learners via learn.intrinsic_attention(str)

Thanks for this!

I’ve been using the editable install of fastai2, so I’m ahead of the fastai2 version that gets installed when I run:

pip install fastinference

Can I safely copy/paste a file or two and get the increased speed, or do you recommend reverting to the pip version of fastai2 to avoid issues?

It should work on the most recent versions, even the dev install, so that shouldn’t be an issue.

The only time it could break is if a humongous change occurred in the library, and I keep a close eye out :wink:

Neat package, @muellerzr! Thanks.
I found a subtle bug when running inference on massive test sets, and I’ve raised an issue on GitHub here.
