Call to single_ds causes IndexError in learn.predict after importing model

MadeUpMasters · July 10, 2019, 6:07pm

I’m having an issue using learn.predict on models imported with load_learner in an audio module for fastai that I’m contributing to here (GitHub). A call from learn.predict to single_ds raises an IndexError because the train and valid LabelLists are empty.

single_ds calls single_dl, which pulls data from the validation set (this line is from the DataBunch init): self.single_dl = _create_dl(DataLoader(valid_dl.dataset, batch_size=1, num_workers=0))

But since we have an empty validation set, when it tries to get 1 item from there it gets an IndexError.

Problem:

    372         ds = self.data.single_ds
    373         pred = ds.y.analyze_pred(raw_pred, **kwargs)
--> 374         x = ds.x.reconstruct(grab_idx(x, 0))
    375         y = ds.y.reconstruct(pred, x) if has_arg(ds.y.reconstruct, 'x') else ds.y.reconstruct(pred)
    376         return (x, y, pred, raw_pred) if return_x else (y, pred, raw_pred)

Full Stack Trace

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-84-ae0665b3b9f3> in <module>
----> 1 audio_predict_all(new_learn, test)

~/rob/fastai_audio/audio/learner.py in audio_predict_all(learn, al)
     17     al = al.split_none().label_empty()
     18     data = [AudioList.open(al, ai[0].path).spectro for ai in al.train]
---> 19     preds = [learn.predict(spectro) for spectro in progress_bar(data)]
     20     return [o for o in zip(*preds)]

~/rob/fastai_audio/audio/learner.py in <listcomp>(.0)
     17     al = al.split_none().label_empty()
     18     data = [AudioList.open(al, ai[0].path).spectro for ai in al.train]
---> 19     preds = [learn.predict(spectro) for spectro in progress_bar(data)]
     20     return [o for o in zip(*preds)]

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in predict(self, item, return_x, batch_first, with_dropout, **kwargs)
    372         ds = self.data.single_ds
    373         pred = ds.y.analyze_pred(raw_pred, **kwargs)
--> 374         x = ds.x.reconstruct(grab_idx(x, 0))
    375         y = ds.y.reconstruct(pred, x) if has_arg(ds.y.reconstruct, 'x') else ds.y.reconstruct(pred)
    376         return (x, y, pred, raw_pred) if return_x else (y, pred, raw_pred)

/opt/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in reconstruct(self, t, x)
     97     def reconstruct(self, t:Tensor, x:Tensor=None):
     98         "Reconstruct one of the underlying item for its data `t`."
---> 99         return self[0].reconstruct(t,x) if has_arg(self[0].reconstruct, 'x') else self[0].reconstruct(t)
    100 
    101     def new(self, items:Iterator, processor:PreProcessors=None, **kwargs)->'ItemList':

/opt/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in __getitem__(self, idxs)
    116         "returns a single item based if `idxs` is an integer or a new `ItemList` object if `idxs` is a range."
    117         idxs = try_int(idxs)
--> 118         if isinstance(idxs, Integral): return self.get(idxs)
    119         else: return self.new(self.items[idxs], inner_df=index_row(self.inner_df, idxs))
    120 

~/rob/fastai_audio/audio/data.py in get(self, i)
    312 
    313     def get(self, i):
--> 314         item = self.items[i]
    315         if isinstance(item, AudioItem): return item
    316         if isinstance(item, (str, PosixPath, Path)):

IndexError: index 0 is out of bounds for axis 0 with size 0

sgugger · July 10, 2019, 8:47pm

That’s because the ItemList you’re using doesn’t have a reconstruct method, so it tries to grab the one on the first element of the dataset. You should implement this method (just return x if you don’t have any postprocessing) to avoid the issue.
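
A minimal sketch of why overriding reconstruct avoids the error, using stand-in classes rather than fastai's real ones. ItemListSketch and AudioListSketch are hypothetical names; the only behaviour copied from the traceback above is that the default reconstruct delegates to self[0], which fails when the list is empty.

```python
# Stand-in classes that mimic the behaviour shown in the traceback.
# They are NOT fastai's real classes; both names are hypothetical.

class ItemListSketch:
    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, i):
        return self.items[i]

    def reconstruct(self, t):
        # Default behaviour: delegate to the first item of the list.
        # On an empty list, self[0] raises IndexError.
        return self[0].reconstruct(t)


class AudioListSketch(ItemListSketch):
    def reconstruct(self, t):
        # No post-processing needed, so hand the data back unchanged.
        return t


empty_default = ItemListSketch([])
empty_audio = AudioListSketch([])

try:
    empty_default.reconstruct("spectrogram")
    default_failed = False
except IndexError:
    default_failed = True

# The override never touches self[0], so an empty list is fine.
result = empty_audio.reconstruct("spectrogram")
```

In a real custom ItemList the override would probably wrap the tensor back into the item type (an AudioItem here) rather than return it raw; that part is an assumption and depends on how the item class is built.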

MadeUpMasters · July 15, 2019, 2:13pm

That got things working, but I think I’m going about it the wrong way, and I would like to do things in the fastai-compatible way as much as possible.

Our audio preprocessing (resampling, silence removal, etc.) is done when a LabelList is created. At inference time we have individual AudioItems, so to make sure the same preprocessing is applied, we’ve been using an audio_predict method that takes an AudioItem or AudioList and calls .split_none().label_empty() to trigger the preprocessing before passing the items to learn.predict(). This seems to work, but I feel like I should be using reconstruct to initiate the preprocessing. I’ve read the custom ItemList guide and I’m still not sure how to do it.

What do I need to do so our users can just call learn.predict() and get_preds() directly instead of our custom methods? Here is what our code looks like now. It feels really bad/inefficient. Thank you.

def audio_predict(learn, item:AudioItem):
    '''Applies preprocessing to an AudioItem before predicting its class'''
    al = AudioList([item], path=item.path, config=learn.data.x.config).split_none().label_empty()
    ai = AudioList.open(al, item.path)
    return learn.predict(ai)

def audio_predict_all(learn, al:AudioList):
    '''Applies preprocessing to an AudioList then predicts on all items'''
    al = al.split_none().label_empty()
    audioItems = [AudioList.open(al, ai[0].path) for ai in al.train]
    preds = [learn.predict(ai) for ai in progress_bar(audioItems)]
    return list(zip(*preds))

sgugger · July 15, 2019, 4:48pm

The reconstruct method is for anything you want to run as post-processing over the predictions. It doesn’t look like you need it in your two functions.
For the preprocessing, if you use an AudioList to build your DataBunch, those preprocessing steps should be applied directly when you call predict. The only requirement is that you feed the items in exactly the same form as when you built your AudioList.
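
To illustrate the last point, here is a rough sketch with stand-in classes, not fastai's. ResampleProcessorSketch, ListSketch and prepare_single are hypothetical names, and the sketch assumes the list was built from path strings. It only shows the idea that a processor stored with the list runs once at build time and again on a single item at predict time, so the single item has to arrive in the same raw form.

```python
# Hypothetical stand-ins, not fastai classes: they only illustrate why the
# raw input at predict time must match the raw input used to build the list.

class ResampleProcessorSketch:
    """Pretend preprocessing step: turns a raw path string into a processed record."""

    def process_one(self, raw):
        if not isinstance(raw, str):
            raise TypeError("expected the raw type used at build time (a path string)")
        return {"path": raw, "resampled": True}


class ListSketch:
    def __init__(self, raw_items, processor):
        self.processor = processor
        # Preprocessing runs once when the list is built...
        self.items = [processor.process_one(r) for r in raw_items]

    def prepare_single(self, raw):
        # ...and the same processor runs again on a single item at predict time.
        return self.processor.process_one(raw)


train = ListSketch(["a.wav", "b.wav"], ResampleProcessorSketch())

# Same raw form as at build time: works.
single = train.prepare_single("c.wav")

# An already-processed item is a different raw form: rejected.
try:
    train.prepare_single({"path": "c.wav", "resampled": True})
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Under this reading, passing an already-opened AudioItem to predict would skip or break the preprocessing, while passing whatever the AudioList was originally built from would go through it.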