Combining Tabular + Images in fastai2 (and should work with almost any other type)

The line b = next(iter(self.dls[key])) returns a TfmdDL object for me, which is not subscriptable and thus raises an error, although I pass DataLoader objects into the function. I'm a little confused.

I need to know a bit more about what you're doing to help. Are you passing in one DL, or multiple DataLoaders, to MixedDL?

Sorry, so this is what I'm passing in:

def get_only_lateral_studies_data_loader(df_path):
    df = pd.read_csv(df_path)
    train_df = df.loc[(df['valid'] == False) & (df['Lateral'] != 'black.jpg')]
    valid_df = df.loc[(df['valid'] == True) & (df['Lateral'] != 'black.jpg')]
    train_df.reset_index(inplace=True)
    valid_df.reset_index(inplace=True)
    train_tl= TfmdLists(range(len(train_df)), StudyTransform(train_df))
    valid_tl= TfmdLists(range(len(valid_df)), StudyTransform(valid_df))
    dls = DataLoaders.from_dsets(train_tl, valid_tl, shuffle=True,
                             after_item=[ToTensor], 
                             after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats), *aug_transforms()])
    dls = dls.cuda()
    return dls

def get_only_frontal_studies_data_loader(df_path):
    df = pd.read_csv(df_path)
    df = df.loc[df['Lateral'] == 'black.jpg']
    df[target_label[0]] = df[target_label[0]].astype(bool)
    return ImageDataLoaders.from_df(df=df, path=path, fn_col='Frontal', shuffle_train=True, valid_col='valid', label_col=target_label, batch_tfms=aug_transforms())

dls_lateral = get_only_lateral_studies_data_loader(df_path)
dls_frontal = get_only_frontal_studies_data_loader(df_path)
dls_mixed = MixedDL(dls_lateral, dls_frontal)

And this is the Stack Trace:

<ipython-input-102-185f93ba75ea> in __init__(self, device, *dls)
     14         self.count = 0
     15         self.fake_l = _FakeLoader(self, False, 0, 0)
---> 16         self._get_idxs()
     17 
     18     def __len__(self): return len(self.dls[0])

<ipython-input-102-185f93ba75ea> in _get_idxs(self)
     36         for key, n_inp in dl_dict.items():
     37             b = next(iter(self.dls[key]))
---> 38             inps += L(b[:n_inp])
     39             outs += L(b[n_inp:])
     40         self.x_idxs = self._get_vals(inps)

TypeError: 'TfmdDL' object is not subscriptable
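
One possible reading of this traceback, sketched with a stand-in class rather than fastai itself: when a container defines `__getitem__` but no `__iter__`, Python's `iter()` falls back to indexing, so `next(iter(container))` yields the container's first element instead of a batch. The assumption here (not confirmed in this thread) is that `DataLoaders` behaves this way, which would make `b` the training `TfmdDL`. `LoaderGroup` and the string placeholders are hypothetical.

```python
class LoaderGroup:
    """Hypothetical stand-in for a container of per-split loaders."""

    def __init__(self, *loaders):
        self.loaders = list(loaders)

    def __getitem__(self, i):
        # Indexing only; there is deliberately no __iter__ method.
        return self.loaders[i]


group = LoaderGroup("train_loader", "valid_loader")

# iter() falls back to __getitem__(0), __getitem__(1), ... until IndexError.
b = next(iter(group))
print(b)  # the first loader in the group, not a batch of data
```

If that is what happens here, slicing `b[:n_inp]` is applied to a loader object, which would explain the "not subscriptable" message.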

You need to pass in the individual train/valid DataLoaders separately, i.e.:

mixed_train = MixedDL(lateral[0], frontal[0])
mixed_valid = MixedDL(lateral[1], frontal[1])

And then:

dls = DataLoaders(mixed_train, mixed_valid)

Let me know if that solves your issue @NimaC

Edit: ah, I did not mention this in the thread so far! Apologies! (BTW, I will be moving this over to walkwithfastai.com this week, so it'll be a more fleshed-out tutorial :slight_smile: ) I'll likely make a helper function to do this as well.
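
A minimal sketch of what such a helper could look like. This is not the helper from the tutorial; `mix_by_split`, `combine`, and the string placeholders are hypothetical names, and the only assumption about the inputs is that each loader group supports indexing by split (0 for train, 1 for valid).

```python
def mix_by_split(mixed_cls, *dls_groups, n_splits=2):
    """Build one mixed loader per split from several indexable loader groups.

    `mixed_cls` is any callable that combines loaders (for example the
    MixedDL class from this thread); each group must support `group[i]`.
    """
    return [mixed_cls(*(group[i] for group in dls_groups))
            for i in range(n_splits)]


# Demonstration with plain lists and a tuple-building stand-in for MixedDL.
def combine(*loaders):
    return tuple(loaders)


lateral = ["lateral_train", "lateral_valid"]
frontal = ["frontal_train", "frontal_valid"]

mixed_train, mixed_valid = mix_by_split(combine, lateral, frontal)
print(mixed_train)
print(mixed_valid)
```

With the objects from this thread, the call would presumably be `DataLoaders(*mix_by_split(MixedDL, dls_lateral, dls_frontal))`, though that is untested here.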

Hey Zach, did you end up moving this over? Could you point me to where in the repo/website?
Thanks!

You can check @morgan's GitHub repo here

Hi @muellerzr, thank you for your advice here. I am new to the fastai library. I built a model that combines image and tabular data, and it is already trained. Now I want to predict a single record from a test data frame. I used this method to combine the input image and tabular data:

integratedata,_=get_imagetabdatasets(test_image,tab_data)
and the data format of

integratedata[0]

is ((Image (3, 128, 128), TabularLine [tensor([2]), tensor([-0.6136])]),
EmptyLabel 0)

and when I called

learn.predict(integratedata)

the error was: ‘ImageTabDataset’ object has no attribute ‘set_item’. What should I do to run inference on a single input or a single record from a data frame? I hope my question is clear.

I used this notebook as a reference https://github.com/naity/image_tabular/blob/master/siim_isic_integrated_model.ipynb

Hi all!

I am using the MixedDL to combine Tabular and NLP.

mixedDL1 = MixedDL(self.tab_dl[0], self.nlp_dl[0])
mixedDL2 = MixedDL(self.tab_dl[1], self.nlp_dl[1])

self.dls = DataLoaders(mixedDL1, mixedDL2)

I am using the MixedDL class with the one_batch function that @muellerzr defined:

def one_batch(self):
    "Grab one batch of data"
    with self.fake_l.no_multiproc(): res = first(self)
    if hasattr(self, 'it'): delattr(self, 'it')
    return res

But when I run this function I get the following error:

  File "/home/admin/PycharmProjects/tabular-nlp/tabular_nlp/concat_model/concat_pipeline.py", line 318, in create_databunch
    batch = mixedDL1.one_batch()
  File "/home/admin/PycharmProjects/tabular-nlp/tabular_nlp/concat_model/concat_pipeline.py", line 93, in one_batch
    res = first(self)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastcore/basics.py", line 547, in first
    return next(x, None)
  File "/home/admin/PycharmProjects/tabular-nlp/tabular_nlp/concat_model/concat_pipeline.py", line 77, in __iter__
    z = zip(*[_loaders[i.fake_l.num_workers == 0](i.fake_l) for i in self.dls])
  File "/home/admin/PycharmProjects/tabular-nlp/tabular_nlp/concat_model/concat_pipeline.py", line 77, in <listcomp>
    z = zip(*[_loaders[i.fake_l.num_workers == 0](i.fake_l) for i in self.dls])
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 552, in __init__
    self._dataset_fetcher = _DatasetKind.create_fetcher(
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 51, in create_fetcher
    return _utils.fetch._IterableDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 21, in __init__
    self.dataset_iter = iter(dataset)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastai/data/load.py", line 30, in __iter__
    def __iter__(self): return iter(self.d.create_batches(self.d.sample()))
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastai/data/load.py", line 103, in sample
    return (b for i,b in enumerate(self.__idxs) if i//(self.bs or 1)%self.num_workers==self.offs)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastcore/basics.py", line 388, in __getattr__
    if attr is not None: return getattr(attr,k)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastcore/basics.py", line 388, in __getattr__
    if attr is not None: return getattr(attr,k)
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastcore/transform.py", line 204, in __getattr__
    def __getattr__(self,k): return gather_attrs(self, k, 'fs')
  File "/home/admin/.virtualenvs/tabular-nlp/lib/python3.8/site-packages/fastcore/transform.py", line 162, in gather_attrs
    if k.startswith('_') or k==nm: raise AttributeError(k)
AttributeError: _DataLoader__idxs

Could someone tell me where this error comes from, or how I can fix it?
Thanks in advance! :slight_smile:
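
A note on the name in that last traceback line, with a pure-Python sketch. Inside a class body, an attribute written as `self.__idxs` is stored under the mangled name `_ClassName__idxs`, which is why the error mentions `_DataLoader__idxs`. My understanding (an assumption, not something confirmed in this thread) is that the installed fastai version sets this attribute inside `DataLoader.__iter__`, so code that drives the loader without going through `__iter__` would reach `sample()` before the attribute exists. The `Loader` class below is a hypothetical stand-in, not fastai code.

```python
class Loader:
    """Hypothetical stand-in: indices are prepared only inside __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Stored on the instance under the mangled name _Loader__idxs.
        self.__idxs = list(range(self.n))
        return iter(self.sample())

    def sample(self):
        # Reads the mangled attribute; fails if __iter__ never ran.
        return (i for i in self.__idxs)


# Normal path: __iter__ runs first, so sample() finds the indices.
ok = list(Loader(3))
print(ok)

# Bypassing __iter__: the attribute was never set.
try:
    Loader(3).sample()
    message = None
except AttributeError as err:
    message = str(err)
print(message)
```

If this is indeed the cause, the place to look would be the custom `__iter__` that calls `_loaders[...]` on `fake_l` directly.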

Can you share your full MixedDL code with me that you are using? :slight_smile:

Yes, this is the full MixedDL code that I am using:

class MixedDL:
    def __init__(self, tab_dl: TabDataLoader, nlp_dl: DataLoaders, device="cpu:0"):
        "Stores away `tab_dl` and `nlp_dl`, and overrides `shuffle_fn`"
        self.device = device
        tab_dl.shuffle_fn = self.shuffle_fn
        nlp_dl.shuffle_fn = self.shuffle_fn
        self.dls = [tab_dl, nlp_dl]
        self.count = 0
        self.fake_l = _FakeLoader(self, False, 0, 0, 0)

    def __len__(self):
        return len(self.dls[0])

    def shuffle_fn(self, idxs):
        "Generates a new `rng` based upon which `DataLoader` is called"
        if self.count == 0:
            self.rng = self.dls[0].rng.sample(idxs, len(idxs))
            self.count += 1
            return self.rng
        else:
            self.count = 0
            return self.rng

    def to(self, device):
        self.device = device

    def __iter__(self):
        "Iterate over your `DataLoader`"
        z = zip(*[_loaders[i.fake_l.num_workers == 0](i.fake_l) for i in self.dls])
        for b in z:
            if self.device is not None:
                b = to_device(b, self.device)
            batch = []
            batch.extend(self.dls[0].after_batch(b[0])[:2])
            batch.append(self.dls[1].after_batch(b[1][0]))
            try:
                batch.append(b[1][1])
                yield tuple(batch)
            except:
                yield tuple(batch)

    def one_batch(self):
        "Grab a batch from the `DataLoader`"
        with self.fake_l.no_multiproc():
            res = first(self)
        if hasattr(self, "it"):
            delattr(self, "it")
        return res

    def show_batch(self):
        "Show a batch from multiple `DataLoaders`"
        for dl in self.dls:
            dl.show_batch()
            plt.show()

Using this I mixed tabular and nlp dataloaders:

mixedDL1 = MixedDL(self.tab_dl[0], self.nlp_dl[0])
mixedDL2 = MixedDL(self.tab_dl[1], self.nlp_dl[1])

Where self.tab_dl is a TabularDataLoaders, self.tab_dl[0] is a TabDataLoader, self.nlp_dl is a DataLoaders and self.nlp_dl[0] is a SortedDL.
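
For readers following the `shuffle_fn` override above, here is a small standalone sketch of the idea as I read it: the first loader to ask for a shuffle draws a new permutation, and the second loader gets the same one back, so both iterate rows in the same order. `SharedShuffle` is a hypothetical name, and the standard library's `random.Random` stands in for the loader's `rng`.

```python
import random


class SharedShuffle:
    """Hands the same permutation to two consumers that call in turn."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.count = 0
        self.order = None

    def shuffle_fn(self, idxs):
        if self.count == 0:
            # First caller: draw a fresh permutation and remember it.
            self.order = self.rng.sample(idxs, len(idxs))
            self.count = 1
        else:
            # Second caller: reuse the stored permutation, then reset.
            self.count = 0
        return self.order


shared = SharedShuffle(seed=42)
idxs = list(range(10))

first_order = shared.shuffle_fn(idxs)   # e.g. the tabular loader
second_order = shared.shuffle_fn(idxs)  # e.g. the text loader
print(first_order == second_order)
```

The toggle only keeps two loaders aligned if each calls `shuffle_fn` exactly once per epoch and they alternate; a third loader, or a loader that sorts or reshuffles on its own, would break that assumption.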

Hi @muellerzr Zach, fascinating work! Do you have a Colab notebook or GitHub repo for testing this hybrid model? It would be easier for me to follow if there were a sample dataset to play with.

Hello @Saioa, glad to see your experiment on the tab+text hybrid! Do you have any updates on it? I have a use case where I want to test out this hybrid approach.

Sadly I do not, the data was proprietary :( but we can debug anything you're working on together :hugs:

Hi @wjlgatech!
No, I didn’t make any more progress on the hybrid model. For the problem I was facing, it was enough to add the loss of the NLP model as a column for the tabular model, and that’s how we solved it.

Still, at some point I want to go back to this code, so please keep me up to date on any progress you make.

Hi @muellerzr, No worries. Totally understand. Thanks for offering insights and help! I will put some public datasets in a colab notebook and let’s work on it together from there.

Hello @Saioa, your approach makes sense. I worked around it in a similar way: I train a fastai text classifier and extract its embeddings as input for the fastai tabular model. It is computationally slow, and many things still need to be optimized.

Guess image+tab is not what you need but:

The Gradient Blending approach would also be valuable here I think :smiley:

Hi @muellerzr @Joan @Saioa,

I put up a GitHub repo to work on the hybrid model, which combines fastai tabular and fastai text.

You can check out the notebook.

I still need help with a bug in my notebook, which is about putting hooks in the right place in the tabular model and the text model. Any advice would be appreciated!

Ha! I just found a great tutorial: Model hooks | fastai

Hi all.

If you want to play with a small dataset (images and tabular data) to detect COVID-19 from chest x-rays, you can use the COVIDcxr dataset.

It consists of 960 CXR images (i.e., 320 COVID-19, 320 normal, and 320 pneumonia) and the associated tabular data (i.e., gender, sex, and view) for each patient.

You can generate the COVIDcxr dataset, then build a mixed DataLoader for it.
