Model with multiple inputs - fastai v1

Hello everyone,

I’ve written a custom Dataset and a custom RNN model in PyTorch and am wondering how to use them with fastai.

Currently, my dataset returns two dictionaries: X and y

X has:

  • object id
  • meta data
  • a time series of categorical data
  • a time series of continuous data

y has the labels

My model’s forward function takes three inputs: meta, cat, and cont (the entries of X above).

I’m able to create PyTorch DataLoaders and run a plain training loop.

I would like to use the fastai library to train my model. I created a DataBunch and tried the one_batch() function. Unsurprisingly, an error comes up because fastai expects tensors at that point, not dicts.

My question is: can fastai somehow manage models that have multiple inputs? What would be the best way for me to proceed?

Many thanks!

Of course the fastai library handles multiple inputs: tabular data already comes as a list of tensors (one for the categorical and one for the continuous variables).
You should create your own custom ItemList (you can use TabularList as a template) so that everything works properly with the library, then use the data block API.
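For reference, here is a rough sketch of what such a custom ItemList could look like in fastai v1, assuming ItemBase lives in fastai.core and ItemList in fastai.data_block; the MixedItem / MixedItemList names, their constructor arguments and the way the arrays are stored are only illustrative, not code from this thread:

import torch
from fastai.core import ItemBase
from fastai.data_block import ItemList

class MixedItem(ItemBase):
    "One sample; .data is what fastai collates into a batch."
    def __init__(self, meta, cat, cont):
        self.obj = self.data = (meta, cat, cont)
    def __str__(self):
        return f'MixedItem(meta={tuple(self.data[0].shape)})'

class MixedItemList(ItemList):
    "ItemList whose x is a (meta, cat, cont) tuple of tensors."
    def __init__(self, items, metas, cats, conts, **kwargs):
        super().__init__(items, **kwargs)
        self.metas, self.cats, self.conts = metas, cats, conts
        self.copy_new += ['metas', 'cats', 'conts']   # carried over when splitting

    def get(self, i):
        idx = self.items[i]   # items hold row indices, so train/valid splits keep working
        return MixedItem(torch.tensor(self.metas[idx]),
                         torch.tensor(self.cats[idx]),
                         torch.tensor(self.conts[idx]))

Here items would simply be the row indices (e.g. range(len(metas))). You would then go through the usual data block steps on top of it (split, label with one of the label_from_* methods, then .databunch()); only the x side needs the custom classes.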


Thanks for pointing me in the right direction! I’ll study the data block API now.

I believe I might have figured out how to pass my PyTorch Dataset/DataLoader into fastai.

I ended up not having to create a new ItemList.

I modified my Dataset to return [X['cont'], X['cat'], X['meta']], y['target'] instead of the dictionaries X, y. Then I created PyTorch DataLoaders, which I passed to a vanilla fastai DataBunch.

So, having a list of tensors for X seems to work, or at least I managed to run learn.fit(1). I still have to see if I run into any issues.


Do you have some sample code we can look at? Or pseudocode guiding the construction of your multiple inputs? Is it really just like the regular X input, but three times?

My memory isn’t too fresh on this, but I’ll try to help.

The first thing is that your Dataset has to return something like [X1, X2, X3], y in __getitem__:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        # store whatever you need to build the samples from (arrays, dataframe, ...)
        self.data = data
        self.dataset_size = len(data)

    def __len__(self):
        return self.dataset_size

    def __getitem__(self, idx):
        # [...] build the arrays X1, X2 and the target y for this index
        return [torch.from_numpy(X1), torch.from_numpy(X2)], y

The second thing is that your model has to take those multiple inputs in its forward function (fastai passes the elements of the list as separate arguments to forward):

import torch.nn as nn

class myRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # [...] define your layers (embeddings, RNN, heads, etc.) here

    def forward(self, x1, x2):
        # [...] combine the two inputs and compute the output
        return output

Then create your train and validation datasets, and your network:

val_ds = MyDataset(val_data)
trn_ds = MyDataset(train_data)
net = myRNN()

You can now link all this to fastai by creating your DataBunch and Learner:

# imports, assuming the fastai v1 module layout
from fastai.basic_data import DataBunch
from fastai.basic_train import Learner
from fastai.train import ShowGraph

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

criterion = nn.L1Loss()

databunch = DataBunch.create(trn_ds, val_ds, device=device, bs=32)

learn = Learner(databunch, net, loss_func=criterion, callback_fns=[ShowGraph])
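From there the usual Learner methods work; for example (the epoch count and learning rate below are just placeholders):

learn.lr_find()                 # optional: plot loss vs learning rate to pick one
learn.fit_one_cycle(5, 1e-3)    # or simply learn.fit(5, 1e-3)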

Some fastai features might be missing, but you will still be able to train your custom model.

Let me know if anything is unclear, or post your code if you get stuck!


Compared to what I said in my older post, note that here I stopped creating my own DataLoaders and instead just passed the Datasets to DataBunch.create, which takes care of the DataLoader creation.
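For completeness, a rough sketch of that older route, wrapping plain PyTorch DataLoaders directly (the batch size and variable names are just illustrative):

from torch.utils.data import DataLoader
from fastai.basic_data import DataBunch

trn_dl = DataLoader(trn_ds, batch_size=32, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=32)

data = DataBunch(trn_dl, val_dl)   # vanilla DataBunch built from existing DataLoaders

Both routes end up in the same place; DataBunch.create simply builds the DataLoaders from the Datasets for you.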
