Mixed Text + Tabular with Datasets + Dataloaders

Hello all!

I am trying to create a model that mixes text and image inputs. I like the Datasets + DataLoaders approach because of the flexibility it gives for designing the inputs. However, I am a bit confused about how before_batch, after_item and after_batch really work.

My data is as follows:

items is a list of strings, each with a structure like this:

"<label>;<text>;<image filename>;<is_valid>"

Datasets and DataLoaders:

def get_item(item, index):
    # return the index-th ';'-separated field of an item string
    return item.split(';')[index]

splits=FuncSplitter(lambda o: o.split(";")[3]=='valid')(items)

dsets = Datasets(items,
                 splits=splits,
                 tfms=[[partial(get_item, 2), PILImage.create],    # image axis
                       [partial(get_item, 1), custom_tokenizer],   # text axis
                       [partial(get_item, 0), Categorize()]        # label
                      ])
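
For reference, indexing the Datasets gives one tuple element per tfms pipeline, in the same order (a quick sketch; custom_tokenizer is not defined in this post):

img, toks, lbl = dsets.train[0]
type(img), type(lbl)   # PILImage, TensorCategory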


dls = dsets.dataloaders(bs=8, source=items, #num_workers=8,
                        after_item=[Resize(528, method='squish'), ToTensor()],
                        before_batch=[partial(pad_input, pad_fields=1)],
                        after_batch=[IntToFloatTensor(),
                                     *aug_transforms(size=528,
                                                     do_flip=True,
                                                     max_rotate=15,
                                                     max_zoom=1.1,
                                                     max_lighting=0.3,
                                                     max_warp=0.0,
                                                     p_affine=1.0,
                                                     p_lighting=1.0),
                                     NormalizeEf.from_advprop(2.0, 1.0)
                                     ],
                        shuffle_train=True, path=path
                       )
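
For reference, this is roughly how I inspect the result (a sketch; it assumes custom_tokenizer and NormalizeEf, which are not defined in this post, are available):

xb_img, xb_txt, yb = dls.one_batch()
xb_img.shape, xb_txt.shape, yb.shape   # image batch, padded token batch, labels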

How would before_batch, after_item and after_batch “know” which axis (i.e. which element of the sample tuple) each transform should be applied to?

@muellerzr, any thoughts?

Transforms work on fastai2's own item system. What I mean by this is that when augmentation is applied, the item isn’t just a plain Tensor; instead it’s a TensorImage, and there are numerous other TensorX types in the library. The transforms then run different versions of themselves via TypeDispatch, depending on whether they apply to that particular item type. Some quick examples off the top of my head are TensorImage, TensorPoint, TensorBBox, TensorText, and LMTensorText. I’d go look in the vision augment file for some examples of how TypeDispatch is used on transforms :slight_smile:
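
To make that concrete, here is a minimal sketch (not from the original reply; AddNoise is a made-up transform) of a transform whose encodes is type-dispatched, so it only touches the image element of a mixed tuple:

from fastai.vision.all import *
from fastai.text.all import *

class AddNoise(Transform):
    "Toy transform: which encodes runs depends on the item's type"
    def encodes(self, x:TensorImage): return x + torch.randn_like(x)*0.1  # only images get noise
    def encodes(self, x:TensorText):  return x                            # text passes through unchanged

t = AddNoise()
img, txt = TensorImage(torch.rand(3, 8, 8)), TensorText(torch.tensor([1, 2, 3]))
out_img, out_txt = t((img, txt))   # applied element-wise; dispatch picks the right encodes

This is why, in the dataloaders above, aug_transforms and IntToFloatTensor end up modifying only the image element: they define encodes for the image tensor types, so the text and label tensors pass through untouched.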

Thanks!