Lost in the DataBlock API woods again

Hi, FastAI friends,

Lost in the Data Block API again, I turn to you for help. I'm creating a model that generates controls for facial landmark estimation. In practice, I have a neutral face composed of 68 points, without deformation or rotation. Upon observing another set of 68 points (i.e. another face, but this time twisted and rotated), my model returns a set of controls that I use to deform the neutral one. I then compare the difference between the two faces and use that as the loss. At least, that's what I want to do. Indeed, I am lost in the data block API again, and I promised myself I would write my own pipeline this time, to get a better understanding of the library (even though I know v2 is coming).
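To make the setup concrete, here is a rough sketch of the training step I have in mind (the model, deform_neutral and the shapes are placeholders I made up, not working code):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the training step; shapes assume 68 3D points per face.
model = nn.Linear(68 * 3, 68 * 3)        # placeholder for the real control-generating model

def deform_neutral(neutral, controls):
    # Placeholder deformation: in reality the controls would drive a rig/blendshapes.
    return neutral + controls.view(68, 3)

neutral = torch.zeros(68, 3)             # the undeformed reference face
target  = torch.rand(68, 3)              # an observed, twisted/rotated face

controls = model(target.view(1, -1))     # the model returns controls from the observed face
deformed = deform_neutral(neutral, controls)  # deform the neutral face with those controls

loss = F.mse_loss(deformed, target)      # the difference between the two faces is the loss
loss.backward()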

Anyway, here’s what I’m doing:
Taking a top-down approach, I first created the DataBunch class:


class FaceDataBunch(DataBunch):

    @classmethod
    def create(cls, train_ds, valid_ds, test_ds=None, path:PathOrStr='.', no_check:bool=False,
               bs:int=64, val_bs:int=None, num_workers:int=0, device:torch.device=None,
               collate_fn:Callable=data_collate, dl_tfms:Optional[Collection[Callable]]=None,
               shuffle_dl:bool=True, **kwargs) -> DataBunch:
        "Build a `DataBunch` from the face datasets."
        datasets = cls._init_ds(train_ds, valid_ds, test_ds)
        val_bs = ifnone(val_bs, bs)
        # Shuffle only the training DataLoader; validation/test keep their order.
        dls = [DataLoader(d, b, shuffle=s, num_workers=num_workers, **kwargs)
               for d, b, s in zip(datasets, (bs, val_bs, val_bs, val_bs),
                                  (shuffle_dl, False, False, False)) if d is not None]
        return cls(*dls, path=path, device=device, dl_tfms=dl_tfms,
                   collate_fn=collate_fn, no_check=no_check)

    @classmethod
    def from_folder(cls, path:PathOrStr, extensions=('.npy',), **kwargs):
        "Collect every `.npy` file under `path` (recursively) and build a databunch from them."
        files = get_files(path, extensions=extensions, recurse=True)
        return cls.from_files(files, path, **kwargs)

    @classmethod
    def from_files(cls, files, path, processors=None, split_pct=0.1, list_cls=None, **kwargs):
        "Split the `.npy` files into train/valid, then (somehow) label and bunch them."
        list_cls = ifnone(list_cls, FaceDataList)
        src = (list_cls(items=files, path=path, processor=processors)
               .split_by_rand_pct(split_pct, seed=6))

        src = src.HOW_TO_LABEL()  # ??? <- this is the step I cannot figure out

        return src.databunch(**kwargs)
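(For reference, this is how I plan to call it once the labelling step exists; the path and batch size are just placeholders.)

from pathlib import Path

# Intended entry point once labelling works (placeholder path and batch size).
data = FaceDataBunch.from_folder(Path('data/faces'), bs=32)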

So, to function, this needs the FaceDataList:

class FaceDataList(ItemList):

    _bunch, _processor, _label_cls = FaceDataBunch, FaceDataProcessor, FloatList

    def get(self, i):
        "Load the `.npy` file at index `i` and wrap it in a `FaceData` item."
        filename = super().get(i)
        return FaceData(self.open(filename))

    def open(self, fn):
        return np.load(fn)

Finally, I created a FaceData item class (which, I discovered later, is almost a clone of FloatItem):


class FaceData(ItemBase):

    def __init__(self, points):
        self.points = points
        self.data = torch.tensor(points.reshape(-1)).float()

    def __str__(self):
        return '{}'.format(np.copy(self.points).reshape(-1, 3))

    def to_one(self):
        return np.copy(self.points).reshape(-1, 3)
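
As a quick sanity check, this is the round trip I expect from it (assuming each .npy file stores a (68, 3) array):

import numpy as np

# Uses the FaceData class above; assumes each .npy file stores a (68, 3) array.
points = np.random.rand(68, 3)
item = FaceData(points)

print(item.data.shape)       # torch.Size([204]) -> flattened tensor fed to the model
print(item.to_one().shape)   # (68, 3) -> back to the original point layout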

So far, this seems to effectively return the src variable during databunch creation. However, I have no idea how to label it. The labels are the initially loaded points themselves, pretty much like an autoencoder, but how can I give this to the databunch? I wanted to create an auto_label() function in the FaceDataList class, but I can't figure out the inputs or the outputs it needs (see the sketch below for what I have in mind).
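
To make the question concrete, here is roughly what I imagine the labelling step looking like. This is only a guess: auto_label doesn't exist, and I'm not sure that label_from_func with the filename itself (and FaceDataList as the label class) is the right way to say "the label is the input".

# Hypothetical sketch of the labelling step (not working code).
# Option A: label every item with its own file, so FaceDataList also loads the target.
src = (FaceDataList(items=files, path=path)
       .split_by_rand_pct(0.1, seed=6)
       .label_from_func(lambda fn: fn, label_cls=FaceDataList))

# Option B: an auto_label() method on FaceDataList itself -- but I don't know what it
# should build internally (a LabelList? which processor?), which is exactly where I'm stuck.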

Could someone point me in the right direction? Thanks a lot!