How to return inputs and targets from "get_items"?

I’m trying to use get_items to return both my inputs and targets … but for some reason, it thinks that what I’m returning is just my inputs (and so when I call dsets.train[0] I see everything duplicated).

dblock = DataBlock(get_items=get_items,
                   splitter=RandomSplitter())
dsets = dblock.datasets(test_df)

My get_items currently returns a tuple of two things, my inputs and my targets (both of type list), like so:

def get_items(src_df):
    inputs,targets = [],[]
    ... 
    return (inputs, targets)

… but when I look at my training dataset, I see this:

dsets.train[0]
# comes back like: ((input1,  target1), (input1,  target1))

How do I need to return things from get_items so that my datablock knows that the first thing is my inputs and the second thing is my targets?

You can leave get_items as it is and do:

get_x=ItemGetter(0), get_y=ItemGetter(1)

in your datablock
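To illustrate why the duplicated output appears and why the two getters fix it, here is a minimal plain-Python sketch. It does not use fastai: `operator.itemgetter` stands in for `ItemGetter`, `build_sample` is a hypothetical helper, and the items are assumed to be (input, target) pairs. The idea it models is that a missing getter passes the whole item through unchanged, once for x and once for y.

```python
from operator import itemgetter

# Hypothetical items: each one is assumed to be an (input, target) pair.
items = [("input1", "target1"), ("input2", "target2")]

def build_sample(item, get_x=None, get_y=None):
    """Assemble an (x, y) sample from one item.

    A missing getter falls back to the identity function, so the whole
    item is used for that slot.
    """
    identity = lambda o: o
    get_x = get_x or identity
    get_y = get_y or identity
    return (get_x(item), get_y(item))

# No getters: the full pair lands in both the x slot and the y slot.
no_getters = build_sample(items[0])

# Index-based getters: x is element 0 of the pair, y is element 1.
with_getters = build_sample(items[0], get_x=itemgetter(0), get_y=itemgetter(1))

print(no_getters)
print(with_getters)
```

With index-based getters, each slot picks its own element from the pair instead of receiving the whole pair.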


Yup that worked. Thanks!

I'm still a bit confounded as to why everything works without this in the following example:

pascal_source = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(pascal_source/"train.csv")

def _pascal_items(x): 
  return (
    f'{pascal_source}/train/'+x.fname, x.labels.str.split())

valid_idx = df[df['is_valid']].index.values

pascal = DataBlock.from_columns(get_items=_pascal_items,
                   splitter=IndexSplitter(valid_idx))

dsets = pascal.datasets(df)
dsets.train[0]
# returns -> ('/root/.fastai/data/pascal_2007/train/000017.jpg', ['person', 'horse'])

Any ideas why a get_x and get_y didn’t need to be applied in that scenario?

-wg

I guess it’s because DataBlock.from_columns already expects this kind of input

We need neither get_x nor get_y here because getters is None in the example above, and that triggers the automatic creation of get_x (getters[0]) and get_y (getters[1]) in the from_columns() method, like this (notice that since blocks is None, range is called with 2):

if getters is None: getters = L(ItemGetter(i) for i in range(2 if blocks is None else len(L(blocks))))

Here is the definition of the from_columns() method, found in the block.py file:

class DataBlock():
    "Generic container to quickly build `Datasets` and `DataLoaders`"
    ...
    @classmethod
    def from_columns(cls, blocks=None, getters=None, get_items=None, **kwargs):
        if getters is None: getters = L(ItemGetter(i) for i in range(2 if blocks is None else len(L(blocks))))
        get_items = _zip if get_items is None else compose(get_items, _zip)
        return cls(blocks=blocks, getters=getters, get_items=get_items, **kwargs)
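The two steps in from_columns() can be sketched in plain Python, without fastai. This is only an approximation under stated assumptions: `transpose` is a hypothetical stand-in for `_zip` (assumed here to turn column-wise output into per-item rows), `itemgetter` stands in for `ItemGetter`, and `from_columns_sketch`, `toy_get_items`, and the sample data are made up for illustration.

```python
from operator import itemgetter

def transpose(columns):
    "Stand-in for `_zip`: turn (inputs, targets) columns into per-item rows."
    return list(zip(*columns))

def from_columns_sketch(get_items, source, blocks=None, getters=None):
    "Rough model of the default getters plus the get_items/zip composition."
    # Default getters: one per block, or two when no blocks are given.
    if getters is None:
        n = 2 if blocks is None else len(blocks)
        getters = [itemgetter(i) for i in range(n)]
    # get_items returns columns; transposing yields one row per item.
    rows = transpose(get_items(source))
    # Each getter picks its own element out of every row.
    return [tuple(g(row) for g in getters) for row in rows]

def toy_get_items(source):
    "Hypothetical column-style get_items: returns (all inputs, all targets)."
    fnames = [f"train/{name}" for name in source["fname"]]
    labels = [s.split() for s in source["labels"]]
    return (fnames, labels)

# Made-up data, shaped loosely like the CSV columns used above.
source = {"fname": ["000017.jpg", "000023.jpg"],
          "labels": ["person horse", "bicycle"]}

samples = from_columns_sketch(toy_get_items, source)
print(samples[0])
```

So a get_items that returns two parallel lists fits from_columns() as-is, whereas the plain DataBlock constructor applies neither the transposition nor the default getters.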