001a_nn_basics.ipynb discussion

stas · September 12, 2018, 3:34am

First, thank you - it’s a very good beginner notebook to follow (especially after reading the blitz pytorch tutorial).

I have a question about it. Currently, dev_nb/001a_nn_basics.ipynb says:

A Dataset can be anything that has a __len__ function (called by Python’s standard len function) and a way of indexing into it.

However, the PyTorch docs say:

[...]All subclasses should override
``__len__``, that provides the size of the dataset, and ``__getitem__``,
supporting integer indexing in range from 0 to len(self) exclusive.

I.e., it's not just ‘a way’ of indexing; it needs to be __getitem__? Or is there another way?

radek · September 12, 2018, 9:45am

It needs to be __getitem__, since that is what makes train_dataset[idx] work, where idx is in [0, len(train_dataset)).

This is what the dataloaders expect. A custom dataloader could fetch items in some other way, but __getitem__ seems to be a nice convention.
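
To make that concrete, here is a minimal sketch in plain Python (no PyTorch needed). SquaresDataset and iter_batches are hypothetical names made up for illustration; iter_batches is only a simplified stand-in for a dataloader, not how PyTorch's DataLoader is actually implemented. It relies on nothing except len(ds) and ds[i]:

```python
class SquaresDataset:
    """Hypothetical toy dataset: item i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by the built-in len(ds).
        return self.n

    def __getitem__(self, idx):
        # Called by the indexing syntax ds[idx].
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        return idx, idx * idx


def iter_batches(ds, batch_size):
    """Simplified stand-in for a dataloader: uses only len(ds) and ds[i]."""
    for start in range(0, len(ds), batch_size):
        stop = min(start + batch_size, len(ds))
        yield [ds[i] for i in range(start, stop)]


train_dataset = SquaresDataset(5)
batches = list(iter_batches(train_dataset, batch_size=2))
```

Anything that answers len() and integer indexing in [0, len(ds)) would work with a loop like this, which is why the convention is convenient.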

gokkulnath · September 12, 2018, 10:00am

You are right. For example, in 001b_fit.ipynb:

class DatasetTfm(Dataset):
    ds: Dataset
    tfm: Callable = None

    def __len__(self): return len(self.ds)

    def __getitem__(self,idx):
        x,y = self.ds[idx]
        if self.tfm is not None: x = self.tfm(x)
        return x,y

Here __getitem__ retrieves the sample at the given index from the wrapped dataset and, if a transform is set, applies it to x before returning the (x, y) pair.
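
If you want to try the wrapper idea without PyTorch installed, the following is a rough standalone sketch. Two things here are my assumptions rather than what the notebook contains: I added the @dataclass decorator so the annotated fields produce a constructor, and I dropped the torch Dataset base class so that any object supporting len() and integer indexing (a plain list of pairs below) can be wrapped:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass  # assumption: generates __init__(ds, tfm) from the annotated fields
class DatasetTfm:
    ds: Any                         # anything supporting len() and integer indexing
    tfm: Optional[Callable] = None  # optional transform, applied to x only

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        x, y = self.ds[idx]
        if self.tfm is not None:
            x = self.tfm(x)
        return x, y


# A plain list of (x, y) pairs already has __len__ and __getitem__.
base = [(1, "a"), (2, "b"), (3, "c")]
doubled = DatasetTfm(base, tfm=lambda x: x * 2)
```

Indexing doubled[1] goes through __getitem__, which fetches (2, "b") from the wrapped list and transforms only the x part; the label is passed through untouched.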

P.S. I like the idea of having a thread for each 00X series of notebooks. It will help keep the discussion more organized.

stas · September 12, 2018, 5:56pm

Thank you for the confirmation, @radek and @gokkulnath. I updated the prose to indicate that.