stas
(Stas Bekman)
September 12, 2018, 3:34am
1
First, thank you - it’s a very good beginner notebook to follow (especially after reading the PyTorch blitz tutorial).
I have a question about it. Currently dev_nb/001a_nn_basics.ipynb says:
A Dataset can be anything that has a __len__
function (called by Python’s standard len
function) and a way of indexing into it.
However, the PyTorch docs say:
[...] All subclasses should override
``__len__``, that provides the size of the dataset, and ``__getitem__``,
supporting integer indexing in range from 0 to len(self) exclusive.
i.e. not just ‘a way’ of indexing, it specifically needs to be __getitem__? Or is there another way?
radek
September 12, 2018, 9:45am
2
It needs to be __getitem__, as this is what allows indexing like train_dataset[idx], where idx belongs to [0, len(train_dataset)).
This is what the dataloaders expect. With a custom dataloader one could handle it some other way, but this seems to be a nice convention.
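To make that concrete, here is a minimal sketch of a Dataset that satisfies both requirements (the class name SquaresDataset and its contents are made up purely for illustration):

from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    "Toy dataset of (i, i**2) pairs; only here to show the required interface."
    def __init__(self, n): self.n = n
    # len() support; the DataLoader uses this to know the valid index range
    def __len__(self): return self.n
    # integer indexing for idx in [0, len(self)); the DataLoader calls this per sample
    def __getitem__(self, idx):
        if not 0 <= idx < self.n: raise IndexError(idx)
        return idx, idx ** 2

So SquaresDataset(5)[3] returns (3, 9), and len(SquaresDataset(5)) is 5.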
gokkulnath
(Gokkul Nath T S)
September 12, 2018, 10:00am
3
You are right. In 001b_fit.ipynb:

from dataclasses import dataclass
from typing import Callable
from torch.utils.data import Dataset

@dataclass
class DatasetTfm(Dataset):
    ds: Dataset           # the wrapped dataset
    tfm: Callable = None  # optional transform applied to x
    def __len__(self): return len(self.ds)
    def __getitem__(self, idx):
        x, y = self.ds[idx]
        if self.tfm is not None: x = self.tfm(x)
        return x, y
__getitem__
here retrieves a sample from the wrapped dataset by index and applies the transform to x if one was supplied.
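As a quick usage sketch (the base_ds/tfm_ds names and the toy transform are just for illustration, assuming DatasetTfm as defined above):

import torch
from torch.utils.data import TensorDataset

base_ds = TensorDataset(torch.arange(10.), torch.arange(10.))
tfm_ds = DatasetTfm(base_ds, tfm=lambda x: x * 2)  # doubles each input x

x, y = tfm_ds[3]
print(x, y)         # tensor(6.) tensor(3.): x was transformed, y left alone
print(len(tfm_ds))  # 10, delegated to the wrapped dataset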
P.S.: I like the idea of having a thread for each 00x-series notebook. It will help keep the discussion more organized.
stas
(Stas Bekman)
September 12, 2018, 5:56pm
4
Thank you for the confirmation, @radek and @gokkulnath. I updated the prose to indicate that.