A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

They are train and valid. There is no dsrc.test anymore, as we can have an unlimited number of datasets on any one object. As such, if you do dsrc[2] (index into it), this will be one of our test datasets. (If we had more than one, this could be dsrc[4].)

This is why we can do the following later:
a,b = learn.get_preds(ds_idx=2); this grabs the third dataset in our dataloaders (which is our test set).
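For example (a minimal sketch, assuming a trained learn built on a dsrc with three splits, as in the posts above):

preds, targs = learn.get_preds(ds_idx=0)  # first DataLoader (the train split)
preds, targs = learn.get_preds(ds_idx=1)  # second DataLoader (valid), which is the default
preds, targs = learn.get_preds(ds_idx=2)  # third DataLoader, i.e. our test split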

Thanks! In playing with it, I saw that dsrc[0], and by extension dsrc[1] and dsrc[2], are all tuples, with the first item of the tuple being the image and the second the label.

Ah I may be slightly wrong. Give me one moment :slight_smile:

Also - the way to access a test image is NOT through show_at(dsrc.test, 3), as with train and valid, but through dsrc[2][0].show() - at least that is what I got to work.

Is there a better way to access and show test images, like show_at(dsrc.train, idx) for the train and valid images?

Got it. You access them via .subset, i.e. we can do:

show_at(dsrc.subset(2), 1)

With subset 2 being that test set. Does this help, @Srinivas?

Subset 0 is train, 1 valid, etc…
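To make that concrete (a small sketch, reusing a dsrc built with three splits as above):

show_at(dsrc.subset(0), 1)  # item 1 of the training subset, same as show_at(dsrc.train, 1)
show_at(dsrc.subset(1), 1)  # item 1 of the validation subset
show_at(dsrc.subset(2), 1)  # item 1 of the test subset, which has no named shortcut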

Yes, that makes more sense. I was actually able to do dsrc[1000], which is also a tuple of length 2, which says that dsrc (i.e. the dataset) has a len of 12954, which is probably the sum of train and test: 9025 + 3929.

Yes, and the length of 2 means that it contains an x and a y.
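A quick way to check that (a small sketch with the same dsrc):

x, y = dsrc[1000]    # any index returns an (image, label) tuple
x.show()             # the x is the image
print(y, len(dsrc))  # the y is the label; len(dsrc) counts items across all subsets (9025 + 3929 = 12954 here)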

yup - that clears things up

Quick question on this subset: x,y = pets.subset(1)[0] is the same as pets.valid[0], so subset(1) is the validation set, and similarly subset(0) is the training set. But when you have more than one training set (for k-fold cross validation), you cannot use the shortcuts any more; you would always have to index into subset, correct? @Srinivas @muellerzr

Yes. That is correct.
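So the indexing looks like this (a sketch, assuming pets was built with more than two splits; subset numbers beyond 1 are illustrative):

x, y = pets.subset(0)[0]  # same as pets.train[0]
x, y = pets.subset(1)[0]  # same as pets.valid[0]
x, y = pets.subset(2)[0]  # a further split (another fold or a test set); it has no named shortcut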

OK - Moving on to dataloaders - dls
I see that dls.n_subsets is 3. And when I do len(dls.subset(0)) I get 7220, which is the length of the training set, and strangely its type is Dataset (!!)
Now, when I do dls.subset(1) I get a list index out of range error - EH?? When there are 3 subsets in dls? What am I missing?

It depends on how you set up your subsets. We were able to because we used separate indices beforehand (the splits), then built the dsrc etc. and kept the test. You could instead make a bunch of different dataloaders, each with your different train split. There are a few ways you could do it :slight_smile:
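A rough sketch of the "separate indices beforehand" approach (items, tfms, and the index lists are hypothetical; the class was called DataSource in some earlier fastai2 builds before becoming Datasets):

splits = [train_idxs, valid_idxs, test_idxs]  # three lists of indices into items
dsrc = Datasets(items, tfms, splits=splits)   # one subset per split
dls = dsrc.dataloaders(bs=64)                 # one DataLoader per subset
test_dl = dls[2]                              # the DataLoader over the test split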

Index into your dataloaders (try dls[0])

dls[0] seems to be a transform
<fastai2.data.core.TfmdDL at 0x7f1a12646588>

Also - dls.subset(0) worked!! And its length is that of the training set.

That’s not a transform, that’s a Transformed DataLoader :wink:
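In other words (a quick sketch with the dls from above):

train_dl = dls[0]            # TfmdDL over the training subset
valid_dl = dls[1]            # TfmdDL over the validation subset
xb, yb = dls[0].one_batch()  # it behaves like a regular DataLoader, so we can grab a batch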

Yes, I did not read that carefully. However, I am still able to access dls.subset(0), but dls.subset(1) and dls.subset(2) give errors.

That I’m unsure of. I’ve always indexed them. Does that exist on the git version too?

Do you mean the dev version? If so, I will need to learn how to load that and try. I think @barnacl did that for something else. @barnacl, could you share how you switched to the dev version of fastai2 pls? Thx

Check the notebook with dblock summary. That’s the dev install

For anyone running 04_Segmentation on Windows: I had some CUDA errors, and what fixed it for me was changing this line

dls = camvid.dataloaders(path/'images', bs=8)

to

dls = camvid.dataloaders(path/'images', bs=8, num_workers=0)

The explanation is here: https://pytorch.org/docs/stable/notes/windows.html#cuda-ipc-operations
