Get a DataLoaders from training and validation DataLoader

Hi,
so far in the book I have found examples where a dls (DataLoaders) is built from DataBlock(…).dataloaders(path_to_the_images_folder).

Is it possible to build a dls given two DataLoader objects, one for training and one for validation?
For example

train_set = dset[0:50000]
valid_set = dset[50000:]
dl = DataLoader(train_set, batch_size=bs, shuffle=True)
valid_dl = DataLoader(valid_set, batch_size=bs)

dls = ??? how ???
learn = cnn_learner(dls, resnet34, metrics=error_rate)

The dl and valid_dl dataloaders already contain x and y; I can iterate over them like this:

for x,y in dl:
    print("input", x)
    print("target", y)

I tried the following, but it doesn’t work:

dls = DataLoaders(dl, valid_dl)
learn = cnn_learner(dls, resnet34, metrics=error_rate)

----> 2 learn = cnn_learner(dls, resnet34, metrics=error_rate)

1 frames
/usr/local/lib/python3.7/dist-packages/fastai/vision/learner.py in _add_norm(dls, meta, pretrained)
    154     if not pretrained: return
    155     after_batch = dls.after_batch
--> 156     if first(o for o in after_batch.fs if isinstance(o,Normalize)): return
    157     stats = meta.get('stats')
    158     if stats is None: return

AttributeError: 'function' object has no attribute 'fs'

Well, actually you can just use an index splitter to achieve this: https://docs.fast.ai/data.transforms.html#IndexSplitter

Thanks, but I didn’t get it. Can you give me an example?

A DataLoaders object automatically creates a training dataloader and a validation dataloader for you; it all depends on how you feed the dataset to it.

For instance, if your data is separated into two different folders for training and validation, you can use GrandparentSplitter like so:

fnames = [path/'train/3/9932.png', path/'valid/7/7189.png',
          path/'valid/7/7320.png', path/'train/7/9833.png',
          path/'train/3/7666.png', path/'valid/3/925.png',
          path/'train/7/724.png', path/'valid/3/93055.png']
splitter = GrandparentSplitter()
test_eq(splitter(fnames), [[0,3,4,6],[1,2,5,7]])
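To see what the splitter is doing under the hood, here is a rough pure-Python sketch of GrandparentSplitter's behavior (illustrative only, not the fastai implementation): each file's grandparent folder name decides whether its index goes into the training or the validation list.

```python
from pathlib import PurePosixPath

def grandparent_splitter(train_name='train', valid_name='valid'):
    """Split indices of fnames by the name of each file's grandparent folder."""
    def split(fnames):
        train = [i for i, f in enumerate(fnames)
                 if PurePosixPath(f).parent.parent.name == train_name]
        valid = [i for i, f in enumerate(fnames)
                 if PurePosixPath(f).parent.parent.name == valid_name]
        return train, valid
    return split

fnames = ['train/3/9932.png', 'valid/7/7189.png', 'valid/7/7320.png',
          'train/7/9833.png', 'train/3/7666.png', 'valid/3/925.png',
          'train/7/724.png', 'valid/3/93055.png']
print(grandparent_splitter()(fnames))  # ([0, 3, 4, 6], [1, 2, 5, 7])
```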

But if your data is in the same folder and you just want to split it at a specific index, as in your example, you can use IndexSplitter like so:

items = list(range(10))
splitter = IndexSplitter([3,7,9])
test_eq(splitter(items), [[0,1,2,4,5,6,8],[3,7,9]])
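The idea is simple enough to sketch in plain Python (again illustrative, not the fastai source): the indices you pass in become the validation set, and everything else becomes the training set.

```python
def index_splitter(valid_idx):
    """Return a splitter: indices in valid_idx -> validation, rest -> training."""
    valid = set(valid_idx)
    def split(items):
        train = [i for i in range(len(items)) if i not in valid]
        return train, sorted(valid)
    return split

items = list(range(10))
print(index_splitter([3, 7, 9])(items))  # ([0, 1, 2, 4, 5, 6, 8], [3, 7, 9])
```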

so in your specific case (I haven’t tested this) it would be something like:

splitter = IndexSplitter(range(50000, len(items)))
test_eq(splitter(items), [list(range(50000)), list(range(50000, len(items)))])

which means that when you create your dataloaders object, you declare it like so:

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_items=get_image_files, get_y=get_y,
                   splitter=IndexSplitter(range(50000, len(items))))

dls = dblock.dataloaders(path)

Hope that helps.

But I don’t have any images on the filesystem. As I wrote in the first post, I already have two DataLoader objects, one for training and one for validation.
Each dataloader is created from a list of tuples, where each tuple contains the image tensor and the target:
[(image tensor, target), (image tensor, target), … etc …]

Given those two DataLoader objects, is it possible to create a DataLoaders object to pass to cnn_learner?

Then I misunderstood you, I’m sorry; I don’t have the answer to that question.

My understanding is that you correctly wrapped your PyTorch DataLoader objects in a fastai DataLoaders with dls = DataLoaders(trainloader, testloader), as you did.

The error seems to come from fastai trying to normalize your dataloaders (and failing). Can you try setting normalize=False in your cnn_learner and see if that helps?

Sorry for the late reply.
I’ve tried that

# train_set and valid_set are like [(tensor size 784, label), ...]
dl = DataLoader(train_set, batch_size=bs, shuffle=True)
valid_dl = DataLoader(valid_set, batch_size=bs)

dls = DataLoaders(dl, valid_dl)

learn = cnn_learner(dls, resnet34, metrics=error_rate, normalize=False, n_out=10, loss_func=CrossEntropyLossFlat())
learn.fine_tune(1)

but it doesn’t work; it says:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 2-dimensional input of size [500, 784] instead

500 is the batch size, and 784 is the total number of pixels per image (28×28).
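The thread stops here, but the error message itself hints at the fix: resnet34’s first conv layer expects 4-dimensional batches shaped [bs, 3, H, W], while these dataloaders yield flat [bs, 784] vectors. A minimal sketch of the reshape that would be needed, assuming 28×28 grayscale images (the variable names are illustrative):

```python
import torch

bs = 500
flat_batch = torch.rand(bs, 784)  # what the DataLoader currently yields

# Reshape each 784-vector into a 1x28x28 image, then repeat the single
# channel 3 times so the batch matches resnet34's 3-channel input conv.
imgs = flat_batch.view(bs, 1, 28, 28).expand(-1, 3, -1, -1)

print(imgs.shape)  # torch.Size([500, 3, 28, 28])
```

In practice the reshape would go on each item when building the dataset (e.g. mapping each (x, y) pair to (x.view(1, 28, 28).expand(3, -1, -1), y)), or one could use a model whose input layer matches the flat data instead.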