How to use DataLoader?

minhncsocial · September 2, 2018, 9:45pm

Hi everyone,

I want to use fastai’s DataLoader to load data from a dataset object.
It runs successfully when I use torch.utils.data.DataLoader, but fails when I use fastai’s DataLoader.
The code I am running is: https://github.com/fxia22/pointnet.pytorch/blob/master/train_segmentation.py

Could anyone tell me how to use fastai’s DataLoader to load data, and then how to pull a batch of samples and display it?

Thank you very much.

Hadus · September 3, 2018, 9:41am

For the FastAI DataLoader you should be able to call:

next(iter(fastai_dataloader))

That should give you a batch. Does that work?

It is meant to be used as:

for batch_index, batch_data in enumerate(fastai_dataloader):

In the GitHub code, enumerate is given an extra argument set to 0, but that is redundant because 0 is the default start value.
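To make the two access patterns above concrete, here is a minimal sketch. ToyLoader is a hypothetical stand-in, not a fastai or PyTorch class; it only mimics the iterable behaviour that both loaders share. It shows that next(iter(...)) yields the first batch and that enumerate(x, 0) and enumerate(x) behave the same.

```python
class ToyLoader:
    """Hypothetical stand-in for a DataLoader: yields fixed-size batches of a list."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]


loader = ToyLoader(list(range(10)), batch_size=4)

# Grab a single batch for inspection.
first_batch = next(iter(loader))

# enumerate() starts counting at 0 by default, so the explicit 0 is redundant.
with_start = list(enumerate(loader, 0))
without_start = list(enumerate(loader))
```

With a real loader, first_batch would typically be a pair of inputs and targets rather than a plain list.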

minhncsocial · September 4, 2018, 2:08am

Thanks @Hadus,
I already tried that.
With torch.utils.data.DataLoader, both next(iter(…)) and for batch in enumerate(…) run correctly.
But with fastai’s DataLoader, both ways raise an error:

Traceback (most recent call last):
  File "/home/minhnc-lab/PROGRAMS/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-5e4e60e93023>", line 1, in <module>
    batch = next(iter(fastai_dataloader))
  File "/home/minhnc-lab/WORKSPACES/Python/GRASP_DETECTION/PointNet/pointnet1_pytorch/fastai/dataloader.py", line 82, in __iter__
    for batch in map(self.get_batch, iter(self.batch_sampler)):
  File "/home/minhnc-lab/WORKSPACES/Python/GRASP_DETECTION/PointNet/pointnet1_pytorch/fastai/dataloader.py", line 75, in get_batch
    res = self.np_collate([self.dataset[i] for i in indices])
  File "/home/minhnc-lab/WORKSPACES/Python/GRASP_DETECTION/PointNet/pointnet1_pytorch/fastai/dataloader.py", line 71, in np_collate
    return [self.np_collate(samples) for samples in zip(*batch)]
  File "/home/minhnc-lab/WORKSPACES/Python/GRASP_DETECTION/PointNet/pointnet1_pytorch/fastai/dataloader.py", line 71, in <listcomp>
    return [self.np_collate(samples) for samples in zip(*batch)]
  File "/home/minhnc-lab/WORKSPACES/Python/GRASP_DETECTION/PointNet/pointnet1_pytorch/fastai/dataloader.py", line 72, in np_collate
    raise TypeError(("batch must contain numbers, dicts or lists; found {}".format(type(b))))
TypeError: batch must contain numbers, dicts or lists; found <class 'torch.Tensor'>

loftiskg · September 4, 2018, 3:57am

I think the fastai DataLoader automatically converts the contents of the batch into tensors, so it does not expect samples that are already tensors. Try modifying your Dataset class to return a numpy array instead of a tensor.
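The traceback above suggests a recursive collate function that dispatches on the type of each sample and rejects anything it does not recognise. The following is a simplified sketch of that idea, reconstructed only from the two lines visible in the traceback; it is not fastai's actual np_collate, and FakeTensor is a hypothetical stand-in for torch.Tensor.

```python
from numbers import Number


class FakeTensor:
    """Hypothetical stand-in for an unsupported sample type such as torch.Tensor."""


def collate(batch):
    """Simplified analogue of a recursive collate: group samples field by field."""
    first = batch[0]
    if isinstance(first, Number):
        return list(batch)  # a real collate would build an array here
    if isinstance(first, dict):
        return {key: collate([sample[key] for sample in batch]) for key in first}
    if isinstance(first, (list, tuple)):
        # Transpose [(x0, y0), (x1, y1)] into [(x0, x1), (y0, y1)] and recurse.
        return [collate(samples) for samples in zip(*batch)]
    raise TypeError(
        "batch must contain numbers, dicts or lists; found {}".format(type(first))
    )


# Samples made of plain numbers collate fine.
ok = collate([(1.0, 0), (2.0, 1)])

# Samples containing an unrecognised type hit the final raise.
try:
    collate([(FakeTensor(), 0), (FakeTensor(), 1)])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Under this reading, the error is about the type of the individual fields inside each sample, which is why changing what the Dataset returns is the place to fix it.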

minhncsocial · September 4, 2018, 8:28am

Thanks @Hadus and @loftiskg,
I modified the dataset and it is able to load data now.
My own dataset class returned tensors. PyTorch’s DataLoader accepts all kinds of input data, but fastai’s does not. After I changed the returned data to numpy arrays, the fastai DataLoader runs.
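If editing the dataset class itself is inconvenient, one possible alternative is a thin wrapper that converts each field on the way out. This is only a sketch: NumpySamples is a hypothetical name, and FakeTensor stands in for torch.Tensor so the example runs without PyTorch. It relies on the fact that CPU tensors that do not require gradients expose a .numpy() method.

```python
class NumpySamples:
    """Hypothetical wrapper: converts tensor-like fields of each sample to NumPy."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        sample = self.dataset[index]
        # Fields with a .numpy() method are converted; other fields pass through.
        return tuple(f.numpy() if hasattr(f, "numpy") else f for f in sample)


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch installed."""

    def __init__(self, values):
        self.values = values

    def numpy(self):
        return list(self.values)  # a real tensor would return a numpy.ndarray


raw_dataset = [(FakeTensor([0.1, 0.2, 0.3]), 1), (FakeTensor([0.4, 0.5, 0.6]), 0)]
wrapped = NumpySamples(raw_dataset)
points, label = wrapped[0]
```

The wrapped object would then be passed to the DataLoader in place of the original dataset.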

Now my problem is: HOW TO TRAIN THE MODEL?
I have my own model definition.
I can use the DataLoader to get sample data.
But I don’t know how to:

Create a learner from the defined model
Feed the dataloader to the learner

Does anyone have sample code of doing it?

Thank you very much.

Hadus · September 4, 2018, 11:07am

Did you not watch the whole of course 1?? You should!

One of the Jupyter notebooks from course 1 definitely has some code that you are looking for.

minhncsocial · September 10, 2018, 9:09am

Hi @Hadus
I checked all the sample notebooks from the deep learning 1 lessons, but I couldn’t find one that uses DataLoader.
Most of them use ImageClassifierData.from…
Could you tell me which lesson it is?

Hadus · September 11, 2018, 2:48pm

I was talking about not using the learner class at all. It looks like I was wrong and there isn’t a simple pytorch train function implementation there. Here is a basic example (UNTESTED CODE):

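import numpy as np
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable
from tqdm import tqdm, tnrange
from fastai.core import to_np  # fastai 0.7 helper (assumed location): tensor -> numpy

names = ["error"]
layout = "{!s:10} " * len(names)

epochs = 50
criterion = F.binary_cross_entropy
dataloader = ...  # fill in: your DataLoader
net = ...  # fill in: your model, moved to the GPU with .cuda()
optimizer = optim.Adam(net.parameters(), lr=0.001, betas=(0.9, 0.999))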


def print_stats(epoch, values, decimals=6):
    layout = "{!s:^10}" + " {!s:10}" * len(values)
    values = [epoch] + list(np.round(values, decimals))
    print(layout.format(*values))

for epoch in tnrange(epochs, desc="Epoch"):
    t = tqdm(iter(dataloader), leave=False, total=len(dataloader))

    for i, batch in enumerate(t):

        xs = Variable(batch[0]).cuda()
        ys = Variable(batch[1]).cuda()

        optimizer.zero_grad()

        y_hats = net(xs)

        err = criterion(y_hats, ys)  # loss functions take (input, target)

        err.backward() 
        optimizer.step()

        t.set_postfix(err=to_np(err.mean()))

    if epoch == 0:
        print(f"\n{layout.format(*names)}")

    print_stats(epoch, [to_np(err.mean())])

I hope you can tailor it to be useful. Good luck

minhncsocial · September 11, 2018, 7:49pm

Hi @Hadus, thank you for your kindness.
I did it in a similar way and was able to train.
Now I want to use the learner to train, but I’m not sure what the proper way is.
Could you tell me how to feed the model and dataloader to the learner?

Hadus · September 12, 2018, 1:43pm

The code you linked uses the model PointNetDenseCls, which returns two values from its forward function. This could cause problems, as the learner expects one value to be returned. The easy fix would be to edit PointNetDenseCls so that forward only returns x.
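An alternative to editing the class is to wrap the model so that only the first returned value is passed on. The sketch below shows the idea on a plain callable; FirstOutputOnly and two_output_model are hypothetical names. For a real PyTorch model the wrapper would need to subclass torch.nn.Module and implement forward instead, so that the parameters stay registered.

```python
class FirstOutputOnly:
    """Hypothetical wrapper that keeps only the first value a model returns."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        output = self.model(*args, **kwargs)
        # If the model returns (prediction, extra), keep only the prediction.
        return output[0] if isinstance(output, tuple) else output


def two_output_model(x):
    """Toy stand-in for a forward pass that returns a prediction plus an extra value."""
    return [v * 2 for v in x], "extra"


single_output_model = FirstOutputOnly(two_output_model)
prediction = single_output_model([1, 2, 3])
```

This keeps the original model code untouched, at the cost of discarding whatever the second value was meant for.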

Try something like this: (ALSO NOT TESTED CODE)

# model data
path = "..."
trn_dl = DataLoader(...)
val_dl = DataLoader(...)
test_dl = None

model_data = ModelData(path, trn_dl, val_dl, test_dl=test_dl)

# model
model = PointNetDenseCls(...)

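# optimizer
# (fastai 0.7's Learner likely expects opt_fn to be an optimizer constructor,
# e.g. optim.Adam or a functools.partial of it, rather than a built instance)
optimizer = optim.Adam(...)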

# criterion
criterion = F.nll_loss

# learner
learner = Learner(model_data, BasicModel(to_gpu(model)), opt_fn=optimizer, crit=criterion)

Wherever there is "...", just fill in what you want.

Try this with and without editing the forward in the PointNetDenseCls class.

Also, even if this doesn’t throw an error, learner.fit might.

pradla · January 25, 2019, 11:48pm

see if this helps:

ds = TextDataset(encoded,labels)
dl = DataLoader(ds, 64, transpose=True, num_workers=1, pad_idx=1)
md = ModelData('/data/esdata2/dask/parquet_sentiment/', None, dl) 
learn = RNN_Learner(md, TextModel(to_gpu(mode))) 
learn.load_encoder('/data/esdata2/dask/companypanel/aclImdb/models/lm1_enc')
learn.load('/data/esdata2/dask/companypanel/aclImdb/models/clas_2_new4')

minhncsocial · February 12, 2019, 11:39am

Oh, thanks @Hadus and @pradla.
I have already finished the program.
Thank you for your kindness. I am really grateful for your help.
I also uploaded the PointNet on fastai code to GitHub. I hope it’s useful to someone.

jgtjerry · February 21, 2019, 9:52pm

Hello,

I am having a hard time figuring out the use of Dataloaders in Fastai.

I am doing language modelling.

d1 = DataLoader(df_trn, batch_size=4, shuffle=True, num_workers=2, pin_memory=True)
d2 = DataLoader(df_val, batch_size=32, shuffle=False, num_workers=2, pin_memory=True)

data_lm = TextLMDataBunch.from_df('./', train_df=d1, valid_df=d2, bs=2)


learner = language_model_learner(dl, pretrained_model=URLs.WT103_1, drop_mult=0.5)

learner.lr_find()

I am getting the error:
Memory Error

Why am I getting a memory error when I am using a DataLoader?

Thanks
Jerry

andreasl · May 2, 2019, 12:21pm

This usually happens when you run out of memory.

Try to reduce your batch_size (that way less data will be loaded at the same time, so that it can fit in the available memory).

I also think num_workers=2 will spawn 2 processes, each with their own copy of the data in memory, doubling the memory usage. Setting num_workers=1 might help.
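As a rough illustration of why a smaller batch_size helps, here is a back-of-the-envelope estimate of how much memory one dense batch takes. batch_bytes is a hypothetical helper and the sample shape is made up for the example. It counts only the batch itself, not the model, the activations, or the dataset held in memory.

```python
def batch_bytes(batch_size, sample_shape, bytes_per_element=4):
    """Rough size in bytes of one dense batch (4 bytes per element = float32)."""
    elements_per_sample = 1
    for dim in sample_shape:
        elements_per_sample *= dim
    return batch_size * elements_per_sample * bytes_per_element


# Hypothetical sample shape, used only to illustrate the scaling.
sample_shape = (2500, 3)
large = batch_bytes(32, sample_shape)
small = batch_bytes(4, sample_shape)
```

The estimate scales linearly with batch_size, so going from 32 to 4 cuts the per-batch footprint by a factor of 8.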