General course chat

How do we fetch the list of filenames from our dataset with the new updates to the library?

I’m running on planet-amazon. 10 days ago I was able to just call:

idx2class = {v:k for k,v in learn.data.train_ds.ds.class2idx.items()}

to convert class indices back to class-names, and:

fnames = [f.name.split('.')[0] for f in learn.data.test_ds.ds.x]

to get the filenames. But the .data.<xyz>_ds objects no longer have the .ds attribute I was using.

I checked the changelog and searched around the forums, but anything I found was from about a month ago. I’ll edit this post with the answer if I find it.


edit:

so looks like you can call:

learn.data.train_ds.x.items

to get the list of filepaths (and also for .valid_ds and .test_ds).

Is this the ‘right’ way to do it? And is this guaranteed to match up with predictions on the validation and test sets?


edit2:

Think I found how to get your class-to-index mapping:

learn.data.train_ds.y.c2i

It’s gotten more intuitive: “where can I find filenames?” → take a look at where the data comes from: .<blah>_ds.x.<blah>; “where can I find how classes are one-hot encoded?” → look at where the labels are stored: .<blah>_ds.y.<blah>.
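Putting the two together, a minimal sketch (assuming a trained fastai v1 Learner named learn with a test set attached):

fnames = [f.name.split('.')[0] for f in learn.data.test_ds.x.items]  # test-set filenames without extension
idx2class = {v: k for k, v in learn.data.train_ds.y.c2i.items()}  # class index -> class name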

2 Likes

Is there any way I can read a zip file directly through the fast.ai libs and read all the image files within it?

Thanks
Amit

I think there’s no fastai method to extract zip files right now; you have to decompress it on your own using another library first.
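For example, a minimal sketch using the standard-library zipfile module (the archive name and target directory here are hypothetical):

import zipfile
from pathlib import Path

path = Path('data/planet')  # hypothetical dataset directory
with zipfile.ZipFile(path/'train.zip') as zf:
    zf.extractall(path)  # unpack all the images next to the archive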

1 Like

Thanks… I’ve now used Python’s zipfile module to unzip it.

The unfreeze() function unfreezes all the layers. I could not locate any fast.ai function which allows me to unfreeze only the last x layers and not all of them… Can anyone please help me with this?

Thanks
Amit

Look at the imdb notebook. I think you want to use

learn.freeze_to(-x)

where x selects the last few layers of your choosing.
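For example, a sketch assuming a Learner named learn built from a pretrained model:

learn.freeze_to(-2)  # leave only the last part of the model trainable
learn.fit_one_cycle(1)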

3 Likes

A note: It looks like freeze_to freezes by layer group (learn.layer_groups). For resnet34 and resnet50 there are 3 layer groups.

So for finer control I guess you can just take a look at the function and apply

if not self.train_bn or not isinstance(l, bn_types): requires_grad(l, False)

on the layers (l's) you want to freeze.
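A sketch of that finer control, assuming a fastai v1 Learner named learn (requires_grad and bn_types live in fastai.torch_core):

from fastai.torch_core import requires_grad, bn_types

for l in learn.layer_groups[0]:  # e.g. target only the first layer group
    if not learn.train_bn or not isinstance(l, bn_types):
        requires_grad(l, False)  # turn off gradients for this layer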

I always insert fastai.__version__ in my notebooks.

1 Like

Why is my validation loss much higher than my training loss, even in the first epoch? I thought this was our goal, but what am I doing wrong when this happens in the first epoch? In the second epoch the validation loss decreases a little faster than the training loss.

Difficult to answer without more info, but it could be that by chance the pretrained weights perform better on the training set than on the validation set. That shouldn’t happen with a large enough and well-chosen validation set, though. It could also be that the model had already seen some of the training data, either because you already trained on it, or because some of the data the pretrained weights were trained on is also in your training set.

To be clear, having a validation loss much higher than the training loss is not the overall goal, because that means you’re overfitting (so not generalising very well).

1 Like

[quote=“Lankinen, post:245, topic:24987, full:true”]
Why is my validation loss much higher than my training loss, even in the first epoch? I thought this was our goal…[/quote]

No, our goal is to build an accurate model that generalises well. In lesson 2 Jeremy explained that if you start out with a very high validation loss, you may want to take a look at your learning rate, as it may be too high.

1 Like

[image]
Does it matter that my valid loss is very high at the beginning compared to the train loss, if in the end I get good results? @PierreO @AndrewK

Well, if that’s the first time you’ve trained the model on this data, I don’t really get why you would start by overfitting. Did you use the learning rate finder and the one cycle policy?
Depending on your data, 60% seems like a high error rate.
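For reference, that workflow as a sketch (learn and the max_lr value are placeholders):

learn.lr_find()  # run the learning rate finder
learn.recorder.plot()  # inspect the loss vs learning rate curve
learn.fit_one_cycle(4, max_lr=1e-3)  # pick max_lr from the plot; 1e-3 is just an example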

Yeah, I used the learning rate finder before this, and the one cycle policy. 60% is actually pretty good for this problem, but if I’m doing something wrong then I might get even better results. Could it somehow be plausible that valid_loss and train_loss have been swapped?

I don’t think fastai would have such a big bug :wink:
I don’t really know what’s going on beyond what I’ve already suggested, without a closer look at your notebook / your data.

I may contact you later today, but for now I’ll try to solve this problem on my own by running this notebook a couple of times. Thanks for helping me.

1 Like

I have done the modelling and validation. Now there are hundreds of images, each with an image id, and I need to classify them as test data. I could not locate any function in fast.ai to send them in as an image data bunch and get classification results along with the id names… Maybe I am missing something… Can anyone please help me with this?

Thanks
Amit
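One possible approach, as a hedged sketch: fastai v1’s get_preds combined with the x.items and y.c2i attributes found earlier in this thread (the Learner setup is assumed to already exist):

from fastai.basic_data import DatasetType

preds, _ = learn.get_preds(ds_type=DatasetType.Test)  # class probabilities for the test set
idx2class = {v: k for k, v in learn.data.train_ds.y.c2i.items()}
fnames = [f.name.split('.')[0] for f in learn.data.test_ds.x.items]
labels = [idx2class[int(i)] for i in preds.argmax(dim=1)]
results = dict(zip(fnames, labels))  # image id -> predicted class name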

Just wanted to share that I’ve found that watching the videos multiple times has been super helpful.

I watched all 5 videos once. (Super overwhelmed, but continued nevertheless)

When I got back to watching from video 1 again, it’s almost like my brain had had enough time to let everything I’d watched simmer, and it was MUCH easier to register and understand.

So in case anyone is finding things overwhelming, I’d recommend spaced repetition.

Your brain is a neural network as well :slight_smile: , so more repetitions, more layers, more epochs, better learning.

8 Likes

I am trying to train a deep learning model. I am getting the following error while running learn.fit_one_cycle(4):
RuntimeError: Traceback (most recent call last):
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/torch_core.py", line 92, in data_collate
    return torch.utils.data.dataloader.default_collate(to_data(batch))
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 320 and 180 in dimension 2 at /opt/conda/conda-bld/pytorch-nightly_1542185950098/work/aten/src/TH/generic/THTensorMoreMath.cpp:1319

I used labels from a csv file, and the images are all in one folder; each image name has one specific label. data.show_batch(rows=3, figsize=(7,6)) displays the images fine, but while training I get the error above.

I got the answer: the images were of different sizes. With size = 340, it worked.
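A minimal sketch of that fix, assuming fastai v1 and hypothetical path and csv names; passing size resizes every image to the same dimensions so the DataLoader can stack them into one batch:

from fastai.vision import ImageDataBunch, get_transforms

data = ImageDataBunch.from_csv(path, csv_labels='labels.csv', ds_tfms=get_transforms(), size=340)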