Lesson 3 In-Class Discussion ✅

@joshfp is probably right - your .json file isn’t in your active directory. Importantly, the “root” shown by jupyter is jupyter’s root, not the root of the operating system, so the fully-qualified path likely includes a couple of directory levels above what jupyter shows as its root.

It might help if you examine your active directory with the following (which might also be useful for others who stumble across this post):

- Use `!pwd` to display the active directory;
- Use `!ls -a` to list all of the files and directories in the active directory, including hidden files and folders;
- Use something like `!find . -name "*.json"` or `!find / -name "*.json"` to search the active directory and its subdirectories, or the root directory and everything under it, respectively. The second one may take a long time and throw a slew of ‘permission denied’ responses;
- Use `!echo $HOME` to display the home path of the OS, which is usually something like `/home/<username>`.

Essentially, joshfp is saying that your active directory is something other than your home directory (where your home directory equals /home/<username>, aka ~/). As a result, you need to use the fully-qualified path to the kaggle.json file to move it to the hidden folder ~/.kaggle.

Or, you could change to your home directory using os.chdir('/home/<username>'). Note that os.chdir('~/') won’t work, since os.chdir does not expand ~; use os.chdir(os.path.expanduser('~')) instead. !cd ~/ or !cd /home/<username>/ won’t stick either, because each ! command in a jupyter notebook runs in its own subshell (the %cd magic does persist).
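
For example, a minimal sketch of the move in Python; the source path below is a placeholder for wherever `find` located your kaggle.json:

import os, shutil

# Build ~/.kaggle, expanding ~ explicitly.
home = os.path.expanduser('~')                       # e.g. /home/<username>
os.makedirs(os.path.join(home, '.kaggle'), exist_ok=True)

# '/full/path/to/kaggle.json' is a placeholder: use the path `find` reported.
shutil.move('/full/path/to/kaggle.json',
            os.path.join(home, '.kaggle', 'kaggle.json'))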

4 Likes

I’m trying to save the model as a .pth file from the first phase and then load it when we’re supposed to train on bigger images. I have loaded the .pth by doing

model = torch.load("/home/jupyter/tutorials/fastai/course-v3/nbs/dl1/model.pth")

but I have no idea how to plug it into unet for further training.
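
Would something like this be the right approach? A sketch, assuming the .pth came from learn.save() (which stores a state dict rather than a full model), with data_big as an assumed name for the new larger-image DataBunch:

from fastai.vision import *

# Rebuild the learner on the new (larger-image) data, then load the
# saved weights; learn.load('model') resolves to <learn.path>/models/model.pth
learn = unet_learner(data_big, models.resnet34)
learn.load('model')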

What is the folder format for loading the dataset for the classifier in the imdb notebook?
While loading, I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-53-97dd9c4a5ffe> in <module>
      3             .label_from_folder(classes=['hotel','train'])
      4              #label them all with their folder, only keep 'neg' and 'pos'
----> 5             .split_by_folder(valid='test')
      6              #split by folder between train and validation set
      7             .datasets()

~/anaconda2/envs/hindinlu/lib/python3.6/site-packages/fastai/data_block.py in datasets(self, dataset_cls, **kwargs)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/anaconda2/envs/hindinlu/lib/python3.6/site-packages/fastai/data_block.py in <listcomp>(.0)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/anaconda2/envs/hindinlu/lib/python3.6/site-packages/fastai/basic_data.py in new(self, *args, **kwargs)
     40     def new(self, *args, **kwargs):
     41         "Create a new dataset using `self` as a template"
---> 42         return self.__class__(*args, **kwargs)
     43 
     44     def _get_x(self,i):   return self.x[i]

TypeError: __init__() missing 1 required positional argument: 'fns'

I am getting a ZeroDivisionError with the following learning rate setting; any solution to fix this?

lr = 1e-03
learn.fit_one_cycle(5, slice(1e-04, lr/5))

Error messages:

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-29-2392d8fd8181> in <module>
----> 1 learn.fit_one_cycle(5, slice(1e-04, lr/5))

~/fastai/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     16                   wd:float=None, callbacks:Optional[CallbackList]=None, **kwargs)->None:
     17     "Fit a model following the 1cycle policy."
---> 18     max_lr = learn.lr_range(max_lr)
     19     callbacks = ifnone(callbacks, [])
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,

~/fastai/fastai/basic_train.py in lr_range(self, lr)
    148         "Build differential learning rates."
    149         if not isinstance(lr,slice): return lr
--> 150         if lr.start: res = even_mults(lr.start, lr.stop, len(self.layer_groups))
    151         else: res = [lr.stop/3]*(len(self.layer_groups)-1) + [lr.stop]
    152         return np.array(res)

~/fastai/fastai/core.py in even_mults(start, stop, n)
    102     "Build evenly stepped schedule from `start` to `stop` in `n` steps."
    103     mult = stop/start
--> 104     step = mult**(1/(n-1))
    105     return np.array([start*(step**i) for i in range(n)])
    106 

ZeroDivisionError: division by zero

How is n = 1 in this case? Any help is much appreciated.

Apparently, n==1 because len(self.layer_groups)==1. If your model has only one layer group, instead of passing a slice, try passing a single number (float) as the learning rate, as in the sketch below.
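
For example, carrying over the 1e-4 from the slice above:

# With a single layer group there is nothing to spread a slice across,
# so a plain float avoids even_mults ever being called with n=1.
learn.fit_one_cycle(5, 1e-4)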

4 Likes

Anyone got this issue when running imdb? (GCP)

data_lm = (TextFileList.from_folder(path)

data_lm.save('tmp_lm')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1179: ordinal not in range(128)
Thanks

Hey @joshfp, was wondering if you might be able to provide some intuition behind why the squeeze is needed? In the camvid notebook, we have the following code:

name2id = {v:k for k,v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask]==target[mask]).float().mean()

My understanding is that torch_tensor.squeeze() removes all dimensions of size 1, and torch_tensor.squeeze(dim) removes that dimension only if it’s of size 1. It makes sense that the pred and targ dimensions should match, but any tips on how to think about what the dimensions of pred and targ are during training, in order to know how to squeeze them to the same shape? Thanks!!

@gjohn Sorry I didn’t see your post at first. Interesting question. Maybe we can get @lesscomfortable to weigh in on this… Also note that we moved the discussion on this to a new topic:

I was wondering myself and analysed it:

def accuracy(input, target):
    # input.argmax(dim=1) selects the winning class per pixel, reducing
    # the input shape from (bs, classes, width, height) to (bs, width, height).
    # We therefore have to reshape the target tensor from
    # (bs, 1, width, height) to (bs, width, height).
    sz     = target.size()
    target = target.reshape((sz[0], sz[2], sz[3]))
    return (input.argmax(dim=1).flatten()==target.flatten()).float().mean()

I replaced the squeeze with reshape because it turns out that squeeze() without a dim argument does not work if the last batch contains only one tensor: in that case it squeezes out two dimensions instead of one, i.e. (1, 1, width, height) becomes (width, height). See the demonstration below.
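
A quick demonstration of that edge case in plain PyTorch:

import torch

t = torch.zeros(1, 1, 4, 4)      # a batch of one single-channel mask
print(t.squeeze().shape)         # torch.Size([4, 4]) - batch dim squeezed away too
print(t.squeeze(1).shape)        # torch.Size([1, 4, 4]) - only the channel dim
print(t.reshape(1, 4, 4).shape)  # torch.Size([1, 4, 4]) - same effect as squeeze(1)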

2 Likes

Getting errors while loading imdb data
While creating data_clas for the imdb classifier, I get this error:

data_clas = (TextFileList.from_folder(path)
             #grab all the text files in path
            .label_from_folder(classes=['neg','pos'])
             #label them all with their folder, only keep 'neg' and 'pos'
            .split_by_folder(valid='test')
             #split by folder between train and validation set
            .datasets()
             #use `TextDataset`, the flag `is_fnames=True` indicates to read the content of the files passed
            .tokenize()
             #tokenize with defaults from fastai
            .numericalize(vocab = data_lm.vocab)
             #numericalize with the same vocabulary as our pretrained model
            .databunch(TextClasDataBunch, bs=50))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-90-5e9fe410bc17> in <module>
      4             .label_from_folder(classes=['neg','pos'])
      5              #label them all with their folder, only keep 'neg' and 'pos'
----> 6             .split_by_folder(valid='test')
      7              #split by folder between train and validation set
      8             .datasets()

~/anaconda2/envs/hindinlu/lib/python3.6/site-packages/fastai/data_block.py in datasets(self, dataset_cls, **kwargs)
    231     def datasets(self, dataset_cls:type=None, **kwargs)->'SplitDatasets':
    232         "Create datasets from the underlying data using `dataset_cls` and passing along the `kwargs`."
--> 233         if dataset_cls is None: dataset_cls = self.dataset_cls()
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]

~/anaconda2/envs/hindinlu/lib/python3.6/site-packages/fastai/text/data.py in dataset_cls(self)
     34 
     35     def dataset_cls(self):
---> 36         return FilesTextDataset if isinstance(self.train.items[0][0],Path) else TextDataset
     37 
     38     def add_test_folder(self, test_folder:str='test', label:Any=None):

IndexError: index 0 is out of bounds for axis 0 with size 0

Can anyone suggest what file format or folder format I should follow?

@sgugger

1 Like

I am trying to work on the TGS salt classification problem, which has 2 classes in its mask. While working on it I have noticed the following things which I don’t quite understand:

  1. lr_find(learn) gives me very different values every time I run it
  2. My validation loss is very, very high: train loss = 0.28 but validation loss is 4075 or so. What can cause such behavior?

I am using dice as my accuracy metric with 2 classes.

Makes sense, Kaspar! I guess part of the problem was that I was assuming the target should just be a tensor of (width, height), but if you do mask.data.shape you get a tensor of (1, width, height), so it makes sense you need to squeeze that down. Thanks :slight_smile:

This may have been discussed before…so please refresh my memory or set my understanding right.

Is there merit in converting input images to gray-scale before processing further? I think the transforms could potentially do that, but Jeremy may have shared his experience in the past that conversion to gray-scale does not help a lot.

Thanks a lot

I have trained the language model by following the imdb notebook for Hindi text classification and haven’t changed any parameters. Now, while loading the encoder for classification, I get this error.

learn.load_encoder('fine_tuned_enc1')
learn.freeze()


RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
	size mismatch for encoder.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([226, 400]).
	size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([60002, 400]) from checkpoint, the shape in current model is torch.Size([226, 400]).

I’m not getting what’s wrong.

2 Likes

When I run the following code, I get an error.

Code:

data = (src.datasets().transform(get_transforms(), size=128).databunch().normalize(imagenet_stats))

Error:

/opt/anaconda3/lib/python3.6/site-packages/fastai/data_block.py in datasets(self, dataset_cls, **kwargs)
    232         "Create datasets from the underlying data using `dataset_cls` and passing along the `kwargs`."
    233         if dataset_cls is None: dataset_cls = self.dataset_cls()
--> 234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
    236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]

TypeError: __init__() missing 2 required positional arguments: 'x' and 'y'

Which arguments am I missing?

I am not smart enough to keep track of the tensors’ dimensions and ranks on the fly :slight_smile:. So, when something is not working, I start by checking tensors’ shapes.
In this particular case, it seems that the open_mask function loads the mask as a regular image-like rank-3 tensor, but since the mask has only one channel, the batch shape is [bs, 1, h, w]. If you are using the standard fastai pipeline, you don’t have to worry about it, since its loss functions flatten the preds and target tensors. However, when defining custom metrics, a good practice is to check the tensors’ shapes.
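
As an illustration of that practice, a sketch of a custom segmentation metric with the shape handling made explicit (the void-code masking from acc_camvid is left out for brevity):

import torch

def my_metric(input, target):
    # Expected shapes: input [bs, n_classes, h, w], target [bs, 1, h, w].
    # Assert (or print) them while debugging a new metric.
    assert target.shape[1] == 1, target.shape
    target = target.squeeze(1)                  # -> [bs, h, w]
    return (input.argmax(dim=1) == target).float().mean()

# Quick sanity check with dummy tensors:
print(my_metric(torch.randn(2, 32, 8, 8), torch.zeros(2, 1, 8, 8).long()))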

2 Likes

[learning rate finder plot]

Hello,
I was wondering what this type of graph means, since it has a pretty flat stretch at the beginning, which seems to be a problem for finding a learning rate accurately. I’m using a dataset comparing images of different types of architectural towers.

Judging by some of the images, there may be a classification issue due to the similarity between towers, for example between an office tower and a residential one. Perhaps I have to do a better separation between the images. I was just wondering if there’s a better way to approach this, since I’m getting this for training:

Total time: 11:51
epoch  train_loss  valid_loss  error_rate
1      1.866207    1.394691    0.533537    (02:55)
2      1.614861    1.335335    0.524390    (02:59)
3      1.439242    1.272921    0.500000    (02:58)
4      1.305421    1.266789    0.496951    (02:57)

Any help is appreciated. Thank you.

Jeremy mentioned this type of graph in the lecture, and you can see an example in the planets notebook. He suggested you find the point where the loss starts rising and then divide that learning rate by 10, so in your case I’d try 1e-4.
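
In code, roughly (the 1e-4 is read off the plot as described):

# Re-run the finder, note where the loss begins to climb,
# and take roughly a tenth of that learning rate.
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(4, 1e-4)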

1 Like

The error rate is going down, and train loss is still above val loss, which means that you are still underfitting. There’s no reason to stop at 4 epochs. You should increase the number of epochs until you find that the error rate plateaus or starts to go up, and the train loss is lower than the val loss.
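
For instance (the epoch count and rates here are illustrative):

# Keep training until error_rate flattens or valid_loss turns upward.
learn.fit_one_cycle(8, slice(1e-4, 1e-3))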

Run

export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

before bringing up the jupyter notebook.

1 Like