Planet Classification Challenge

learner.bn_freeze(False)?
It works like unfreezing of the layers?(Batch Norms)

1 Like

Made a rookie error in my previous submission. I forgot to train on complete data to make my submission. When I made this change, my score improved from 0.92990 (133) -> 0.93095 (105). Not bad for a single resnet34 model which we learnt in the class.

Just mentioning the mistake here so that it might help someone :slight_smile:

4 Likes

Personally, I don’t generally use a random seed, since I quite like to see what amount of natural variation there is. But I believe this should do it:

np.random.seed(args.manualSeed)
torch.manual_seed(args.manualSeed)
torch.cuda.manual_seed_all(args.manualSeed)
8 Likes

Ah, that worked! :slight_smile:
Btw, just setting torch.manual_seed(args.manualSeed) also seems to be working.

4 Likes

Hi, did you guys rename the additional test set files? I think I am getting errors because of this.

I moved the images in test-jpg-additional/ into test-jpg/ and everything works smoothly from then onwards. In total, ensure you have 61191 images in test-jpg/ folder.

3 Likes

@nafiz, no, I didn’t rename, nor did I combine the two test sets (I’m running on Crestle with simlinks). I predict the two sets separately, and then combine them to submit, but the first time I did this something weird happened with the second test set and all my predictions were wrong. So you do need to be careful.

I’m thinking I just should have combined the two test sets as @binga did, the time I saved by not doing I have then wasted x100 in wrangling the two separate test sets!

After moving the additional test files to the test-jpg folder, I was getting this error

----> 1 tta= learn.TTA(is_test=True)
RuntimeError: received 0 items of ancdata

I saw that there is an issue already here regarding this -

From there I tried to use the hack of using -

import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, rlimit[1]))

But still it is showing the following errors-

> ----> 1 tta= learn.TTA(is_test=True)
>       2 classes = np.array(data.classes, dtype=str)
>       3 res = [" ".join(classes[np.where(pp > 0.2)]) for pp in tta[0]]
>       4 test_fnames = [os.path.basename(f).split(".")[0] for f in data.test_ds.fnames]
>       5 test_df = pd.DataFrame(res, index=test_fnames, columns=['tags'])
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/learner.py in TTA(self, n_aug, is_test)
>     167         preds1,targs = predict_with_targs(self.model, dl1)
>     168         preds1 = [preds1]*math.ceil(n_aug/4)
> --> 169         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]
>     170         return np.stack(preds1+preds2).mean(0), targs
>     171 
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/learner.py in <listcomp>(.0)
>     167         preds1,targs = predict_with_targs(self.model, dl1)
>     168         preds1 = [preds1]*math.ceil(n_aug/4)
> --> 169         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]
>     170         return np.stack(preds1+preds2).mean(0), targs
>     171 
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
>     115     if hasattr(m, 'reset'): m.reset()
>     116     res = []
> --> 117     for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
>     118     preda,targa = zip(*res)
>     119     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py in __next__(self)
>     219         if self.i>=len(self.dl): raise StopIteration
>     220         self.i+=1
> --> 221         return next(self.it)
>     222 
>     223     @property
> 
> ~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
>     199                 self.reorder_dict[idx] = batch
>     200                 continue
> --> 201             return self._process_next_batch(batch)
>     202 
>     203     next = __next__  # Python 2 compatibility
> 
> ~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
>     219         self._put_indices()
>     220         if isinstance(batch, ExceptionWrapper):
> --> 221             raise batch.exc_type(batch.exc_msg)
>     222         return batch
>     223 
> 
>  AttributeError: Traceback (most recent call last):
>   File "/home/nafizh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
>     samples = collate_fn([dataset[i] for i in batch_indices])
>   File "/home/nafizh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
>     samples = collate_fn([dataset[i] for i in batch_indices])
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py", line 94, in __getitem__
>     return self.get(self.transform, x, y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py", line 99, in get
>     return (x,y) if tfm is None else tfm(x,y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 466, in __call__
>     def __call__(self, im, y=None): return compose(im, y, self.tfms)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 447, in compose
>     im, y =fn(im, y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 231, in __call__
>     x,y = ((self.transform(x),y) if self.tfm_y==TfmType.NO
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 239, in transform
>     x = self.do_transform(x)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 403, in do_transform
>     if self.rp: x = rotate_cv(x, self.rdeg, mode=self.mode)
> AttributeError: 'RandomRotateXY' object has no attribute 'mode'

I have the latest code from the fast-ai repo. Any suggestions on this?

learn.lr_find() failing with following erroe in lesson2 notebook:


Do you have the latest code from fastai repo? I would suggest doing a git pull, and see if it goes away. Jeremy seemed to mention that he got rid of the Opencv library which is in your error.

I’m on the latest code. BTW, Jeremy brought back opencv :slight_smile:

I’ll try to restart AWS and see if that helps. I had previously tried restarting kernel.

1 Like

Restartring aws did help.

3 Likes

That is great. Glad it worked.

We resized images in lesson 2 notebook while size was 64:
data = data.resize(int(sz*1.3), 'tmp')

But for size 128, 258 we are providing the new set of data and not resizing for them. Any insight on this?

I was just about to ask the same question.

We do …

img_sz = 64
data = get_data(arch, img_sz, val_idxs)
data = data.resize(int(img_sz * 1.3), 'tmp') # this creates /tmp/83

… and then resize to 128 and then to 256.

BUT, if you look in the file system, you’ll only see a tmp/83 folder with your resized images from the above line of code. It seems that when we resize to 128 we are resizing the previously downsized images we saved as 64x64 images … and also when we resize to 256, that we are again resizing from the 64x64 images.

Is that right?

If it is, for some reason, it feels wrong to be building bigger images from previously downsized images instead of using the original sizes to do the 128 and 256 sized images.

Actually looking again, that’s not what we’re doing - we’re creating the dataset again from scratch, not using the resized images. So I think it’s fine.

1 Like

Ok … that makes sense looking at the code again.

I take it then that the call to resize to 128 and 256 acts against the original sized images in this case.

If on the otherhand we didn’t make another call to get_data(), we would have upscaled the 64x64 images to 128 and 256.

1 Like

Exactly right.

Should I spin p3.xlarge?

lesson2, last step is taking ~2 hours! :frowning:

What is the total number of items in your test folder?
Test count mismatch:

I’ve 40669 images:
image

While I try to submit Kaggle says:
image