Planet Classification Challenge

jamesrequa · November 18, 2017, 9:42pm

Great Job!! Does it also show your private leaderboard score? Ideally that is the score you’ll want to be looking at since the final score for Kaggle comps is based on the private leaderboard.

binga · November 18, 2017, 9:47pm

Yes. You can see the private scores too. Check the My Submissions tab.

KevinB · November 18, 2017, 9:51pm

I have a .93000 private score.

memetzgz · November 18, 2017, 10:21pm

Thanks for the heads up on finding the private score under My Submissions. Mine is horrible (.57), so I think I did something wrong when I generated predictions on the second half of the test images. Off to check!

jamesrequa · November 18, 2017, 10:33pm

Yep, I think it's probably an issue with the test id names and/or sorting. Believe it or not, even a Kaggle master who had a gold-medal ranking on the public LB dropped 900 spots on the private LB because of this. It was easy for it to go undetected because the sorting of the files corresponding to the public LB was correct!
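To make the sorting pitfall concrete, here is a minimal sketch. The file names below are hypothetical and only illustrate the pattern; I am not claiming this is exactly what happened in that case. A plain string sort is lexicographic, so numbered names can come out in a different order than the one the predictions were generated in, and pairing ids with predictions by position then silently misaligns the rows.

```python
import re

# Hypothetical file names for illustration; the real test ids may differ.
fnames = ["file_0.jpg", "file_1.jpg", "file_2.jpg", "file_10.jpg"]

# A plain string sort is lexicographic, so "file_10" lands before "file_2".
lexicographic = sorted(fnames)


def numeric_key(name):
    """Sort key: the first integer embedded in the file name."""
    return int(re.search(r"\d+", name).group(0))


# Sorting on the embedded number restores the intended order.
numeric = sorted(fnames, key=numeric_key)

print(lexicographic)
print(numeric)
```

If the two printed orders differ for your own file list, any code that matches predictions to ids by position deserves a second look.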

memetzgz · November 18, 2017, 10:45pm

Ha – I guess I’m in good company LOL

I think it had something to do with the way I constructed my submission file – the predictions were okay in the pandas data frame.

Another valuable lesson learned

KevinB · November 18, 2017, 11:59pm

Ouch, I don’t understand how that happened though? So the public LB has a defined number of records and the private one uses the rest. How does one do well and one do horrendously? Aren’t they using the same underlying model?

KevinB · November 19, 2017, 12:14am

Do you have any rules of thumb for when the larger models (34 vs 50 vs 101) and the different architectures (resnet vs resnext vs vgg) are better, or do you just try all of the available models and see which one does best?

memetzgz · November 19, 2017, 12:20am

Okay, the problem WAS how I constructed my submission file – .9283 for the private LB now

Back to the salt mines now to refine . . .

jeremy · November 19, 2017, 2:44am

Sadly no I don’t. I’d like to find some!

binga · November 19, 2017, 4:59am

Is anybody able to reproduce their experiment results? I’m using torch.cuda.manual_seed(42) to ensure I get the same results across runs however I’m unable to do so.

Edit: I tried torch.cuda.seed() too. Not reproducible!

ecdrid · November 19, 2017, 11:45am

learner.bn_freeze(False)?
Does it work like unfreezing the layers (for the batch norms)?

binga · November 19, 2017, 7:21pm

Made a rookie error in my previous submission. I forgot to train on complete data to make my submission. When I made this change, my score improved from 0.92990 (133) → 0.93095 (105). Not bad for a single resnet34 model which we learnt in the class.

Just mentioning the mistake here so that it might help someone

jeremy · November 19, 2017, 7:50pm

Personally, I don’t generally use a random seed, since I quite like to see what amount of natural variation there is. But I believe this should do it:

np.random.seed(args.manualSeed)
torch.manual_seed(args.manualSeed)
torch.cuda.manual_seed_all(args.manualSeed)
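For anyone who wants this as a reusable helper, here is a minimal sketch. The function name `seed_everything` is my own, and treating numpy and torch as optional imports is an assumption made so the snippet runs anywhere. Seeding only the CUDA generator leaves the CPU-side generators (Python's `random`, numpy, and torch's CPU RNG) unseeded, which is a likely reason `torch.cuda.manual_seed(42)` alone was not enough.

```python
import random


def seed_everything(seed):
    """Seed every RNG we can find; numpy and torch are treated as optional."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # numpy's global RNG
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # torch CPU RNG
        torch.cuda.manual_seed_all(seed)  # all GPUs; ignored when CUDA is unavailable
    except ImportError:
        pass


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Even with all of these seeded, some GPU operations may still be nondeterministic, so small run-to-run differences are not necessarily a bug in the seeding.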

binga · November 19, 2017, 8:08pm

Ah, that worked!
Btw, just setting torch.manual_seed(args.manualSeed) also seems to be working.

nafizh · November 19, 2017, 8:38pm

Hi, did you guys rename the additional test set files? I think I am getting errors because of this.

binga · November 19, 2017, 8:40pm

I moved the images in test-jpg-additional/ into test-jpg/ and everything works smoothly from then onwards. In total, ensure you have 61191 images in test-jpg/ folder.
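In case it helps, the move can be scripted. This is a minimal sketch, assuming both folders sit side by side and contain only `.jpg` files; the `merge_test_dirs` helper is my own name, not part of fastai. It refuses to overwrite an existing file and returns the final count so you can compare it against 61191.

```python
import shutil
from pathlib import Path


def merge_test_dirs(extra_dir, main_dir, pattern="*.jpg"):
    """Move files matching pattern from extra_dir into main_dir.

    Returns the number of matching files in main_dir afterwards.
    """
    extra_dir, main_dir = Path(extra_dir), Path(main_dir)
    for src in sorted(extra_dir.glob(pattern)):
        dst = main_dir / src.name
        if dst.exists():
            # Never clobber an image that is already in the main folder.
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")
        shutil.move(str(src), str(dst))
    return len(list(main_dir.glob(pattern)))
```

Usage would look like `n = merge_test_dirs("test-jpg-additional", "test-jpg")`, followed by checking that `n == 61191`.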

memetzgz · November 19, 2017, 9:15pm

@nafizh, no, I didn’t rename, nor did I combine the two test sets (I’m running on Crestle with symlinks). I predict the two sets separately and then combine them to submit, but the first time I did this something weird happened with the second test set and all my predictions were wrong. So you do need to be careful.

I’m thinking I should have just combined the two test sets as @binga did. The time I saved by not doing so I have since wasted 100 times over wrangling the two separate test sets!
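For anyone else predicting the two sets separately, here is a minimal sketch of combining them by image name instead of by position. The `combine_by_name` helper, the ids, and the tag strings are all hypothetical and only there for illustration. The idea is to key every prediction on the file it came from and then emit rows in the order of the sample submission, so a mismatch fails loudly instead of producing a bad score.

```python
def combine_by_name(sample_ids, *prediction_sets):
    """Merge {image_name: tags} dicts and return rows in sample_ids order."""
    merged = {}
    for preds in prediction_sets:
        overlap = merged.keys() & preds.keys()
        if overlap:
            raise ValueError(f"ids predicted twice: {sorted(overlap)}")
        merged.update(preds)
    missing = [i for i in sample_ids if i not in merged]
    if missing:
        raise ValueError(f"ids without a prediction: {missing}")
    return [(i, merged[i]) for i in sample_ids]


# Hypothetical ids and tags, in the order the sample submission lists them.
sample_ids = ["test_0", "test_1", "file_0", "file_1"]
first = {"test_1": "clear primary", "test_0": "haze water"}
second = {"file_0": "cloudy", "file_1": "agriculture clear"}

rows = combine_by_name(sample_ids, first, second)
for image_name, tags in rows:
    print(image_name, tags)
```

The order in which each set was predicted no longer matters, because every row is looked up by name.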

nafizh · November 19, 2017, 10:37pm

After moving the additional test files to the test-jpg folder, I was getting this error

----> 1 tta= learn.TTA(is_test=True)
RuntimeError: received 0 items of ancdata

I saw that there is already an issue about this:

github.com/fastai/fastai

RuntimeError: received 0 items of ancdata

opened 06:46AM - 12 Nov 17 UTC

closed 11:09PM - 29 Nov 17 UTC

kevinbird15

I'm running into an issue when trying to predict with the dn models. From what …I've researched it seems maybe related to this issue https://github.com/pytorch/pytorch/issues/973 from the pytorch forums and the workaround there was setting the number of workers to 0. If anybody else has encountered this or knows how to set the number of workers to 0, I tried setting num_workers on ImageClassifierData to 0, but that didn't solve the issue for me. I don't know if there is anything that can be done on the fastai side since it appears to be a pytorch problem, but I figured it's at least worth documenting and if anybody has any ideas they can look into it.

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-c94a818ff72b> in <module>()
     10     learn[i].fit(0.01, 3, cycle_len=1, cycle_mult=4)
     11 
---> 12     test_predictions = learn[i].predict(is_test=True)
     13 
     14     #tmp_log_preds,tmp_y = learn[i].TTA(is_test=True, n_aug=50)

~/fastaip1v2/fastai/courses/dl1/fastai/learner.py in predict(self, is_test)
    136         self.load('tmp')
    137 
--> 138     def predict(self, is_test=False): return self.predict_with_targs(is_test)[0]
    139 
    140     def predict_with_targs(self, is_test=False):

~/fastaip1v2/fastai/courses/dl1/fastai/learner.py in predict_with_targs(self, is_test)
    140     def predict_with_targs(self, is_test=False):
    141         dl = self.data.test_dl if is_test else self.data.val_dl
--> 142         return predict_with_targs(self.model, dl)
    143 
    144     def predict_dl(self, dl): return predict_with_targs(self.model, dl)[0]

~/fastaip1v2/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
    115     if hasattr(m, 'reset'): m.reset()
    116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
--> 117                         for *x,y in iter(dl)])
    118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
    119 

~/fastaip1v2/fastai/courses/dl1/fastai/model.py in <listcomp>(.0)
    114     m.eval()
    115     if hasattr(m, 'reset'): m.reset()
--> 116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
    117                         for *x,y in iter(dl)])
    118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))

~/fastaip1v2/fastai/courses/dl1/fastai/dataset.py in __next__(self)
    226         if self.i>=len(self.dl): raise StopIteration
    227         self.i+=1
--> 228         return next(self.it)
    229 
    230     @property

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    193         while True:
    194             assert (not self.shutdown and self.batches_outstanding > 0)
--> 195             idx, batch = self.data_queue.get()
    196             self.batches_outstanding -= 1
    197             if idx != self.rcvd_idx:

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/queues.py in get(self)
    335             res = self._reader.recv_bytes()
    336         # unserialize the data after having released the lock
--> 337         return _ForkingPickler.loads(res)
    338 
    339     def put(self, obj):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/multiprocessing/reductions.py in rebuild_storage_fd(cls, df, size)
     68         fd = multiprocessing.reduction.rebuild_handle(df)
     69     else:
---> 70         fd = df.detach()
     71     try:
     72         storage = storage_from_cache(cls, fd_id(fd))

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
     56             '''Get the fd. This should only be called once.'''
     57             with _resource_sharer.get_connection(self._id) as conn:
---> 58                 return reduction.recv_handle(conn)
     59 
     60 

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recv_handle(conn)
    180     '''Receive a handle over a local connection.'''
    181     with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
--> 182         return recvfds(s, 1)[0]
    183 
    184 def DupFd(fd):

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recvfds(sock, size)
    159             if len(ancdata) != 1:
    160                 raise RuntimeError('received %d items of ancdata' %
--> 161                                    len(ancdata))
    162             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
    163             if (cmsg_level == socket.SOL_SOCKET and

RuntimeError: received 0 items of ancdata
```

From there I tried the workaround of raising the open-file limit:

import resource
# Raise the soft limit on open file descriptors to 2048; keep the hard limit as is.
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (2048, rlimit[1]))

But it still fails with the following error:

> ----> 1 tta= learn.TTA(is_test=True)
>       2 classes = np.array(data.classes, dtype=str)
>       3 res = [" ".join(classes[np.where(pp > 0.2)]) for pp in tta[0]]
>       4 test_fnames = [os.path.basename(f).split(".")[0] for f in data.test_ds.fnames]
>       5 test_df = pd.DataFrame(res, index=test_fnames, columns=['tags'])
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/learner.py in TTA(self, n_aug, is_test)
>     167         preds1,targs = predict_with_targs(self.model, dl1)
>     168         preds1 = [preds1]*math.ceil(n_aug/4)
> --> 169         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]
>     170         return np.stack(preds1+preds2).mean(0), targs
>     171 
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/learner.py in <listcomp>(.0)
>     167         preds1,targs = predict_with_targs(self.model, dl1)
>     168         preds1 = [preds1]*math.ceil(n_aug/4)
> --> 169         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]
>     170         return np.stack(preds1+preds2).mean(0), targs
>     171 
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
>     115     if hasattr(m, 'reset'): m.reset()
>     116     res = []
> --> 117     for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
>     118     preda,targa = zip(*res)
>     119     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
> 
> ~/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py in __next__(self)
>     219         if self.i>=len(self.dl): raise StopIteration
>     220         self.i+=1
> --> 221         return next(self.it)
>     222 
>     223     @property
> 
> ~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
>     199                 self.reorder_dict[idx] = batch
>     200                 continue
> --> 201             return self._process_next_batch(batch)
>     202 
>     203     next = __next__  # Python 2 compatibility
> 
> ~/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
>     219         self._put_indices()
>     220         if isinstance(batch, ExceptionWrapper):
> --> 221             raise batch.exc_type(batch.exc_msg)
>     222         return batch
>     223 
> 
>  AttributeError: Traceback (most recent call last):
>   File "/home/nafizh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
>     samples = collate_fn([dataset[i] for i in batch_indices])
>   File "/home/nafizh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 40, in <listcomp>
>     samples = collate_fn([dataset[i] for i in batch_indices])
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py", line 94, in __getitem__
>     return self.get(self.transform, x, y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/dataset.py", line 99, in get
>     return (x,y) if tfm is None else tfm(x,y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 466, in __call__
>     def __call__(self, im, y=None): return compose(im, y, self.tfms)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 447, in compose
>     im, y =fn(im, y)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 231, in __call__
>     x,y = ((self.transform(x),y) if self.tfm_y==TfmType.NO
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 239, in transform
>     x = self.do_transform(x)
>   File "/home/nafizh/fast_ai_fellowship/fastai/courses/dl1/fastai/transforms.py", line 403, in do_transform
>     if self.rp: x = rotate_cv(x, self.rdeg, mode=self.mode)
> AttributeError: 'RandomRotateXY' object has no attribute 'mode'

I have the latest code from the fast-ai repo. Any suggestions on this?

vikbehal · November 20, 2017, 12:13am

learn.lr_find() is failing with the following error in the lesson2 notebook: