Any ideas on how to modify ResNet for images that have 2 channels?
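One common workaround, not taken from this thread and offered only as a hedged sketch: leave the pretrained 3-channel ResNet unchanged and build a third channel from the two radar bands. It assumes the iceberg JSON layout, where `band_1` and `band_2` are flattened 75x75 lists. Using the mean of the two bands as the third channel is an arbitrary choice, and `bands_to_3ch` is a hypothetical helper. The alternative is to replace the first conv layer with one that takes 2 input channels, which means that layer no longer starts from pretrained weights.

```python
import numpy as np

def bands_to_3ch(band_1, band_2, size=75):
    """Stack two flattened radar bands into an (H, W, 3) float32 image.

    Assumption: the third channel is the mean of the two bands; a difference,
    ratio, or zeros would be equally plausible choices.
    """
    b1 = np.asarray(band_1, dtype=np.float32).reshape(size, size)
    b2 = np.asarray(band_2, dtype=np.float32).reshape(size, size)
    b3 = (b1 + b2) / 2  # synthetic third channel
    return np.stack([b1, b2, b3], axis=-1)
```

Applied row by row to `test.json` or `train.json`, this gives arrays a standard 3-channel pretrained model can consume after the usual normalization.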

Very nice! @groverpr, @shik1470 and I had been putting our bets on well-tuned pretrained networks. After a lot of trial and error, @groverpr got ~0.20 on the LB with resnet18. Now we are working on replicating the results to see what works on this data.

Things to consider further:

  • Other architectures that worked well on CIFAR-10
  • Semi-supervised learning
  • Snapshot ensembling

This iceberg comp is a bit strange, though: using DenseNet, I can get 0.13 validation loss, yet the loss on the test set is over 0.2.

I think 0.20 is very good. Did you consider that 5000 images in the test data are machine-generated? If so, trusting your validation score might be the right call, which means you are probably doing better than your LB score suggests :slight_smile:
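Since the iceberg competition is scored on log loss, comparing local and LB numbers only makes sense if the local number is computed the same way. A minimal sketch of binary log loss for 0/1 labels and predicted probabilities of class 1; the function name and the clipping epsilon of 1e-15 are assumptions, not something taken from this thread.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy for 0/1 labels and predicted P(label == 1).

    Probabilities are clipped to [eps, 1 - eps] so a confidently wrong
    prediction costs a large but finite penalty instead of infinity.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Usage: two predictions, each 90% confident and correct.
print(binary_log_loss([1, 0], [0.9, 0.1]))  # about 0.105, i.e. -ln(0.9)
```

Because the penalty grows without bound as a wrong prediction approaches certainty, a single confident mistake can dominate the mean on a small validation set.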


Thanks for the encouragement and I hope you are right :slight_smile:

I have seen situations like this before in other Kaggle comps, and it usually results in a big private LB shakeup, since a lot of people could be overfitting to the public LB. Trusting local CV is important in this case, but I also think I need to run a lot more experiments to be sure.

What kind of losses is everyone else getting on local CV? Are you all seeing the same discrepancy between validation and test set losses?
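A local CV number is more trustworthy when it is averaged over several splits rather than read off a single validation set. A minimal plain-Python sketch of K-fold splitting stratified by label, so each fold keeps roughly the same iceberg/ship ratio; `stratified_kfold` is a hypothetical helper (scikit-learn's `StratifiedKFold` does the same job), and the fixed seed is an arbitrary choice.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs for k folds, stratified by label."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps fold sizes balanced across classes
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Deal this class's shuffled indices round-robin across the folds.
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

Train one model per fold, score it on that fold's validation indices, and average the fold losses; the spread across folds also gives a rough sense of how noisy a single validation score is.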

I was looking through some of the notebooks shared for the iceberg competition, and one thing I noticed is that the ids are not always aligned correctly with the test predictions. This is super important, and for this particular competition it’s a little trickier, since the submission ids aren’t the same as the test image filenames, which are usually named automatically by their index.

Here is what I used to align the test ids with the test predictions for iceberg competition. Of course, anyone can correct me if I’m wrong about this!

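```python
import os
import numpy as np
import pandas as pd

# `learn`, `data` and PATH come from the fastai notebook that trained the model.
test = pd.read_json(f'{PATH}test.json')
# Model outputs are log-probabilities; column 0 assumes 'iceberg' is class 0 (check data.classes).
test_preds = np.exp(learn.TTA(is_test=True)[0])[:, 0]
# Test images were saved as '<row index>.jpg'; recover that index in DataLoader order.
test_idxs = [os.path.splitext(os.path.basename(f))[0] for f in data.test_dl.dataset.fnames]
# Map each row index back to the submission id stored in test.json.
test_ids = [test['id'][int(i)] for i in test_idxs]
test_set = pd.read_csv('data/iceberg/sample_submission.csv')
test_set['id'] = test_ids
test_set['is_iceberg'] = test_preds
test_set.to_csv('submission.csv', index=False)  # output filename is an assumption
```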
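The id lookup is the easiest part to get wrong, and it can be checked in isolation without fastai. A minimal sketch, assuming (as the snippet above does) that each test image was written out as '<row index>.jpg', where the row index is its position in `test.json`; `align_test_ids` is a hypothetical helper name.

```python
import os

def align_test_ids(fnames, json_ids):
    """Return the submission id for each test image filename, in fnames order.

    fnames:   image paths in the order the test DataLoader yields predictions.
    json_ids: the `id` column of test.json, in file order.
    """
    ids = []
    for fname in fnames:
        stem = os.path.splitext(os.path.basename(fname))[0]  # 'test/17.jpg' -> '17'
        ids.append(json_ids[int(stem)])
    return ids

# Usage: the loader visited rows 2, 0, 1 of test.json.
print(align_test_ids(['data/test/2.jpg', 'data/test/0.jpg', 'data/test/1.jpg'],
                     ['a1', 'b2', 'c3']))  # ['c3', 'a1', 'b2']
```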

I just added an example to the Lesson 3 In-Class Discussion thread. HTH!


Great, thanks! I was actually providing this code example for the iceberg competition because it is slightly different from (and a bit trickier than) the dog breed comp: here the test ids are provided in JSON and need to be paired up with the correct test image indices.


I really like your idea of reading in the sample submission and then filling in the columns :slight_smile:


Hi @mmr,

It’s because you are submitting without ordering. Kaggle expects you to submit the ids in the same order, and the format is given on the competition page, so you can check it out there.

For example, the format should be something like:

```
id,is_iceberg
dasd,0.6
kjdks,0.7
```

To align the predictions with the test submission, you need to know the index of each predicted file in the test data, or some other key to line them up correctly. What I do is save the test images as ‘{index}.jpg’ so that later I can extract the index and put the predictions back in the desired order.
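A minimal sketch of the ‘{index}.jpg’ trick: recover each prediction’s original row index from its filename and put the predictions back in that order. `reorder_by_index` is a hypothetical helper; it assumes every test image was saved as '<row index>.jpg' and that the indices cover 0..N-1 exactly once.

```python
import os

def reorder_by_index(fnames, preds):
    """Return preds rearranged so that position i holds the prediction for '<i>.jpg'."""
    if len(fnames) != len(preds):
        raise ValueError("need exactly one prediction per filename")
    idxs = [int(os.path.splitext(os.path.basename(f))[0]) for f in fnames]
    if sorted(idxs) != list(range(len(idxs))):
        raise ValueError("filename indices must cover 0..N-1 exactly once")
    ordered = [None] * len(preds)
    for idx, p in zip(idxs, preds):
        ordered[idx] = p
    return ordered

# Usage: predictions came back in the order 2.jpg, 0.jpg, 1.jpg.
print(reorder_by_index(['test/2.jpg', 'test/0.jpg', 'test/1.jpg'], [0.2, 0.9, 0.4]))
# [0.9, 0.4, 0.2]
```

Once the predictions are back in the original row order, they can be written next to the ids in the order they appear in the test data.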

Hope this helps

btw, you can access the fnames of the predicted images from:

```python
learn.data_.test_ds.fnames
```

```python
learn.load('256_TTA')
preds = learn.TTA(is_test=True)
```

After running this, I am getting the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-26-dba07d0956ac> in <module>()
  1 learn.load('256_TTA')
  2 learn.data_.test_ds.get_n()
----> 3 preds = learn.TTA(is_test = True)

~/fastai/courses/dl1/fastai/learner.py in TTA(self, n_aug, is_test)
165         dl1 = self.data.test_dl     if is_test else self.data.val_dl
166         dl2 = self.data.test_aug_dl if is_test else self.data.aug_dl
--> 167         preds1,targs = predict_with_targs(self.model, dl1)
168         preds1 = [preds1]*math.ceil(n_aug/4)
169         preds2 = [predict_with_targs(self.model, dl2)[0] for i in range(n_aug)]

~/fastai/courses/dl1/fastai/model.py in predict_with_targs(m, dl)
115     if hasattr(m, 'reset'): m.reset()
116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
--> 117                         for *x,y in iter(dl)])
118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))
119 

~/fastai/courses/dl1/fastai/model.py in <listcomp>(.0)
114     m.eval()
115     if hasattr(m, 'reset'): m.reset()
--> 116     preda,targa = zip(*[(get_prediction(m(*VV(x))),y)
117                         for *x,y in iter(dl)])
118     return to_np(torch.cat(preda)), to_np(torch.cat(targa))

~/fastai/courses/dl1/fastai/dataset.py in __next__(self)
226         if self.i>=len(self.dl): raise StopIteration
227         self.i+=1
--> 228         return next(self.it)
229 
230     @property

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
193         while True:
194             assert (not self.shutdown and self.batches_outstanding > 0)
--> 195             idx, batch = self.data_queue.get()
196             self.batches_outstanding -= 1
197             if idx != self.rcvd_idx:

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/queues.py in get(self)
335             res = self._reader.recv_bytes()
336         # unserialize the data after having released the lock
--> 337         return _ForkingPickler.loads(res)
338 
339     def put(self, obj):

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/multiprocessing/reductions.py in rebuild_storage_fd(cls, df, size)
 68         fd = multiprocessing.reduction.rebuild_handle(df)
 69     else:
---> 70         fd = df.detach()
 71     try:
 72         storage = storage_from_cache(cls, fd_id(fd))

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
 56             '''Get the fd.  This should only be called once.'''
 57             with _resource_sharer.get_connection(self._id) as conn:
---> 58                 return reduction.recv_handle(conn)
 59 
 60 

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recv_handle(conn)
180         '''Receive a handle over a local connection.'''
181         with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
--> 182             return recvfds(s, 1)[0]
183 
184     def DupFd(fd):

~/anaconda3/envs/fastai/lib/python3.6/multiprocessing/reduction.py in recvfds(sock, size)
159             if len(ancdata) != 1:
160                 raise RuntimeError('received %d items of ancdata' %
--> 161                                    len(ancdata))
162             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
163             if (cmsg_level == socket.SOL_SOCKET and

RuntimeError: received 0 items of ancdata

What might be the reason?

I wish I knew. I’ve been trying to fix that since yesterday :frowning:


I will try deleting the tmp folder and running the notebook all over again, but it takes around 4 hours :slight_smile:

Do you have the same problem with 64x64 images?

I haven’t tried predicting with 64. I downloaded the Planet data myself, so maybe there is an issue with the folder arrangement; I also had to add the additional test-jpg images in order to submit.

Like Kerem mentioned, it looks like your test preds aren’t properly aligned with the test ids. I provided some sample code in this thread that takes care of all of that for you, specifically for the iceberg competition. Hope it helps!

After your learn.load(), you may need to call set_data, as in https://github.com/fastai/fastai/blob/master/courses/dl1/cifar10.ipynb

Try this instead of your learn.data_.test_ds.get_n()?

I am not sure if this will help, but give it a shot.

Ok, thank you I will try this.

learn.data_.test_ds.get_n() is redundant; I just used it to check that all my images are in there.


That’s not necessary - it’s just there because I was gradually increasing the image size as I was training.

Unfortunately, it’s the same with the 64x64 model. Here is the notebook:

https://github.com/KeremTurgutlu/deeplearning/blob/master/tmp-planet_comp.ipynb

Thank you

I took your notebook and ran it on my AWS machine, and it ran OK. You can see it here: https://gist.github.com/sampathweb/f374df3055a1144041a4edaab3e1c453

My setup uses Docker / nvidia-docker instead of the AMI, so I’m not sure whether that, for some strange reason, resolves this issue. But it looks like there are active GitHub issues where this is being discussed: https://github.com/pytorch/pytorch/issues/973, https://github.com/fastai/fastai/issues/23
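As far as I can tell, `received 0 items of ancdata` shows up when DataLoader worker processes run out of file descriptors while passing tensors back to the main process. Two workarounds commonly suggested for it are raising the open-file limit (what `ulimit -n` controls) and switching PyTorch’s tensor sharing strategy to `'file_system'`; setting `num_workers=0` on the DataLoader also avoids worker processes entirely, at the cost of speed. A minimal sketch of the first two, with hypothetical helper names and a target limit of 4096 chosen arbitrarily; it uses the Unix-only `resource` module, and it is something to try rather than a confirmed fix for this notebook.

```python
import resource

def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE (the `ulimit -n` value) toward `target`.

    Never lowers the current limit; without extra privileges the soft limit
    can only go up to the hard limit. Returns the (soft, hard) limits afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def use_file_system_sharing():
    """Switch torch.multiprocessing to the 'file_system' sharing strategy if torch is installed."""
    try:
        import torch.multiprocessing as mp
    except ImportError:
        return False
    mp.set_sharing_strategy('file_system')
    return True
```

Call both near the top of the notebook, before the DataLoaders start their worker processes.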