Thanks for the encouragement and I hope you are right
I have seen situations like this before in other Kaggle comps, and it usually results in a big private LB shakeup, since a lot of people may be overfitting to the public LB. Trusting local CV is important in this case, but I also think I need to run a lot more experiments to be sure.
What kind of losses is everyone else getting on local CV? Are you all seeing the same discrepancy between validation and test-set loss?
I was looking through some of the notebooks shared for the iceberg competition, and one thing I noticed is that the ids are not always aligned correctly with the test predictions. This is super important, and for this particular competition it's a little trickier since the submission ids aren't the same as the test image ids, which are usually named automatically by their index.
Here is what I used to align the test ids with the test predictions for the iceberg competition. Of course, anyone can correct me if I'm wrong about this!
import numpy as np
import pandas as pd

# the test ids live in the json file, not in the image filenames
test = pd.read_json(f'{PATH}test.json')

# TTA returns log-probabilities; exponentiate and take the iceberg class column
test_preds = np.exp(learn.TTA(is_test=True)[0])[:, 0]

# test images are named by index, so recover '{index}' from each filename
test_idxs = [i.split('.jpg')[0].split('/')[-1] for i in data.test_dl.dataset.fnames]

# map each recovered index back to its id from the json
test_ids = [test['id'][int(i)] for i in test_idxs]

test_set = pd.read_csv('data/iceberg/sample_submission.csv')
test_set['id'] = test_ids
test_set['is_iceberg'] = test_preds
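To turn the aligned frame into an actual submission, a `to_csv` call finishes the job. A minimal sketch with made-up stand-in values (the ids, probabilities, and `submission.csv` path are just examples, not from the competition):

```python
import pandas as pd

# stand-in for the aligned test_set frame built above
test_set = pd.DataFrame({'id': ['dasd', 'kjdks'], 'is_iceberg': [0.6, 0.7]})

# index=False keeps Kaggle's expected two-column format (id, is_iceberg)
test_set.to_csv('submission.csv', index=False)
```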
Great, thanks! I was actually providing this code example for the iceberg competition, as it is slightly different (and a bit trickier) than the dog breed comp: here the test ids are provided in the json and need to be paired up with the correct test image indices.
That's because you are submitting without ordering. Kaggle expects you to submit with the same order of ids, and there is a format given on the competition page; you can check it out.
For example, the format should be something like:
id, is_iceberg
dasd, 0.6
kjdks, 0.7
…
In order to align preds and the test submission, you need to know either the index of the predicted file in the test data or something else to align it correctly. What I do is save the test data as '{index}.jpg', so that later I can extract the index and put the prediction back in the desired order.
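A minimal sketch of that extract-and-reorder step, assuming files were saved as '{index}.jpg' (the filenames and prediction values below are made up for illustration):

```python
import numpy as np

# example: predictions come back in whatever order the loader read the files
fnames = ['test/2.jpg', 'test/0.jpg', 'test/1.jpg']
preds = np.array([0.9, 0.1, 0.5])

# pull the numeric index out of each filename
idxs = [int(f.split('/')[-1].split('.jpg')[0]) for f in fnames]

# scatter each prediction back to its original index position
ordered = np.empty_like(preds)
ordered[idxs] = preds
# ordered is now [0.1, 0.5, 0.9]
```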
Hope this helps
btw: you can access fnames of predicted images from:
I haven't tried predicting with 64. I downloaded the planet data myself; maybe there is an issue with the folder arrangement, because I also had to add an additional test-jpg in order to submit.
Like Kerem mentioned, it looks like your test preds aren't properly aligned with the test ids. I provided some sample code below which will take care of all of that for you, specifically for the iceberg competition. Hope it helps!