Human Protein Atlas Competition starter code

I logged a pull request on GitHub to update the threshold from 0.5 to 0.2. This is consistent with the planet example we covered in the course, and it also gets better results on the Human Protein data set. It's a small change that makes a big difference.
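To illustrate what the threshold change does, here is a toy sketch with hypothetical sigmoid outputs (the numbers are made up, not from the competition):

```python
import numpy as np

# Simulated sigmoid outputs for 4 images x 5 classes (multi-label task).
preds = np.array([
    [0.90, 0.30, 0.10, 0.05, 0.25],
    [0.15, 0.60, 0.45, 0.30, 0.10],
    [0.05, 0.10, 0.22, 0.80, 0.35],
    [0.40, 0.25, 0.05, 0.10, 0.55],
])

# A 0.5 threshold keeps only the most confident labels...
labels_05 = (preds > 0.5)

# ...while 0.2 admits weaker activations, which tends to help when many
# classes are rare and the sigmoid outputs are not well calibrated.
labels_02 = (preds > 0.2)

print(labels_05.sum(), labels_02.sum())  # far more labels survive at 0.2
```

With rare classes, the recall gained from the extra labels usually outweighs the precision lost, which is why the lower threshold scores better on this kind of data.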

Also, a question was asked in lesson 3 about what normalization stats to use for new data sets, and Jeremy indicated that with transfer learning, best practice is to use the same stats the pre-trained model was trained on.

So perhaps consider using the ImageNet normalization stats instead of data-set-specific stats for quicker results. If the model were trained from scratch, I suspect your own stats would be better. The Human Protein images are so different, however, that I don't think this will make much of a difference either way.
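For reference, these are the standard per-channel ImageNet stats (as used by torchvision and fastai), with a minimal NumPy sketch of how they're applied (the random image is just a stand-in for a real one scaled to [0, 1]):

```python
import numpy as np

# Standard ImageNet per-channel mean and std (RGB order).
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

# Stand-in for an image scaled to [0, 1], shape (H, W, 3).
img = np.random.default_rng(0).random((224, 224, 3))

# Normalize with the pretrained model's stats, not the data set's own,
# so inputs match the distribution the pretrained weights expect.
normed = (img - imagenet_mean) / imagenet_std
print(normed.shape)
```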

Thanks for the code. I've been trying to get a PyTorch dataset working with fastai v1 for a while now.

Thanks for the pull request! I just merged it. I'm curious: with the change to 0.2, do you know what score you were able to get on the public leaderboard?

I played around with a few things, so I'm not sure how much improvement the change to the 0.2 threshold gives in isolation.

I got an F-score of 0.4, and if I had to guess, I'd say the threshold change is responsible for a 0.02-0.04 uptick, so 5-10%. The threshold needs to be refined again once the model is trained, as I have seen it shift in previous exercises.
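A sketch of what that post-training refinement might look like: sweep candidate thresholds on validation data and keep the best. This uses a simplified micro-averaged F1 and made-up numbers, not the competition's exact macro-F1 metric:

```python
import numpy as np

def f1_at_threshold(probs, targets, thresh):
    """Micro-averaged F1 over all (sample, class) pairs at one threshold."""
    preds = (probs > thresh).astype(int)
    tp = (preds & targets).sum()
    precision = tp / max(preds.sum(), 1)
    recall = tp / max(targets.sum(), 1)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

# Tiny hypothetical example: 3 images, 4 classes.
probs = np.array([[0.90, 0.25, 0.10, 0.05],
                  [0.30, 0.60, 0.15, 0.40],
                  [0.10, 0.05, 0.70, 0.22]])
targets = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1]])

# Sweep candidate thresholds on held-out data and keep the best one.
candidates = np.arange(0.05, 0.95, 0.05)
best = max(candidates, key=lambda t: f1_at_threshold(probs, targets, t))
print(best)
```

The same sweep can be rerun after each training round, since the best threshold tends to drift as the model's outputs change.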

Thank you for sharing. To continue this thread, I'd like to share my baseline for this competition too, which is mainly inspired by the planet notebook from lesson 3.
My approach was simply to combine the three 1-channel images (discarding the yellow ones) into one 3-channel RGB image with the great tool ImageMagick.

Combining the images is as simple as this command in a Linux shell:

convert r.jpg g.jpg b.jpg -channel RGB -combine combined.jpg
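If you'd rather stay in Python, the same merge can be sketched with NumPy (in practice you'd load the three grayscale files with e.g. Pillow; the arrays below are stand-ins):

```python
import numpy as np

# Stand-ins for the three grayscale channel images; in practice you'd
# load each with PIL.Image.open(...) and np.asarray(...).
r = np.full((512, 512), 200, dtype=np.uint8)
g = np.full((512, 512), 100, dtype=np.uint8)
b = np.full((512, 512), 50, dtype=np.uint8)

# Stack the single-channel arrays into one H x W x 3 RGB image,
# mirroring `convert r.jpg g.jpg b.jpg -channel RGB -combine`.
rgb = np.dstack([r, g, b])
print(rgb.shape)  # (512, 512, 3)
```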

So the workflow assumes you have the combined images in your train and test folders.
This notebook won't get you to the top of the LB, but it provides a nice, simple way to start playing with the data and experimenting further.
And here is a link to the Kaggle discussion topic.

Thanks to the fastai team and community.


Hi @wdhorton

Great model!
I've tried to tackle the same competition, but even with transfer learning applied to the Y channel too, my model seems to overfit without reaching your score…
Comparing the two models, it seems you're using something far more complex than resnet50: it looks like it mixes two copies of resnet50, one whose first convolution takes 3-channel images and one that takes 4-channel images as input. Why do you do that?

Here is my work:

@wdhorton why is the size 224, and would it be possible to amend the code to change the size up or down?

The size 224 is passed into the databunch create method. I chose it because it's what resnet50 was originally trained on, but I've also run 128, 256, and 512 with this same notebook. You don't have to modify anything else; just change the size number when you create the databunch.

I'm not using two copies of resnet, though I can see how it might look like that. I initialize a single pretrained resnet50 as the encoder variable. Then I have to modify the first convolutional layer, so I make a new Conv2d but copy over some of the weights from the pretrained model. The rest of the model is exactly like resnet50, if you look at the code in torchvision.models. In the forward pass I don't call the encoder (resnet50) directly; I just use the layers of it that I added to self in the __init__ method.
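A minimal sketch of that weight-copying idea in plain PyTorch (not the notebook's exact code; initializing the fourth channel from the mean of the RGB filters is just one common choice):

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained first conv of resnet50 (3-channel input).
pretrained_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# New first conv that accepts 4 channels (R, G, B, Y).
new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    # Copy the pretrained RGB filters into the first 3 input channels.
    new_conv.weight[:, :3] = pretrained_conv.weight
    # Seed the extra (yellow) channel with the mean RGB filter so it
    # starts from something sensible rather than random noise.
    new_conv.weight[:, 3] = pretrained_conv.weight.mean(dim=1)

print(new_conv.weight.shape)  # torch.Size([64, 4, 7, 7])
```

The rest of the pretrained network is untouched; only this one layer needs surgery, because it is the only layer whose shape depends on the number of input channels.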


For some reason, changing the size results in an error when I start to train the model:

RuntimeError: Given input size: (2048x2x2). Calculated output size: (2048x-4x-4). Output size is too small at /opt/conda/conda-bld/pytorch-nightly_1540201584778/work/aten/src/THCUNN/generic/

The only change I made to your code was the size, from 224 to 128, when I create the databunch.
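For what it's worth, a likely cause of that "Output size is too small" error (an assumption, not confirmed in the thread) is a pooling layer hard-coded for 224px inputs meeting the smaller feature map a 128px image produces. A sketch in plain PyTorch of both the failure and the usual fix:

```python
import torch
import torch.nn as nn

# Stand-in for a small final feature map, as in the error message above.
feat = torch.randn(1, 2048, 2, 2)

# A pooling layer sized for 224px inputs (7x7 final feature maps) breaks
# on smaller maps: a 7-wide kernel on a 2x2 input gives a negative size.
failed = False
try:
    nn.AvgPool2d(kernel_size=7)(feat)
except RuntimeError:
    failed = True

# Adaptive pooling collapses any spatial size to 1x1, so the same head
# works for 128, 256, or 512px inputs alike.
out = nn.AdaptiveAvgPool2d(1)(feat)
print(failed, out.shape)
```

Swapping the fixed pooling for an adaptive one is the standard way to make a pretrained head size-agnostic.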

Ok, I think I know the issue. I've got a couple of fixes I'm going to get out in the next few days (including migrating to the data block API); I'll keep you updated.

Update: I made another notebook to work with the new data_block API. In this version, I also made changes to use the create_cnn function. You can find it at resnet50_basic_datablocks.ipynb. @chrisoos this should fix the issue you ran into too (caused by saving the encoder as self.encoder when it wasn’t needed).


Just took a look at your notebook. Wow, you can modify a pretrained network like that? That's…cool. :no_mouth:

Hmm, wonder if there’s a way to freeze channels within a layer.


@wdhorton, thanks for making the changes!

I tested your starter code and it loads and trains the data successfully. I'm also able to change bs and sz as advertised. There is, however, an issue when it comes to the inference part.

When running:

preds,_ = learn.get_preds(DatasetType.Test)

The following index error is thrown for the test ds:

IndexError: Traceback (most recent call last):
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/", line 354, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/", line 79, in __getitem__
    if isinstance(try_int(idxs), int): return self.get(idxs)
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/", line 228, in get
    o = super().get(i)
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/", line 55, in get
    item = self.items[i]
IndexError: index 6214 is out of bounds for axis 0 with size 6214

Are you getting the same error?

This appears to be a result of

src.add_test(test_fnames, label='0')

which creates a test data set with 6,214 labels (y) and 11,702 items (x), resulting in the index issue. Coincidentally, the validation set size is also 6,214, which appears to anchor the test label set size somehow.

Hi @chrisoos, @wdhorton. I was also getting errors on the same line, but in my case it won't load the data at all; I'm getting the following:

FileNotFoundError Traceback (most recent call last)
/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/core/ in __call__(self, obj)
    700     type_pprinters=self.type_printers,
    701     deferred_pprinters=self.deferred_printers)
--> 702     printer.pretty(obj)
    703     printer.flush()
    704     return stream.getvalue()

/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/lib/ in pretty(self, obj)
    400     if cls is not object
    401     and callable(cls.__dict__.get('__repr__')):
--> 402     return _repr_pprint(obj, self, cycle)
    404     return _default_pprint(obj, self, cycle)

/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/lib/ in _repr_pprint(obj, p, cycle)
    695     """A pprint that just redirects to the normal repr function."""
    696     # Find newlines and replace them with p.break_
--> 697     output = repr(obj)
    698     for idx,output_line in enumerate(output.splitlines()):
    699     if idx:

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/ in __repr__(self)
    305     def __repr__(self)->str:
--> 306     return f'{};\nTrain: {self.train};\nValid: {self.valid};\nTest: {self.test}'
    308     def __getattr__(self, k):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/ in __repr__(self)
    394     def clear_item(self): self.item = None
    395     def __repr__(self)->str:
--> 396     x = f'{self.x}' # force this to happen first
    397     return f'{}\ny: {self.y}\nx: {x}'
    398     def predict(self, res): return self.y.predict(res)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/ in __repr__(self)
    52     def get(self, i)->Any: return self.items[i]
    53     def __repr__(self)->str:
--> 54     items = [self[i] for i in range(min(5,len(self.items)))]
    55     return f'{} ({len(self)} items)\n{items}...\nPath: {self.path}'

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/ in <listcomp>(.0)
    52     def get(self, i)->Any: return self.items[i]
    53     def __repr__(self)->str:
--> 54     items = [self[i] for i in range(min(5,len(self.items)))]
    55     return f'{} ({len(self)} items)\n{items}...\nPath: {self.path}'

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/ in __getitem__(self, idxs)
    81     def __getitem__(self,idxs:int)->Any:
--> 82     if isinstance(try_int(idxs), int): return self.get(idxs)
    83     else: return[idxs], xtra=index_row(self.xtra, idxs))

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/ in get(self, i)
    288     def get(self, i):
    289     fn = super().get(i)
--> 290     res = self.open(fn)
    291     self.sizes[i] = res.size
    292     return res

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/ in open(self, fn)
    284     self.sizes={}
--> 286     def open(self, fn): return open_image(fn)
    288     def get(self, i):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/ in open_image(fn, div, convert_mode, cls)
    440     "Return Image object created from image in file fn."
    441     #fn = getattr(fn, 'path', fn)
--> 442     x =
    443     x = pil2tensor(x,np.float32)
    444     if div: x.div_(255)

/opt/conda/envs/fastai/lib/python3.6/site-packages/PIL/ in open(fp, mode)
    2608     if filename:
--> 2609     fp =, "rb")
    2610     exclusive_fp = True

FileNotFoundError: [Errno 2] No such file or directory: '/storage/human-protein-atlas-image-classification/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0.png'

I ran into that issue and fixed it with this PR (, which was merged to master 6 days ago. Try installing the latest fastai and the problem should go away.

I think to fix this issue I just need to add a ; to the end of the line so it doesn’t try to print. I’ll add a fix in tonight.

Hey, thanks @wdhorton, I didn't realize it was so simple. I'll try again later tonight.

I updated fastai to v1.0.28, but I noticed your notes reference 1.0.29 as the latest version? I'm not able to pip install fastai --upgrade to that version.

I'm getting a new error. On calling add_test, the train folder (from src.valid, I think) is used instead of the path specified in test_fnames:

src.add_test(test_fnames, label='0')

FileNotFoundError: [Errno 2] No such file or directory: '../../data/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0.png'

I’ve been installing from the fastai source code, so you might not be able to get 1.0.29 on pip yet.

This commit ( should’ve fixed the error with add_test, so I’d suggest pulling the latest from my starter code repo and seeing if that works.