Human Protein Atlas Competition starter code

Just took a look at your resnet.py. Wow, you can modify a pretrained network like that? That’s…cool. :no_mouth:

Hmm, wonder if there’s a way to freeze channels within a layer.

@wdhorton, thanks for making the changes!

I tested your starter code and it loads the data and trains successfully. I’m also able to change bs and sz as advertised. There is, however, an issue with the inference part.

When running:

preds,_ = learn.get_preds(DatasetType.Test)

The following IndexError is raised for the test dataset:

IndexError: Traceback (most recent call last):
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py", line 354, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py", line 79, in __getitem__
    if isinstance(try_int(idxs), int): return self.get(idxs)
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py", line 228, in get
    o = super().get(i)
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py", line 55, in get
    item = self.items[i]
IndexError: index 6214 is out of bounds for axis 0 with size 6214

Are you getting the same error?

This appears to be a result of

src.add_test(test_fnames, label='0')

which creates a test dataset with 6,214 labels (y) and 11,702 inputs (x), resulting in the index error. Coincidentally, the validation set also has 6,214 items, which appears to anchor the size of the test label set somehow when it is created.
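
To illustrate the mismatch described above, here is a minimal sketch in plain Python (no fastai involved). `PairedItems` and `first_bad_index` are hypothetical names made up for this example; only the sizes 11,702 and 6,214 come from the post. It assumes the dataset length follows x, so a loader asks for indices that y does not have.

```python
class PairedItems:
    """Hypothetical stand-in for a labelled dataset: inputs x paired with labels y."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        # Length follows the inputs, so a sampler will ask for every index in x.
        return len(self.x)

    def __getitem__(self, i):
        # Fails as soon as i runs past the end of the shorter list.
        return self.x[i], self.y[i]


def first_bad_index(ds):
    """Return the first index that raises IndexError, or None if every index works."""
    for i in range(len(ds)):
        try:
            ds[i]
        except IndexError:
            return i
    return None


# Sizes taken from the post: 11,702 inputs but only 6,214 labels.
test_ds = PairedItems(x=list(range(11702)), y=["0"] * 6214)
print(first_bad_index(test_ds))  # 6214
```

The first failing index matches the one in the traceback (index 6214 for a label list of size 6214), which is consistent with the labels being the shorter side.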

Hi @chrisoos @wdhorton, I was also getting an error on the same line, but in my case it will not load the data at all. I get the following:


FileNotFoundError Traceback (most recent call last)
/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700 type_pprinters=self.type_printers,
    701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
    703 printer.flush()
    704 return stream.getvalue()

/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    400 if cls is not object
    401 and callable(cls.__dict__.get('__repr__')):
--> 402 return _repr_pprint(obj, self, cycle)
    403
    404 return _default_pprint(obj, self, cycle)

/opt/conda/envs/fastai/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    695 """A pprint that just redirects to the normal repr function."""
    696 # Find newlines and replace them with p.break_()
--> 697 output = repr(obj)
    698 for idx,output_line in enumerate(output.splitlines()):
    699 if idx:

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __repr__(self)
    304
    305 def __repr__(self)->str:
--> 306 return f'{self.__class__.__name__};\nTrain: {self.train};\nValid: {self.valid};\nTest: {self.test}'
    307
    308 def __getattr__(self, k):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __repr__(self)
    394 def clear_item(self): self.item = None
    395 def __repr__(self)->str:
--> 396 x = f'{self.x}' # force this to happen first
    397 return f'{self.__class__.__name__}\ny: {self.y}\nx: {x}'
    398 def predict(self, res): return self.y.predict(res)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __repr__(self)
     52 def get(self, i)->Any: return self.items[i]
     53 def __repr__(self)->str:
---> 54 items = [self[i] for i in range(min(5,len(self.items)))]
     55 return f'{self.__class__.__name__} ({len(self)} items)\n{items}...\nPath: {self.path}'
     56

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in <listcomp>(.0)
     52 def get(self, i)->Any: return self.items[i]
     53 def __repr__(self)->str:
---> 54 items = [self[i] for i in range(min(5,len(self.items)))]
     55 return f'{self.__class__.__name__} ({len(self)} items)\n{items}...\nPath: {self.path}'
     56

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __getitem__(self, idxs)
     80
     81 def __getitem__(self,idxs:int)->Any:
---> 82 if isinstance(try_int(idxs), int): return self.get(idxs)
     83 else: return self.new(self.items[idxs], xtra=index_row(self.xtra, idxs))
     84

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in get(self, i)
    288 def get(self, i):
    289 fn = super().get(i)
--> 290 res = self.open(fn)
    291 self.sizes[i] = res.size
    292 return res

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in open(self, fn)
    284 self.sizes={}
    285
--> 286 def open(self, fn): return open_image(fn)
    287
    288 def get(self, i):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/image.py in open_image(fn, div, convert_mode, cls)
    440 "Return Image object created from image in file fn."
    441 #fn = getattr(fn, 'path', fn)
--> 442 x = PIL.Image.open(fn).convert(convert_mode)
    443 x = pil2tensor(x,np.float32)
    444 if div: x.div_(255)

/opt/conda/envs/fastai/lib/python3.6/site-packages/PIL/Image.py in open(fp, mode)
   2607
   2608 if filename:
-> 2609 fp = builtins.open(filename, "rb")
   2610 exclusive_fp = True
   2611

FileNotFoundError: [Errno 2] No such file or directory: '/storage/human-protein-atlas-image-classification/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0.png'

I ran into that issue and fixed it with this PR (https://github.com/fastai/fastai/pull/1160), which was merged to master 6 days ago. Try installing the latest fastai and the problem should go away.

I think to fix this issue I just need to add a ; to the end of the line so it doesn’t try to print. I’ll add a fix in tonight.

Hey thanks @wdhorton didn’t realize it was so simple, will try again later tonight.

Updated fastai to v1.0.28. I noticed your notes say 1.0.29 is the latest version? I’m not able to upgrade to that version with pip install fastai --upgrade.

Getting a new error.
On calling add_test, the train folder (from src.valid, I think) is used instead of the path specified in test_fnames:

src.add_test(test_fnames, label='0')

FileNotFoundError: [Errno 2] No such file or directory: '../../data/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0.png'

I’ve been installing from the fastai source code, so you might not be able to get 1.0.29 on pip yet.

This commit (https://github.com/wdhorton/protein-atlas-fastai/commit/b3e09bec34aa7efed1fab14708422fc6cba67202) should’ve fixed the error with add_test, so I’d suggest pulling the latest from my starter code repo and seeing if that works.

@wdhorton Why does src.add_test(test_fnames, label='0'); (with the semicolon) work, but src.add_test(test_fnames, label='0') (without the semicolon) doesn’t?

It’s because of how Jupyter notebooks behave: by default, the notebook displays the value of the last expression in each cell. Since I overrode certain things in src, it can’t be printed without the error you see. Adding the semicolon at the end suppresses the output, so there’s no error.
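
A rough illustration of that behavior, in plain Python rather than fastai: the object below is cheap to build, but its `__repr__` opens files, so anything that tries to display it can fail even though the call itself succeeded. `LazyImageList` and the file name are made up for this example and are not part of fastai or the starter repo.

```python
class LazyImageList:
    """Hypothetical list of file names whose repr opens the first few files."""

    def __init__(self, fnames):
        self.fnames = list(fnames)

    def add_test(self, fnames):
        # Returns self, so a notebook would try to display the result
        # when this call is the last expression in a cell.
        self.fnames.extend(fnames)
        return self

    def __repr__(self):
        # Building the repr touches the files, similar to the traceback above.
        previews = []
        for fn in self.fnames[:5]:
            with open(fn, "rb") as f:
                previews.append(f.read(8))
        return f"{self.__class__.__name__} ({len(self.fnames)} items)\n{previews}"


src = LazyImageList([])
result = src.add_test(["no-such-dir/missing.png"])  # the call itself succeeds

try:
    repr(result)  # roughly what the notebook does with a cell's last expression
    display_failed = False
except FileNotFoundError:
    display_failed = True

print(display_failed)  # True, assuming the made-up path does not exist
```

In a notebook, ending the line with `;` skips the display step, so `__repr__` is never called and the missing file is never opened at that point.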

Thank you @wdhorton, I pulled your latest starter repo again and everything is working. I appreciate it very much. I was reading some of the competition discussions on using external datasets. What are your thoughts on whether these would help?

I saw that someone had an external source that doubled the size of the training set. I do think that’ll help, I’m going to try downloading it.

More data is always more helpful and this is allowed by the rules as long as it is publicly available.

Yes, thanks @chrisoos @wdhorton. Once I make some progress with your starter code, I’ll try downloading the public HPA datasets, which seem like the best option.

Hi @wdhorton, I was able to run the code all the way through to create the stage-2-rn50 model. But the predictions (learn.get_preds(DatasetType.Test)) now give me an error:

    x = PIL.Image.open(fn).convert(convert_mode)
  File "/opt/conda/envs/fastai/lib/python3.6/site-packages/PIL/Image.py", line 2609, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/storage/human-protein-atlas-image-classification/test/00008af0-bad0-11e8-b2b8-ac1f6b6435d0'

It seems it cannot find the test images. I thought src.add_test(test_fnames, label='0') was working, but it’s giving me the same error. I will start another run and let you know the results.

@wdhorton I just re-cloned the repo and ran everything again. I do not get an error message on src.add_test with the semicolon, but I get the error message above when running the predictions. It seems it cannot load/find the test files. Not sure what I should try next.

Ok, I’ll see if I can take a look. Can you let me know which fastai version you’re on?

Version 1.0.28 thanks @wdhorton

@Jay2020 I just cloned the repo and ran it with fastai 1.0.28 and unfortunately I can’t reproduce the error. Based on the error message you’re seeing, I’d say this:

  1. Make sure you have this commit in your local version of my protein-atlas-fastai repo: https://github.com/wdhorton/protein-atlas-fastai/commit/fa75d39314e3f1bb4b578866e6d8c90057869974
    (You can run git log | grep fa75d39314e3f1bb4b578866e6d8c90057869974 within the repo and see if it prints anything).

  2. Make sure these lines are getting run to set the right open function on your ImageItemList:

src.test.x.create_func = open_4_channel
src.test.x.open = open_4_channel
  3. Run these lines to troubleshoot and make sure the functions are getting set right. If it’s not open_4_channel then something is going wrong.
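
The troubleshooting lines for the last step are not reproduced here. As a guess at what such a check could look like, here is a self-contained sketch. `FakeItemList` is a hypothetical stand-in for src.test.x, `opener_names` is a made-up helper, and the body of `open_4_channel` is a placeholder rather than the repo's implementation.

```python
def open_4_channel(fn):
    """Placeholder for the starter code's loader; the real one lives in the repo."""
    return f"4-channel image for {fn}"


class FakeItemList:
    """Hypothetical stand-in for src.test.x with a default opener."""

    def open(self, fn):
        return f"default image for {fn}"


def opener_names(item_list):
    """Report the names of the functions the item list would call to load an item."""
    names = {}
    for attr in ("create_func", "open"):
        func = getattr(item_list, attr, None)
        names[attr] = getattr(func, "__name__", None)
    return names


x = FakeItemList()
before = opener_names(x)
print(before)  # {'create_func': None, 'open': 'open'}

# The same two assignments as in step 2 above.
x.create_func = open_4_channel
x.open = open_4_channel
after = opener_names(x)
print(after)  # {'create_func': 'open_4_channel', 'open': 'open_4_channel'}
```

If the names printed for the real src.test.x are anything other than open_4_channel, the assignments from step 2 did not take effect.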

Yep! Yesterday I tried to replicate the repo on version 1.0.28 and it is working fine for me! :slight_smile: @Jay2020