I entered the Human Protein Atlas Image competition on Kaggle to practice what I have learned so far in Part 1. The competition seemed straightforward and similar to the Lesson 3 planet notebook: each image is multi-labeled from a set of 28 labels, each corresponding to the cellular location of a particular protein.
The problem is that each sample is split into 4 files, one for each channel of the image. In addition, the evaluation section says only the green channel is used for prediction. Based on this, I trained a model using only the files corresponding to the green channel and voilà: my F1 score was extremely low.
So my question is: if we are only going to use one channel for prediction, why can't we train a cnn_learner using only the green-channel files? I plotted the images from the other channels and they do show different properties, which makes me think I am probably not considering extra information they could provide to improve the model.
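One common workaround people discuss for this kind of per-channel file layout is to stack all four stains into a single multi-channel array and let the network learn which channels matter. Here is a minimal sketch of that idea (the `stack_channels` function and the channel ordering are my own illustration, assuming each stain can be read into a same-shaped 2-D array; this is not the competition's or fastai's official loader):

```python
import numpy as np

# One grayscale file per stain; order is an assumption for this sketch.
CHANNELS = ["red", "green", "blue", "yellow"]

def stack_channels(bands):
    """Stack same-shaped 2-D channel arrays into one (H, W, C) image.

    `bands` is a list of 2-D arrays, one per stain, in CHANNELS order.
    The result could be fed to a model whose first conv layer accepts
    4 input channels instead of the usual 3.
    """
    if len({b.shape for b in bands}) != 1:
        raise ValueError("all channels must share the same shape")
    return np.stack(bands, axis=-1)

# Example: four fake 512x512 stains -> one (512, 512, 4) image
fake = [np.zeros((512, 512), dtype=np.uint8) for _ in CHANNELS]
image = stack_channels(fake)
print(image.shape)  # (512, 512, 4)
```

The intuition is that the green channel marks the protein of interest, while the other stains give spatial context (e.g. where the cell structures are), so discarding them throws away signal even if the label is defined by the green stain.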
Link to competition: Human Protein Atlas Image Competition