Diabetic Retinopathy Detection Kaggle competition

davidc · April 18, 2017, 9:32am

Hi

In the Lesson 1 video, I think @jeremy mentioned that he used Vgg16 or similar to look at cancer MRIs or similar and got a better result than humans. In Video 2 there was a question about whether Vgg16 would work on cartoons and the answer was that it wouldn’t.

I am unclear whether we would use a network pretrained on ImageNet as the basis for MRIs or diabetic retinopathy, leveraging the edges, corners, curves, etc. from the first few layers and then training the next few layers with MRI training data, or whether ImageNet is just unsuitable for these new types of images.

If you can use ImageNet for new problems like this, is it also possible to use it for non-image problems by representing the input data as images?

David

jeremy · April 19, 2017, 1:15am

I suspect it would help a little - but I’d be interested in hearing any results and comparisons you get if you give it a go!

davidc · April 24, 2017, 10:22pm

From the notes in Lesson 3:

We typically don’t touch the convolutional layers, as we find that the filters typically still work well for almost any classification task that uses standard photos. Vision tasks using line art, medical imaging, or other domains very different to standard photos are likely to require retraining convolutional layers, however.

I think I now understand better: essentially, it would be necessary to retrain most or all layers for medical imagery.
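To make the two regimes from the notes concrete, here is a minimal, framework-free sketch. The layer names and the helper `freeze_conv_blocks` are hypothetical stand-ins for illustration, not the API of Keras or any other library; in a real framework you would set the equivalent "trainable" flag on the pretrained model's layers.

```python
def make_vgg_like():
    """Hypothetical stand-in for a pretrained VGG-style stack (names are illustrative)."""
    convs = [{"name": f"conv_block{i}", "kind": "conv", "trainable": True}
             for i in range(1, 6)]
    denses = [{"name": name, "kind": "dense", "trainable": True}
              for name in ("fc1", "fc2", "predictions")]
    return convs + denses

def freeze_conv_blocks(layers, keep_frozen):
    """Freeze the first `keep_frozen` conv blocks; everything else stays trainable."""
    frozen = 0
    for layer in layers:
        if layer["kind"] == "conv" and frozen < keep_frozen:
            layer["trainable"] = False
            frozen += 1
        else:
            layer["trainable"] = True
    return layers

# Standard photos: keep all pretrained conv filters, retrain only the dense layers.
photos = freeze_conv_blocks(make_vgg_like(), keep_frozen=5)
# Very different domain: start from pretrained weights but retrain every layer.
medical = freeze_conv_blocks(make_vgg_like(), keep_frozen=0)

trainable_photos = [layer["name"] for layer in photos if layer["trainable"]]
trainable_medical = [layer["name"] for layer in medical if layer["trainable"]]
print(trainable_photos)
print(trainable_medical)
```

Values of `keep_frozen` between 0 and 5 give the middle ground of keeping only the earliest, most generic filters fixed.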

Surya501 · April 25, 2017, 7:13am

If my memory serves me right, Google used VGG trained on ImageNet as a starting point for its diabetic retinopathy work. Look for the video of the talk on this from the TensorFlow dev day.

davidc · April 25, 2017, 10:01am

Thanks! Here’s a TensorFlow presentation on diabetic retinopathy where she mentions Inception pretrained on ImageNet. Link to relevant part:

Surya501 · April 26, 2017, 5:05pm

David, I want to play with the same dataset. Would you be interested in collaborating? My thinking is that this dataset would work well with DenseNet, especially given the sample size and the compute requirements for larger images.

Surya501 · April 26, 2017, 5:47pm

As with anything, the best answer is to experiment, especially since experimentation is cheap. However, I’d like to have a hypothesis about what to expect before experimenting. Here is my thinking on why cartoons are different from ImageNet, but eye scans are not.

Background: We can represent any signal as a weighted sum of a select few signals. An intuitive way to think about it is currency. With a few denominations of coins and notes, we can pay any amount by combining a few of them. However, this analogy fails a bit, as we could pay any amount using just pennies; any other note or coin could effectively be replaced with pennies. A better analogy would be prime factorization: any number can be represented as a product of primes.

So extending that analogy to images, DCT/FFT/Wavelet is to images as primes are to numbers. Put differently, with a weighted linear summation of these basis functions, we should be able to reconstruct images. This notebook explains it much more visually: http://bugra.github.io/work/notes/2014-07-12/discre-fourier-cosine-transform-dft-dct-image-compression/ . We know that we can reconstruct the original image from the first-layer activations, so the first learned layer in VGG is just a few known-good linear filters that can approximate the original signal corpus. This first layer can be loosely thought of as ‘learned DCT/FFT/Wavelet basis functions’, except that we can’t use negative coefficients in the summation (ReLU sets negative coefficients to zero).
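A minimal sketch of the "weighted sum of basis functions" idea, using a 1D orthonormal DCT-II written in plain Python. The example signal and the function names are made up for illustration. It rebuilds the signal exactly from all coefficients, then shows what is lost if negative coefficients are zeroed the way a ReLU would zero them. This only illustrates the analogy; it is not a claim about what VGG's learned filters actually are.

```python
import math

def dct_basis(n):
    """Return the n orthonormal DCT-II basis vectors, each of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return basis

def analyze(signal, basis):
    """Coefficient k is the dot product of the signal with basis vector k."""
    return [sum(s * b for s, b in zip(signal, vec)) for vec in basis]

def synthesize(coeffs, basis):
    """Rebuild the signal as a weighted sum of the basis vectors."""
    n = len(basis[0])
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(n)]

def relu(values):
    return [max(0.0, v) for v in values]

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary example values
basis = dct_basis(len(signal))
coeffs = analyze(signal, basis)

exact = synthesize(coeffs, basis)          # all coefficients kept
clipped = synthesize(relu(coeffs), basis)  # negative coefficients dropped

max_err_exact = max(abs(a - b) for a, b in zip(signal, exact))
max_err_clipped = max(abs(a - b) for a, b in zip(signal, clipped))
print("error with all coefficients:", max_err_exact)
print("error after zeroing negatives:", max_err_clipped)
```

With all coefficients the reconstruction error is at floating-point noise level; after zeroing the negative ones it is not.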

Hypothesis:
ImageNet is full of natural images and hence would not have a lot of sharp edges. Cartoons, however, are full of sharp edges and flat colors. Hence, networks trained on ImageNet would not be able to reconstruct cartoons well even from the first-layer outputs. If you lose information right in the first layer, the later layers have less to work with, and the network would not do well on these images. (http://stackoverflow.com/questions/422902/image-compression-for-webcomics has more details on the struggles with cartoons and JPEG compression.) Eye scans are continuous in nature and don’t have sharp edges like cartoons, hence they should work well with networks trained on ImageNet.
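The JPEG side of this argument can be checked with a small sketch: keep only the first few DCT coefficients of a 1D "soft edge" and of a "hard edge" and compare the reconstruction error. The two test signals, the choice of 8 kept coefficients, and the function names are assumptions picked for illustration. It says nothing directly about VGG's filters, only about how a fixed smooth basis copes with sharp edges.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1D signal."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)))
    return coeffs

def idct(coeffs):
    """Inverse of dct(): a weighted sum of cosine basis vectors."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def rms_error_keeping(signal, keep):
    """RMS reconstruction error when only the first `keep` coefficients survive."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    rebuilt = idct(truncated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, rebuilt))
                     / len(signal))

n = 64
# Soft edge: a sigmoid ramp, loosely standing in for a natural-image gradient.
soft_edge = [1.0 / (1.0 + math.exp(-(i - n / 2 + 0.5) / 6.0)) for i in range(n)]
# Hard edge: a step, loosely standing in for a cartoon outline.
hard_edge = [0.0 if i < n // 2 else 1.0 for i in range(n)]

soft_err = rms_error_keeping(soft_edge, keep=8)
hard_err = rms_error_keeping(hard_edge, keep=8)
print(f"soft edge RMS error: {soft_err:.4f}")
print(f"hard edge RMS error: {hard_err:.4f}")
```

The hard edge needs many more coefficients for the same quality, which is the effect the Stack Overflow link describes for webcomics.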

Thoughts?

jeremy · April 26, 2017, 8:54pm

Yup, that sounds about right to me. But I bet you could create something like ImageNet that has lots of drawings, text, logos, etc., and use a model trained on that as your starting point.

s.s.o · April 26, 2017, 8:58pm

@Surya501 Well, see lecture 2 at about the 55th minute: the visualization of network layers. There, the first layer captures diagonal, vertical and horizontal lines and some gradient backgrounds. So I think it may capture sharp edges.
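A quick way to sanity-check this is to run an edge filter over a hard edge and a soft ramp. The sketch below uses a hand-written Sobel kernel as a stand-in for a learned first-layer filter (an assumption; the real VGG filters are learned and differ), followed by a ReLU.

```python
def correlate2d(image, kernel):
    """'Valid' 2D cross-correlation, as used in conv layers (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu_map(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]

# Sobel kernel that responds to dark-to-bright transitions from left to right.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

size = 8
# Cartoon-like hard vertical edge: left half 0, right half 1.
sharp = [[0.0 if c < size // 2 else 1.0 for c in range(size)]
         for _ in range(size)]
# Photo-like soft gradient from 0 to 1 across the width.
soft = [[c / (size - 1) for c in range(size)] for _ in range(size)]

sharp_peak = max(max(row) for row in relu_map(correlate2d(sharp, SOBEL_X)))
soft_peak = max(max(row) for row in relu_map(correlate2d(soft, SOBEL_X)))
print("peak response on hard edge:", sharp_peak)
print("peak response on soft ramp:", soft_peak)
```

The filter fires more strongly on the hard edge than on the soft ramp, so a sharp edge is not invisible to this kind of filter.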

davidc · April 28, 2017, 11:07am

Seresh - I am currently slowly working my way through cats and dogs and State Farm, taking more than the allotted week per class because of time constraints. So I won’t be able to contribute much in the near term, but I will come back to this later, once I understand better (and my understanding is increasing every week with this amazing course that builds CNNs in Excel!).

davidc · April 28, 2017, 11:09am

So given that the TF YouTube link above says they used a pretrained network for diabetic retinopathy, does that suggest that images of organic objects benefit from pretraining because they too have gradients, soft edges, etc.? Cartoons are very different.

Surya501 · April 28, 2017, 3:29pm

Yes, very good point. Even the DCT basis, when visualized, is vertical and horizontal grids, but it does not generalize well to cartoons, as per the Stack Overflow link. JPEG compresses cartoons, but not as efficiently. Perhaps it is also the flat color gradients in the cartoons. We should be able to try line drawings, clipart, and cartoons to figure this out. One more thing to add to my list of items to try.

Surya501 · April 28, 2017, 3:35pm

I am confident of the first part, but I am not yet sure why exactly cartoons don’t work. As I mentioned earlier, it should be possible to see where and why exactly it fails, but that is a different topic from this thread.

s.s.o · April 29, 2017, 9:26am

I think the lines are vanishing through the higher layers. The layers in VGG, especially the last ones, capture patterns, but lines do not form patterns. Maybe you could try, as a preprocessing step, blurring and thickening the lines in the original image so they survive until the last layer.
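A minimal sketch of that preprocessing step in plain Python, assuming a grayscale image stored as a list of rows with bright lines (1.0) on a dark background (0.0); for dark lines on white, invert first. The function names and the radius of 1 are illustrative choices, and in practice an image library would do this faster.

```python
def neighborhood(image, r, c, radius):
    """Pixel values in the square window around (r, c), clipped at the borders."""
    h, w = len(image), len(image[0])
    return [image[rr][cc]
            for rr in range(max(0, r - radius), min(h, r + radius + 1))
            for cc in range(max(0, c - radius), min(w, c + radius + 1))]

def thicken(image, radius=1):
    """Grayscale dilation: each pixel becomes the max of its neighborhood."""
    return [[max(neighborhood(image, r, c, radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]

def box_blur(image, radius=1):
    """Each pixel becomes the mean of its neighborhood."""
    blurred = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            values = neighborhood(image, r, c, radius)
            row.append(sum(values) / len(values))
        blurred.append(row)
    return blurred

# A 7x7 image with a one-pixel-wide vertical line in column 3.
size = 7
line_art = [[1.0 if c == 3 else 0.0 for c in range(size)] for _ in range(size)]

thick = thicken(line_art)     # line is now three pixels wide
softened = box_blur(thick)    # edges of the line now fall off gradually

print([round(v, 2) for v in line_art[3]])
print([round(v, 2) for v in thick[3]])
print([round(v, 2) for v in softened[3]])
```

Whether this actually helps the line survive to the last layer is something to test empirically, as suggested above.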