In the Lesson 1 video, I think @jeremy mentioned that he used Vgg16 (or a similar network) on cancer MRIs or similar scans and got better results than humans. In Video 2 there was a question about whether Vgg16 would work on cartoons, and the answer was that it wouldn’t.
I am unclear whether we would use ImageNet as the basis for MRIs or diabetic retinopathy images, leveraging the edges, corners, curves, etc. from the first few layers and then training the next few layers on MRI data, or whether ImageNet is just unsuitable for these new types of images.
If you can use ImageNet for new problems like this, is it also possible to use it for non-image problems by representing the input data as images?
We typically don’t touch the convolutional layers, as we find that the filters typically still work well for almost any classification task that uses standard photos. Vision tasks using line art, medical imaging, or other domains very different to standard photos are likely to require retraining convolutional layers, however.
I think I now understand better: essentially, it would be necessary to retrain most or all layers for medical imagery.
David, I want to play with the same dataset. Would you be interested in collaborating? My thinking is that this dataset would work well with Densenet, especially given the sample size and the compute requirements for larger images.
As with anything, the best answer is to experiment. Especially so since experimentation is cheap. However, I’d like to have a hypothesis on what I should expect before experimenting. Here is my thinking on why cartoons are different from imagenet, but eye scans are not.
Background: We can represent any signal as a weighted sum of a select few signals. An intuitive way to think about it is currency. With a few denominations of coins and notes, we can pay any amount by combining a few of them. However, this analogy fails a bit, since we could pay any amount using just pennies: any other note or coin could effectively be replaced with pennies. A better analogy is prime factorization: any integer greater than 1 can be represented as a product of primes.
So, extending that analogy to images, DCT/FFT/wavelet basis functions are to images as primes are to numbers. Put differently, a weighted linear sum of these should be able to reconstruct an image. This notebook explains it much more visually: http://bugra.github.io/work/notes/2014-07-12/discre-fourier-cosine-transform-dft-dct-image-compression/ . We know that we can reconstruct the original image from the first-layer activations, so the first learned layer in VGG is just a few known-good linear filters that can approximate the original signal corpus. This first layer can be loosely thought of as ‘learned DCT/FFT/wavelet basis functions’, except that we can’t use negative weights in the sum (ReLU sets negative coefficients to zero).
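To make the "weighted sum of basis functions" idea concrete, here is a minimal pure-Python sketch using a 1-D orthonormal DCT-II basis. The sample signal and the helper names (`dct_basis`, `analyze`, `synthesize`, `max_error`) are made up for illustration, and zeroing the negative coefficients is only a loose stand-in for what ReLU does after a conv layer, not a model of VGG itself.

```python
import math

def dct_basis(n):
    """Return the n orthonormal DCT-II basis vectors, each of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return basis

def analyze(signal, basis):
    """Project the signal onto each basis vector to get one weight per vector."""
    return [sum(s * b for s, b in zip(signal, vec)) for vec in basis]

def synthesize(coeffs, basis):
    """Rebuild a signal as the weighted sum of the basis vectors."""
    n = len(basis[0])
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(n)]

def max_error(a, b):
    """Largest absolute difference between two equal-length signals."""
    return max(abs(x - y) for x, y in zip(a, b))

# Arbitrary sample signal (an assumption, not data from the course).
signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
basis = dct_basis(len(signal))
coeffs = analyze(signal, basis)

# With every weight available, the weighted sum rebuilds the signal.
exact = synthesize(coeffs, basis)

# Loose ReLU analogy: throw away every negative weight before summing.
relu_coeffs = [max(c, 0.0) for c in coeffs]
clipped = synthesize(relu_coeffs, basis)

print("error with all coefficients:", max_error(signal, exact))
print("error after zeroing negative coefficients:", max_error(signal, clipped))
```

With all coefficients, the reconstruction error sits at floating-point noise level; once the negative weights are dropped, the same basis no longer reproduces the signal.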
Imagenet is full of natural images, and hence it would not have a lot of sharp edges. Cartoons, however, are full of sharp edges and flat colors. Hence, networks trained on Imagenet would not be able to reconstruct cartoons well, even from the first-layer outputs. If you lose information right in the first layer, the network will not do well on these images. (http://stackoverflow.com/questions/422902/image-compression-for-webcomics has more details on the struggles with cartoons and JPEG compression.) Eye scans are continuous in nature and don’t have sharp edges like cartoons, hence they should work well with networks trained on Imagenet.
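The JPEG side of this hypothesis can be checked on a toy case. The sketch below keeps only the lowest-frequency DCT coefficients of two synthetic 1-D "scanlines" and compares the reconstruction error. Everything here is an assumption for illustration: the signals are made up (the smooth one is deliberately a single low-frequency cosine), the function names are hypothetical, and a truncated DCT is only a proxy for "filters tuned to smooth images", not a test of VGG's learned filters.

```python
import math

def dct(signal):
    """Orthonormal 1-D DCT-II."""
    n = len(signal)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(signal))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct(): weighted sum of the cosine basis vectors."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def lowpass_error(signal, keep):
    """RMS error after keeping only the `keep` lowest-frequency coefficients."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    rebuilt = idct(truncated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, rebuilt))
                     / len(signal))

n = 64
# Smooth profile: one soft bump, standing in for a natural-image scanline.
smooth = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / n) for i in range(n)]
# Flat regions separated by hard edges, standing in for a cartoon scanline.
cartoon = [1.0 if 20 <= i < 40 else 0.0 for i in range(n)]

smooth_err = lowpass_error(smooth, keep=8)
cartoon_err = lowpass_error(cartoon, keep=8)
print("smooth  RMS error with 8 of 64 coefficients:", smooth_err)
print("cartoon RMS error with 8 of 64 coefficients:", cartoon_err)
```

The smooth profile is captured almost exactly by a handful of low-frequency terms, while the hard-edged one leaves a visible residual, which is the same effect the stackoverflow link describes for JPEG and webcomics.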
@Surya501 Well, see lecture 2 at about the 55th minute: the visualization of network layers. There, the first layer captures diagonal, vertical and horizontal lines and some gradient backgrounds. So I think it may capture sharp edges.
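A quick way to see that a line/edge filter of this kind does react to sharp edges: the sketch below runs a hand-written vertical-edge kernel over a hard step and over a soft ramp, then applies ReLU. The Sobel kernel is an assumed stand-in for a learned first-layer filter (it is not taken from VGG), and the two tiny images and the helper names are made up for illustration.

```python
def correlate_valid(image, kernel):
    """2-D cross-correlation without padding (what CNN conv layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)]
            for r in range(rows)]

# Hand-written vertical-edge detector (Sobel), NOT an actual VGG filter.
vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

width, height = 12, 5
# Cartoon-like: flat black on the left, flat white on the right.
hard_edge = [[0.0 if x < width // 2 else 1.0 for x in range(width)]
             for _ in range(height)]
# Photo-like: brightness rises gradually from left to right.
soft_ramp = [[x / (width - 1) for x in range(width)] for _ in range(height)]

def peak_relu_response(image):
    """Largest activation of the edge filter after ReLU."""
    response = correlate_valid(image, vertical_edge)
    return max(max(v, 0.0) for row in response for v in row)

hard_peak = peak_relu_response(hard_edge)
soft_peak = peak_relu_response(soft_ramp)
print("peak response on hard edge:", hard_peak)
print("peak response on soft ramp:", soft_peak)
```

The filter fires much more strongly on the hard step than on the ramp, so a first layer with filters like this is not blind to sharp edges; whatever goes wrong with cartoons may happen elsewhere.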
Seresh - I am currently slowly working my way through cats and dogs and state farm, taking more than the allotted week per class because of time constraints. My understanding is increasing every week with this amazing course that builds CNNs in Excel! So I won’t be able to contribute much in the near term, but I will come back to this later, once I understand better.
So, given that the TF youtube link above says they used a pretrained network for diabetic retinopathy, does that suggest that images of organic objects benefit from pretraining because they too have gradients, soft edges, etc.? Cartoons are very different.
Yes, very good point. Even the DCT, when visualized, is vertical and horizontal grids, but it does not generalize well to cartoons, as per the stackoverflow link. JPEG compresses cartoons, but not as efficiently. Perhaps it is also the flat color gradients in the cartoons. We should be able to try line drawings vs. clipart vs. cartoons to figure this out. One more thing to add to my list of items to try.
I am confident of the first part, but I am not yet sure exactly why cartoons don’t work. As I mentioned earlier, it should be possible to see where and why exactly it fails, but that is a different topic from this thread.
I think the lines are vanishing through the higher layers. The layers in VGG, especially the last ones, capture patterns, but lines do not form patterns. Maybe you can try a preprocessing step: blur and thicken the lines in the original image so they survive until the last layer.
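A rough sketch of that preprocessing idea, in case someone wants to try it: thicken the lines with a grayscale dilation, then soften them with a box blur. This only shows the mechanics on a tiny made-up image; whether it actually helps VGG on line art is untested. It assumes ink is 1.0 and background is 0.0, and all function names here are hypothetical.

```python
def neighborhood(image, r, c, radius):
    """Pixel values in the square window around (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]

def thicken(image, radius=1):
    """Grayscale dilation: each pixel becomes the max of its neighbourhood."""
    return [[max(neighborhood(image, r, c, radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]

def box_blur(image, radius=1):
    """Each pixel becomes the mean of its neighbourhood."""
    out = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            vals = neighborhood(image, r, c, radius)
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def ink_width(row):
    """Number of pixels in a row that contain any ink at all."""
    return sum(1 for v in row if v > 0)

# 7x7 toy drawing: a one-pixel-wide vertical line in the middle column.
size = 7
line_art = [[1.0 if c == size // 2 else 0.0 for c in range(size)]
            for _ in range(size)]

thickened = thicken(line_art)      # line grows from 1 to 3 pixels wide
processed = box_blur(thickened)    # edges of the line become soft ramps

middle = size // 2
print("original width: ", ink_width(line_art[middle]))
print("thickened width:", ink_width(thickened[middle]))
print("blurred width:  ", ink_width(processed[middle]))
```

On a real image the same two steps would normally be done with an image library's dilation and blur routines; the radii are knobs to experiment with.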