Cancer Data


(Omar Amin) #1

Has anyone worked on medical images before, where each image can be around 2 GB in size?

What are the hardware requirements for such cases? Can a normal deep learning box handle this problem?

I might work on a problem where the total data size is around 30 TB and each image is around 2 GB. My plan is to resize those images down to the sizes we normally work with. Any other ideas?

Thanks
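For a rough sense of scale, here is a back-of-envelope sketch of what those two numbers imply. Only the 30 TB and 2 GB figures come from the question; the decimal units and the 224 x 224 single-channel, 8-bit target are assumptions chosen purely for illustration.

```python
# Back-of-envelope sizing. Only 30 TB and 2 GB come from the post;
# everything else here is an illustrative assumption.
TB = 10**12  # decimal terabyte (assumption)
GB = 10**9   # decimal gigabyte (assumption)

dataset_bytes = 30 * TB
image_bytes = 2 * GB

# Approximate number of images in the dataset.
n_images = dataset_bytes // image_bytes

# Hypothetical resize target: 224 x 224 pixels, one channel,
# one byte per pixel, stored uncompressed.
target_side = 224
resized_image_bytes = target_side * target_side * 1
resized_dataset_bytes = n_images * resized_image_bytes

print(f"approximate image count: {n_images}")
print(f"resized dataset size: {resized_dataset_bytes / GB:.2f} GB")
```

Under those assumptions the resized set would fit in memory on an ordinary machine, so the heavy part would be the one-off pass that reads and shrinks the originals, not the training itself.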


(Alex) #2

Out of curiosity, what are the properties of these images? Dimensions, compression?


(Alexandre Cadrin-Chênevert) #3

Being a radiologist, I mostly work on medical images. There are tons of interesting challenges working with these. But the potential impact is huge. We are still in the very early days of application of deep learning to medical imaging.

Broadly, high resolution, single-channel data, noise, weak labeling, and/or a low number of samples are the frequent challenges when applying deep learning to medical imaging. What is really exciting for algorithmic deep learning research in medical imaging is that each problem usually has a different optimal solution.

To answer your specific question about resizing, it depends on the context. If the computer vision problem that you are trying to solve doesn’t really need high resolution, then it should be fine to resize. But if you are trying to solve a problem that usually needs high resolution (e.g., identifying microcalcifications on a digital mammogram), then resizing can completely break the potential performance. Involving an interested domain expert (a radiologist) can usually give you a hint toward a useful direction.
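To make the resolution point concrete, here is a toy sketch of how a tiny bright feature can fade when an image is shrunk. It uses a synthetic 64 x 64 array with a single bright pixel, not real mammography data, and simple block averaging as the resize method; both are assumptions, and real resizing libraries may use other interpolation schemes. The helper name `downsample_mean` is made up for this example.

```python
def downsample_mean(image, factor):
    """Shrink a 2D list-of-lists image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    # Ignore any partial blocks at the bottom and right edges.
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor)
                     for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Synthetic image: dark background with one bright pixel standing in
# for a very small, high-contrast finding (illustrative only).
size = 64
image = [[0.0] * size for _ in range(size)]
image[20][37] = 1.0

small = downsample_mean(image, 8)  # 64 x 64 -> 8 x 8

peak_before = max(max(row) for row in image)
peak_after = max(max(row) for row in small)
print(f"peak intensity before: {peak_before}, after: {peak_after}")
```

In this toy setup the single bright pixel is averaged with 63 dark neighbours, so its contrast against the background drops by a factor of 64. Whether a comparable loss matters for a real task is exactly the kind of question a domain expert can help answer.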


(Jeremy Howard) #4

…and the best way to figure out what resolution you need, is to resize images to a few different resolutions, and see if a human expert can classify them correctly.
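One way to organize that experiment is sketched below: collect the expert's labels at each resolution, score them against the ground truth, and take the smallest resolution that still meets a chosen accuracy bar. The function names, the 0.85 threshold, and the labels are all made up for illustration; they are not results from any real reading study.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def smallest_adequate_resolution(expert_labels, truth, min_accuracy):
    """Return the smallest resolution whose expert accuracy meets
    min_accuracy, or None if no resolution does."""
    for side in sorted(expert_labels):
        if accuracy(expert_labels[side], truth) >= min_accuracy:
            return side
    return None


# Made-up toy data: ground truth for 8 cases, plus the labels a
# hypothetical expert gave when shown each case at three image sizes.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
expert_labels = {
    128:  [0, 0, 1, 0, 0, 1, 1, 0],
    512:  [1, 0, 1, 1, 0, 1, 1, 0],
    2048: [1, 0, 1, 1, 0, 0, 1, 0],
}

for side in sorted(expert_labels):
    print(side, accuracy(expert_labels[side], truth))

chosen = smallest_adequate_resolution(expert_labels, truth, min_accuracy=0.85)
print("smallest adequate resolution:", chosen)
```

With real data you would want far more cases than this, and ideally more than one reader, before trusting the chosen size.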


(Alexandre Cadrin-Chênevert) #5

I totally agree with @jeremy. And, most of the time, if a human expert can’t classify well with low resolution and your model can, suspect an unrelated bias.

Here is a well-known example with great machine learning methodology, trained on 224x224 chest X-rays, where a potential unrelated bias in the dataset may have helped to make the predictions: https://arxiv.org/pdf/1711.05225.pdf

Of course, I meant implicitly that if I tried to interpret chest X-rays on 224x224 images, I’d spend more time in the lawyer’s office than in my own.