Transfer Learning for Medical Radiography

Hi everyone,

I’m a radiologist in Canada, relatively new to the deep learning world. I’m leading a group working on detecting manifestations of osteoporosis on images of the spine in the elderly. We have a good labeled dataset, but I think transfer learning with a previously successful model would be helpful. Searching this forum and elsewhere, I get the sense that an ImageNet-trained model probably wouldn’t have ideal filters for greyscale medical images, and I’m not finding much else to go on. There is a published paper that used ImageNet converted to greyscale for pretraining, and I think the authors also tried selecting categories of images that seemed more similar to the features of radiography, so this will probably be our first approach.

My first question is: does anyone know of any available very large datasets or pretrained models that would be useful for transfer learning in medical radiography? If so, I would love to hear about them.

My second question is: would it be feasible and useful to create our own pretrained model from a similar but much larger dataset with simple labels and then finetune it on the labeled dataset of interest?

My radiology group includes a very busy orthopedic and sports medicine practice with hundreds of thousands (I think 500 000 is probably a conservative estimate) of high quality digital radiographs of all parts of the body (but primarily extremities). The storage system knows what body part these come from, and also patient age would be very readily accessible. We might be able to data mine the electronic patient record for a few other easily accessible labels (which patients were sent to the cast clinic for treatment of a fracture seems like an easy one), but that is a separate system that might be harder to access. I’m wondering if the convolutional layers from good network architectures trained on easy to obtain labels, esp. body part and approximate patient age category (which would hopefully capture developmental and degenerative changes), might be useful for all kinds of transfer learning in the field down the road.
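
A minimal sketch of how such easy labels might be derived from image metadata. The field names (`body_part`, `age_years`), the age bin edges, and the category names are all assumptions made for illustration; they are not taken from any real storage system.

```python
from bisect import bisect_right

# Hypothetical age bin edges in years; real cut points would need clinical input.
AGE_EDGES = [2, 12, 18, 40, 65]
AGE_NAMES = ["infant", "child", "adolescent", "young_adult", "middle_aged", "elderly"]


def age_category(age_years):
    """Map an age in years to a coarse category name."""
    return AGE_NAMES[bisect_right(AGE_EDGES, age_years)]


def build_labels(records):
    """Turn metadata records into (body_part_index, age_category_index) pairs.

    `records` is a list of dicts with the assumed keys 'body_part' and 'age_years'.
    Returns the label pairs and the sorted list of body part names.
    """
    body_parts = sorted({r["body_part"] for r in records})
    part_index = {name: i for i, name in enumerate(body_parts)}
    labels = [
        (part_index[r["body_part"]], AGE_NAMES.index(age_category(r["age_years"])))
        for r in records
    ]
    return labels, body_parts
```

Each image then carries two cheap targets, so a network could be pretrained with two classification heads sharing the same convolutional layers.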

If this seems likely to succeed and be useful to the broader community, I would be willing to spearhead a project to make it happen (get the ethics and privacy approvals and grant funding to put together a multi-GPU server and the like, though I would probably need some technical help to make sure things are done at a state-of-the-art level), publish the result, and then make a set of pretrained networks openly available to the world. Does this seem like something others might be interested in?

Thanks for reading,



Hello Sheldon,

I am also a Canadian radiologist (province of Quebec) and computer engineer, starting research using deep learning. It is good to hear of your interest!

I would first try copying the grayscale channel into each of the RGB channels and trying different pretrained models (VGG16, Inception V3, Xception, ResNet50) at different image resolutions to see the results. Depending on the results, this will give you hints on where to go next.
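
As a rough sketch of the channel-copying step, assuming the radiograph is already loaded as a 2-D NumPy array (the function name and the 0-255 output range are illustrative choices, not part of any particular library):

```python
import numpy as np


def gray_to_rgb(image, out_max=255.0):
    """Rescale a 2-D grayscale image to [0, out_max] and copy it into 3 channels."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    # Avoid division by zero for a constant image.
    scale = (hi - lo) if hi > lo else 1.0
    scaled = (image - lo) / scale * out_max
    # Stack the same plane three times along a new last axis: (H, W) -> (H, W, 3).
    return np.stack([scaled, scaled, scaled], axis=-1)
```

The result would still need resizing and whatever input preprocessing the chosen pretrained model expects.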

Your point about using easily known labels for pretraining is an excellent one. I am trying a similar idea with cerebral CT! Basically, the intuition is that a model needs to learn anatomy before learning pathology! I totally agree with your hypothesis!

I am currently in the process of getting the approvals for data access. We basically want to follow the same path, so this could be a nice opportunity to collaborate and share our experience in this area.


Hi Alexandre,

Thanks for the advice, I’m glad to hear that you have had similar ideas!

I’m very interested to hear how things go with your head CT model. This is something I was hoping to work on next. I do primarily neuroradiology, so if there is anything I can do to help, please let me know!


Ethical data access is key in this kind of project. Strong de-identification is very important for getting approval.

After the primary local approval and proof of concept, the next logical step is to share the data between different centers to scale up the training dataset. A stepwise approach is consequently needed for potential collaboration. A second-step approval to share the data between collaborating centers could then be implemented. I see this as a snowball effect, or bottom-up approach. Ideally, the snowball never stops growing. A top-down approach to data access and collaboration isn’t impossible but is politically very risky.

I think you should try to get access as an exploratory study. Let’s keep in contact and keep track of the evolution of both projects. Do you have some experience in coding, or access to a decent coder?

Even though this answer is addressed to you, I am posting it publicly on purpose! If anyone is interested in following a similar path, the snowball could just grow faster.

I’m a software developer based in Australia; I’d be interested in getting involved.

Does anyone have any ideas about or experience with using large datasets with simple labels as a source for transfer learning? Is there any successful precedent for this? I ask because ImageNet-trained models are trained with a very large number of categories. Does this affect the quality of the low-level convolutional filters? I can’t really see intuitively why it would, but I wanted to make sure before committing a lot of time and resources to a project like this.

Have you seen this paper, Compression Fractures Detection on CT? Their dataset is only 3701 individuals.

There are some interesting Kaggle competitions with medical image datasets, winning solutions, and sample code. I’m not sure if these images are similar to yours or what the privacy/license requirements are, but they could be a good place to start prototyping something. Transfer learning is an incredibly powerful technique, and definitely a good place to start, but increasingly I’m seeing more SOTA solutions to niche problems where the authors train from scratch on much smaller datasets.

Here is an interesting group of MRI datasets.



Thanks, that’s an interesting project that has some similarities to ours. The filters from the patch CNN classifier could potentially be reused.

Thanks for the suggestions! Unfortunately, these datasets are qualitatively quite different from ours.
I understand that autoencoder pretraining has fallen out of favor (and I can understand why) but I’m wondering if there might be a way to use a self-supervised toy task approach similar to that mentioned here: “”. I’m wondering if the toy task might enable us to break the dataset into a much larger patch dataset and ultimately do a better job of training the filters. Any thoughts on that?
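
The linked approach is not recoverable here. Purely as an illustration of the general idea, and not necessarily the method referred to above, the sketch below cuts one image into patches and builds a rotation-prediction task: each patch is rotated by a random multiple of 90 degrees and the number of quarter turns becomes the label. The function name, patch size, and stride are assumptions.

```python
import numpy as np


def rotation_patches(image, patch=64, stride=64, seed=0):
    """Cut a 2-D image into square patches, each rotated by a random multiple of 90 degrees.

    Returns (patches, labels), where labels[i] in {0, 1, 2, 3} is the number of
    counterclockwise quarter turns applied to patches[i]. No human annotation is needed.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    patches, labels = [], []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            k = int(rng.integers(0, 4))
            tile = image[top:top + patch, left:left + patch]
            patches.append(np.rot90(tile, k))
            labels.append(k)
    return np.stack(patches), np.array(labels)
```

A single 2048 x 2048 radiograph would yield 1024 non-overlapping 64 x 64 patches this way, and a smaller stride would give more, which is how a patch task can enlarge the effective training set.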

It sounds like a very interesting project. Our experience so far with transfer learning from ImageNet models to large medical images has been very poor. You lose too much information going from high-resolution grayscale to low-resolution RGB. There are certainly some applications where it makes sense, but it doesn’t seem like radiography has many of them. I referenced some of the approaches we tried for CT and transfer learning in Black and white images on VGG16
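
To give a rough sense of scale for that information loss, here is a small calculation. The figures are illustrative assumptions (a 2048 x 2048 radiograph at 12 bits per pixel, reduced to a 224 x 224, 8-bit network input), not measurements from our experiments.

```python
def downsampling_loss(src_side=2048, src_bits=12, dst_side=224, dst_bits=8):
    """Return (pixel_ratio, gray_level_ratio) between a source image and a network input."""
    pixel_ratio = (src_side * src_side) / (dst_side * dst_side)
    gray_level_ratio = 2 ** (src_bits - dst_bits)
    return pixel_ratio, gray_level_ratio


pixels, levels = downsampling_loss()
print(f"{pixels:.1f}x fewer pixels, {levels}x fewer gray levels")
```

Under these assumptions the network sees roughly 84 times fewer pixels and 16 times fewer gray levels, and copying the single channel into RGB adds nothing back.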

If you look at models for segmentation like U-NET, they were trained entirely with medical / histological data and perform quite well.

As with most of these new areas, the only way to really know is to test it out. It would probably be a good idea to define a project or even make a test dataset (Kaggle is a great place to upload them and get feedback from others). Since you mentioned age and body part, it might be an interesting challenge to see whether training a CNN to classify a large number of different body parts makes it better at estimating age from a specific body part.

Hi, coming back after a few months. Do you have any updates on the project? I am in a very similar situation and was wondering how it went for you. Thanks

I tried transfer learning with the NIH chest X-ray dataset, but it wasn’t useful for my images. Training from scratch ended up doing surprisingly well, with heavy data augmentation.

Deep learning novice and physician here with a question on how far one can take transfer learning.

I get the traditional approach: Take a pre-trained “general” model like AlexNet that has been trained on ImageNet data and fine-tune the last few layers for a specific purpose (e.g. segmenting cats vs dogs, or cancer vs benign tumor). Could one take this a step further, i.e., take a cancer vs benign tumor model and fine-tune that for another specific purpose? Is it reasonable to expect better performance using the second approach, or have I had too much tequila?


How well transfer learning works has more to do with the types of features in the image than the actual subject. In general, transfer learning doesn’t work for radiology because the images have so little in common with photographs. I didn’t have success with transfer learning at all. Now that I have something that works, though, I expect it can be fine-tuned for similar purposes.

I should clarify that I mean that transfer learning from ImageNet doesn’t generally work well for radiology, not the concept itself, which should work under the right circumstances…

Transfer learning is a very broad term. It includes different modes of transfer: fine-tuning only the end layers, retraining the entire network with pre-initialized weights, freezing or retraining parts of the network at different learning rates, copying some low-level weights to a completely new network architecture, etc.
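
Several of these modes differ only in how much each layer is allowed to change. A framework-free sketch of that idea, with a hypothetical helper that assigns a learning rate per layer (the function name, layer names, and default values are assumptions for illustration):

```python
def layer_learning_rates(layer_names, base_lr=1e-3, decay=0.5, n_frozen=0):
    """Assign a learning rate to each layer, ordered from input to output.

    The first `n_frozen` layers get 0.0 (frozen). The last layer gets `base_lr`,
    and each earlier trainable layer gets `decay` times the rate of the next one.
    """
    n = len(layer_names)
    rates = {}
    for i, name in enumerate(layer_names):
        if i < n_frozen:
            rates[name] = 0.0
        else:
            rates[name] = base_lr * decay ** (n - 1 - i)
    return rates
```

With four layers, `n_frozen=3` corresponds to fine-tuning only the end layer, `n_frozen=0, decay=1.0` to retraining the entire network from the pretrained weights, and intermediate settings to freezing or slowing down the early layers. In a real framework these values would be handed to the optimizer as per-layer settings.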

Consequently, evaluating the effectiveness of transfer learning against random weight initialization depends on many parameters. The most important is probably the amount of data relative to the complexity of the problem you want to solve. For example, if you have only 100 images to solve a fracture vs. no-fracture classification problem on high-resolution images, transfer learning will likely improve the (bad) results compared to training a randomly initialized deep network. But if you have 400000 images, there is a chance that random weight initialization can perform better than ImageNet weight initialization on a relatively deep CNN.

But my personal experience in machine learning competitions applied to medical imaging is that initializing the weights from ImageNet frequently helps to improve the validation/test results. It usually improves generalizability even if you retrain the entire network after weight initialization. This also makes sense, since some lower-level filters usually detect edges, corners, and compositions of these, which can be reused for almost any computer vision problem. The low-level filters related to detecting color patterns are presumably less transferable to medical imaging.


Our model converged on a high level of accuracy when training from the default initialization in Keras on as few as ~5000 labeled medical images (with a lot of augmentation). I tried a few different methods of transfer learning similar to what you described, and didn’t see any improvement beyond what was achieved with default initialization. Of course, the only way to know for sure for a given application is to try.
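
The augmentation actually used is not described in this post. As a generic, hypothetical illustration of the kind of transforms often applied to radiographs (random flip, small translation, contrast scaling), with all names and parameter values being assumptions:

```python
import numpy as np


def augment(image, rng, max_shift=8, contrast_range=(0.8, 1.2)):
    """Return a randomly flipped, shifted, and contrast-scaled copy of a 2-D image."""
    out = image.astype(np.float32)
    # Random horizontal flip. This may be inappropriate when laterality matters.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random translation, with zeros filling the exposed border.
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = out.shape
    src = out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted = np.zeros_like(out)
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    # Random contrast scaling around the image mean.
    gain = rng.uniform(*contrast_range)
    mean = shifted.mean()
    return (shifted - mean) * gain + mean
```

Calling it with a fresh random draw per training step, for example `augment(img, np.random.default_rng())`, gives a different variant of the same image each time.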



Could you please share the data-augmentation strategies that you used on these medical images?