Medical imaging from_dicoms not finishing

beezus666 · February 13, 2021, 1:06pm

Hi, I’m working through the medical imaging tutorial using a new-ish Kaggle competition, VinBigData chest x-rays.

In my notebook, I’m getting stuck on this:
dicom_dataframe = pd.DataFrame.from_dicoms(items)
dicom_dataframe[:5]

It starts to run OK, but then it never finishes putting the DICOM information into a dataframe.

In addition to the tutorial, Jeremy posted a similar Kaggle notebook about a year ago, and I can’t figure out why his works and mine doesn’t…

Thanks in advance for any help!

amritv · February 13, 2021, 10:44pm

There are about 15,000 images in the train set, so it does take a while! You could break it down into chunks and then merge the dataframes. Also, by default from_dicoms uses the brain window, so you may want to change that as well.

dicom_dataframe = pd.DataFrame.from_dicoms(items[:5000], window=dicom_windows.lungs)
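The chunk-and-merge idea could be sketched roughly as follows. This is only an illustration: `chunked` and `process_in_chunks` are hypothetical helper names, not part of fastai, and the fastai call is shown in a comment because it assumes `items`, `pd` and `dicom_windows` are already defined as in the notebook.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_chunks(items: Sequence[T],
                      read_chunk: Callable[[Sequence[T]], R],
                      size: int = 5000) -> List[R]:
    """Run `read_chunk` on each slice and collect one result per chunk."""
    return [read_chunk(chunk) for chunk in chunked(items, size)]


# Intended use with fastai (not run here; assumes the notebook's names):
#
#   frames = process_in_chunks(
#       items,
#       lambda c: pd.DataFrame.from_dicoms(c, window=dicom_windows.lungs),
#   )
#   dicom_dataframe = pd.concat(frames, ignore_index=True)
```

Saving each chunk's dataframe to disk before moving on would also mean a kernel timeout only costs the chunk in progress.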

beezus666 · February 14, 2021, 1:19am

thanks, ok, I guess I’ll give that a shot… surprised that that’s a lot, as i’m just trying to pull text from images into a dataframe… I just came back to the computer and 3 hours didn’t cut it…

Also, I didn’t know about the different windows; it wasn’t in the tutorial… I should probably go read the docs.

Anyway, just started running the notebook on just 5k, will report back in a couple hours.

thanks!

kcturgutlu · February 14, 2021, 3:07am

To access that meta information that function needs to read the DICOM file, which usually is a large file with all the metadata including pixel array. So it’s not surprising if it takes some time.
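If only the header fields are needed, one option outside of from_dicoms is to ask pydicom to stop parsing before the pixel data, using the `stop_before_pixels=True` argument of `pydicom.dcmread`. The sketch below is a rough illustration, not how fastai does it: `read_header` and the `reader` parameter are hypothetical names, and the stand-in reader exists only so the function can be tried without pydicom or real files.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional


def read_header(path: str,
                reader: Optional[Callable[..., Any]] = None) -> Dict[str, Any]:
    """Return the header elements of one DICOM file as a flat dict.

    `reader` defaults to pydicom.dcmread; it is a parameter only so the
    function can be exercised without pydicom installed.
    """
    if reader is None:
        from pydicom import dcmread  # imported lazily; requires pydicom
        reader = dcmread
    # stop_before_pixels=True makes pydicom stop before the pixel data
    ds = reader(path, stop_before_pixels=True)
    row: Dict[str, Any] = {"fname": path}
    for elem in ds:
        if elem.keyword:  # elements without a known keyword are skipped
            row[elem.keyword] = elem.value
    return row


def _fake_reader(path: str, stop_before_pixels: bool = False) -> list:
    """Stand-in for pydicom.dcmread, for demonstration only."""
    return [
        SimpleNamespace(keyword="Modality", value="DX"),
        SimpleNamespace(keyword="", value=b"private"),
    ]


demo_row = read_header("example.dicom", reader=_fake_reader)
```

With real files, a list of such rows can be passed to `pd.DataFrame(rows)`; some values are pydicom-specific types and may need converting first.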

beezus666 · February 14, 2021, 4:53am

Yeah, I’ve never worked with DICOM files before… I guess I’m just surprised that there’s not a fast way to extract the metadata from the image. Seems like a pretty inefficient standard…

Anyway, it failed on 5k images before the kaggle kernel timed out. I’m running it now on just 500 images.

beezus666 · February 14, 2021, 5:18am

Ok, 500 took a bit over 11 minutes… I’ll go through Jeremy’s older notebook tomorrow and see if I can figure out what the difference is… that competition had 74k images and he was able to load up the metadata in under 15 minutes.

amritv · February 14, 2021, 5:39am

By default from_dicoms generates a summary (img_min, img_max, img_mean, img_std, img_pct_window). This uses the pixel_array, and this is the time-consuming part.

If you do not really want this info, you can manually turn it off, and it is a lot faster (7 mins).
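To illustrate why the summary is the expensive part: each of those statistics needs a pass over every pixel, while the header fields do not. Below is a minimal plain-Python sketch, not fastai's implementation. `build_row` and `with_summary` are hypothetical names, and the window is assumed to be a (width, level) pair with illustrative values.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence, Tuple


def build_row(header: Dict[str, object],
              pixels: Optional[Sequence[float]] = None,
              window: Tuple[float, float] = (1500, -600),  # assumed (width, level)
              with_summary: bool = True) -> Dict[str, object]:
    """Copy the header fields and optionally add pixel statistics."""
    row = dict(header)
    if with_summary and pixels is not None and len(pixels) > 0:
        width, level = window
        lo, hi = level - width / 2, level + width / 2
        # Every statistic below has to visit each pixel value
        inside = sum(1 for p in pixels if lo < p < hi)
        row.update(
            img_min=min(pixels),
            img_max=max(pixels),
            img_mean=mean(pixels),
            img_std=pstdev(pixels),
            img_pct_window=inside / len(pixels),
        )
    return row


header = {"PatientID": "a"}
pixels = [-1000, -600, 0, 400]
full = build_row(header, pixels)
fast = build_row(header, pixels, with_summary=False)
```

With the flag off, the row is just the header fields, so the pixel values are never visited.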

I am submitting a PR so that it is easy to toggle this feature on or off as required.

beezus666 · February 14, 2021, 3:30pm

Thanks! that worked!

Is there anywhere to find the list of parameters for from_dicoms? I can’t seem to find any documentation on it.

amritv · February 14, 2021, 4:16pm

Have a look at this notebook

github.com

fastai/fastai/blob/master/nbs/60_medical.imaging.ipynb


or here

boJa · February 14, 2021, 4:26pm

As an aside, you may be interested in working with a downsampled version of the data (such as this (jpeg images)) for fast experimenting.

tomasvdb · February 25, 2021, 10:25am

I’ve done my pre-processing on the DICOM images and then saved the results as 16-bit .tiff files.
The problem I’m having now is that loading the images in a DataBlock seems to automatically convert them to 8-bit.
Does anyone have any ideas on how to load the data as 16-bit?
I could work off the DICOM files directly, but the dataset is just too large to use on my machine, so I’ve already spent quite some time on getting this 16-bit TIFF dataset.

amritv · March 4, 2021, 5:58am

You could look at this which allows for easy integration of a custom PILDicom block.

fmi