There are about 15,000 images in the train set, so it does take a while! You could break it down into chunks and then merge the dataframes. Also, by default from_dicoms uses the brain window, so you may want to change that as well.
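A rough sketch of the chunk-and-merge idea, in case it helps. The helper names (`chunked`, `dicom_frame_in_chunks`) and the chunk size of 500 are my own placeholders, and I'm assuming that importing `fastai.medical.imaging` is what attaches `from_dicoms` to `pd.DataFrame` and that it accepts a `window` keyword; check your installed version.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dicom_frame_in_chunks(files, size=500, **kwargs):
    """Build one dataframe per chunk, then merge them into a single frame.

    `kwargs` are passed straight to from_dicoms (for example a `window`).
    """
    # Local imports so the chunk helper stays usable without fastai installed.
    import pandas as pd
    import fastai.medical.imaging  # noqa: F401  (assumed to register from_dicoms)

    frames = [pd.DataFrame.from_dicoms(chunk, **kwargs)
              for chunk in chunked(files, size)]
    return pd.concat(frames, ignore_index=True)
```

Something like `dicom_frame_in_chunks(files, window=dicom_windows.lungs)` would then swap the brain window for another one. Working in chunks also means you can save each partial dataframe, so a kernel timeout doesn't cost you everything.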
Thanks, OK, I guess I’ll give that a shot… I’m surprised that counts as a lot, as I’m just trying to pull text from images into a dataframe… I just came back to the computer and 3 hours didn’t cut it…
Also, I didn’t know about the different windows; it wasn’t in the tutorial… I should probably go read the docs.
Anyway, just started running the notebook on just 5k, will report back in a couple hours.
To access that metadata, the function needs to read the DICOM file, which is usually a large file containing all the metadata plus the pixel array. So it’s not surprising that it takes some time.
Yeah, I’ve never worked with DICOM files before… I guess I’m just surprised that there’s not a fast way to extract the metadata from the image. Seems like a pretty inefficient standard…
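On the "fast way" point: pydicom can stop reading before the pixel data via `dcmread(..., stop_before_pixels=True)`, so only the header is parsed. This is a minimal sketch outside of fastai, not what from_dicoms does; `read_header` and `header_to_row` are hypothetical helper names, and the keys you pass are whatever DICOM keywords your files actually contain.

```python
def header_to_row(ds, keys):
    """Pick the requested attributes from a dataset; missing ones become None."""
    return {key: getattr(ds, key, None) for key in keys}


def read_header(path, keys):
    """Read only the DICOM header (no pixel array) and return it as a dict."""
    import pydicom  # local import: only needed when actually reading files

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return header_to_row(ds, keys)
```

Usage would be along the lines of `rows = [read_header(p, ["PatientID", "Modality", "Rows", "Columns"]) for p in paths]` followed by `pd.DataFrame(rows)`. Some values come back as pydicom types rather than plain strings or numbers, so you may want to convert them before saving.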
Anyway, it failed on 5k images before the Kaggle kernel timed out. I’m running it now on just 500 images.
Ok, 500 took a bit over 11 minutes… I’ll go through Jeremy’s older notebook tomorrow and see if I can figure out what the difference is… that competition had 74k images and he was able to load up the metadata in under 15 minutes.
By default from_dicoms generates a pixel summary (img_min, img_max, img_mean, img_std, img_pct_window). This uses the pixel_array, which is the time-consuming part.
If you don’t really need this info, you can turn it off manually and it is a lot faster (7 mins).
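For anyone looking for the switch, a small sketch, assuming it is the `px_summ` keyword of from_dicoms (that is what I remember from `fastai.medical.imaging`; please check the signature in your version). `metadata_frame` and `has_pixel_summary` are hypothetical helper names, and the column list is simply the summary fields mentioned above.

```python
# Summary columns as listed in the post above.
PX_SUMMARY_COLS = ("img_min", "img_max", "img_mean", "img_std", "img_pct_window")


def metadata_frame(files, px_summ=False):
    """Build the metadata dataframe, skipping the pixel summary by default."""
    import pandas as pd
    import fastai.medical.imaging  # noqa: F401  (assumed to register from_dicoms)

    # px_summ=False is the assumed keyword that disables the pixel statistics.
    return pd.DataFrame.from_dicoms(files, px_summ=px_summ)


def has_pixel_summary(columns):
    """Return True if every pixel-summary column is present."""
    return all(col in columns for col in PX_SUMMARY_COLS)
```

Calling `has_pixel_summary(df.columns)` on the result is a quick way to confirm whether the summary was actually skipped.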
I’ve done my pre-processing on the DICOM images and then saved the results as 16-bit .tiff files.
But the problem I’m having now is that loading the images through a DataBlock seems to automatically convert them to 8-bit.
Anyone have any ideas on how to load the data as 16-bit?
I could work off the DICOM files directly, but the dataset is just too large to use on my machine, so I’ve already spent quite some time getting this 16-bit TIFF dataset together.
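One possible workaround, as a sketch rather than a confirmed fix: bypass the default image opener and read the TIFF yourself with PIL and NumPy, which keeps the uint16 values, then wrap the result in a `TensorImage` through a custom `TransformBlock`. The names `load_tiff16`, `open_tiff16` and `Tiff16Block` are made up for illustration, and I'm assuming single-channel TIFFs that PIL opens in a 16-bit mode.

```python
UINT16_MAX = 65535  # largest value a 16-bit unsigned pixel can hold


def load_tiff16(path):
    """Read a 16-bit TIFF as a float32 array in [0, 1], never passing through 8 bits."""
    import numpy as np
    from PIL import Image

    with Image.open(path) as im:
        arr = np.array(im)  # 16-bit grayscale TIFFs come back as dtype uint16
    return arr.astype(np.float32) / UINT16_MAX


def open_tiff16(path):
    """Turn the array into a 1-channel fastai TensorImage."""
    import torch
    from fastai.vision.all import TensorImage

    return TensorImage(torch.from_numpy(load_tiff16(path))[None])  # add channel dim


def Tiff16Block():
    """A block to use in place of ImageBlock inside a DataBlock."""
    from fastai.vision.all import TransformBlock

    return TransformBlock(type_tfms=open_tiff16)
```

You would then use something like `DataBlock(blocks=(Tiff16Block(), CategoryBlock), ...)`. One caveat I'm fairly sure about but haven't verified: item transforms that expect a PIL image (such as `Resize`) may not apply to a `TensorImage`, so resizing might need to happen at the batch level or inside `open_tiff16`.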