Medical Imaging | DataBlock for DICOM metadata

imnishantg · October 11, 2020, 7:12pm

Hi All

I’m working on the Kaggle pulmonary embolism competition. It has 900+ GB of training data (DICOM images) and 230 GB of test data.
I’m trying to create a multi-input NN architecture that ingests both images and metadata.

For images only, I used the following code:

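from fastai.vision.all import *
from fastai.medical.imaging import *  # provides PILDicom

# `source` is assumed to be the dataset root directory, defined elsewhere
get_x = lambda x:f'{source}/train/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'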

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # Informational only. Maybe use them for study-level inference

tfms = [IntToFloatTensor(div=1000.0, div_mask=1),
        *aug_transforms(size=img_inp[eff_type])]

get_y = ColReader(vocab)
block = DataBlock(blocks=(ImageBlock(cls=PILDicom), MultiCategoryBlock(vocab=vocab, encoded=True)),
                  get_x=get_x,
                  get_y=get_y,
                  batch_tfms=tfms)

For image + metadata, however, I’m at a loss as to how to create the DataBlock. I DON’T want to extract all the metadata into a feather file and use it as a table. I want to extract the metadata on the fly (through the dataloader).

Any help would be appreciated. Please let me know what details you need.
Thanks in advance!

astein · October 12, 2020, 5:55pm

Check out this new Medium post on using fast.ai with DICOM: https://medium.com/@vazirabad.maryam/fast-ai-detecting-axial-images-in-ct-exams-using-deep-learning-6d35f3240de5
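
For the on-the-fly extraction asked about in the first post, one possible approach is a second getter that reads only the DICOM header for each row and flattens a few attributes into a fixed-length vector. The sketch below is a minimal illustration, not a tested solution for this competition. The field list in `META_FIELDS` and the helper names `read_header`, `meta_vector`, and `make_get_meta` are hypothetical. `pydicom.dcmread(path, stop_before_pixels=True)` is the real pydicom call for reading a header without the pixel data; the demo at the bottom uses a stand-in header so it runs without pydicom or any DICOM file.

```python
from types import SimpleNamespace

# Hypothetical choice of header attributes and how many numbers each holds.
# Which DICOM fields are useful here is an assumption, not from the thread.
META_FIELDS = {"KVP": 1, "SliceThickness": 1, "PixelSpacing": 2}


def read_header(path):
    """Read only the DICOM header, skipping the pixel data."""
    import pydicom  # imported lazily so the rest of the sketch runs without it
    return pydicom.dcmread(path, stop_before_pixels=True)


def meta_vector(header, fields=META_FIELDS, default=0.0):
    """Flatten the selected header attributes into a fixed-length list of floats."""
    values = []
    for name, width in fields.items():
        raw = getattr(header, name, None)
        if raw is None:
            items = []            # attribute missing or empty
        elif width == 1:
            items = [raw]         # single-valued attribute
        else:
            items = list(raw)     # multi-valued attribute
        # Pad or truncate so every image yields a vector of the same length.
        items = (items + [default] * width)[:width]
        values.extend(float(v) for v in items)
    return values


def make_get_meta(get_path, reader=read_header):
    """Build a getter that maps a dataframe row to its metadata vector."""
    def get_meta(row):
        return meta_vector(reader(get_path(row)))
    return get_meta


# Demo with a stand-in header object instead of a real DICOM file.
fake_header = SimpleNamespace(KVP="120", SliceThickness=1.25, PixelSpacing=[0.7, 0.7])
get_meta = make_get_meta(lambda row: row.path, reader=lambda path: fake_header)
print(get_meta(SimpleNamespace(path="example.dcm")))  # [120.0, 1.25, 0.7, 0.7]
```

In fastai, a getter like this could plausibly be wired in as a second input by giving `DataBlock` three blocks (the image block, a plain `TransformBlock` for the metadata, and the label block) together with `getters=[get_x, get_meta, get_y]` and `n_inp=2`. That wiring is untested here, and the getter would probably need to return a tensor rather than a list so the batches collate cleanly. The model would also have to accept two inputs.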