Medical Imaging | DataBlock for DICOM metadata

Hi all,

I’m working on the Kaggle pulmonary embolism competition. It has 900 GB+ of training data (DICOM images) and 230 GB of test data.
I’m trying to build a multi-input NN architecture that ingests both the image and its metadata.

For images only, I used the following code:

get_x = lambda x:f'{source}/train/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'

vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate', 
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe', # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe', # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe'] # These are informational only. Maybe use them for study-level inference

tfms = [IntToFloatTensor(div=1000.0, div_mask=1), 

get_y = ColReader(vocab) 
block = DataBlock(blocks=(ImageBlock(cls=PILDicom), MultiCategoryBlock(vocab=vocab, encoded=True)), 

For image + metadata, however, I’m at a loss as to how to create the DataBlock. I DON’T want to extract all the metadata into a feather file and use it as a table. I want to extract the metadata on the fly (through the dataloader).
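
To make the question concrete, here is a rough sketch of the per-item step I have in mind: turning a DICOM header into a fixed-length list of floats at load time. The field names and widths in META_FIELDS and the helper name metadata_vector are hypothetical choices for illustration only. The sketch relies purely on attribute access, so it should work on any header-like object; in practice the header would presumably come from something like pydicom.dcmread(path, stop_before_pixels=True). How to plug this into the DataBlock as a second input is exactly the part I have not figured out.

```python
from types import SimpleNamespace

# Hypothetical selection of header fields and how many numbers each contributes.
META_FIELDS = {"KVP": 1, "XRayTubeCurrent": 1, "SliceThickness": 1, "PixelSpacing": 2}

def metadata_vector(header, fields=META_FIELDS, default=0.0):
    """Flatten selected header attributes into a fixed-length list of floats.

    Missing, empty, or non-numeric values are replaced by `default`, so every
    item yields a vector of the same length.
    """
    vec = []
    for name, width in fields.items():
        value = getattr(header, name, None)
        if value is None:
            values = []
        elif isinstance(value, (str, bytes)) or not hasattr(value, "__iter__"):
            values = [value]          # scalar field
        else:
            values = list(value)      # multi-valued field, e.g. PixelSpacing
        floats = []
        for v in values[:width]:
            try:
                floats.append(float(v))
            except (TypeError, ValueError):
                floats.append(default)
        floats += [default] * (width - len(floats))  # pad to the declared width
        vec.extend(floats)
    return vec

# Stand-in for a DICOM header; SliceThickness is deliberately missing.
header = SimpleNamespace(KVP="120", XRayTubeCurrent=300, PixelSpacing=[0.7, 0.7])
print(metadata_vector(header))
```

The fixed width per field is there so that items with missing tags still collate into a batch of uniform shape.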

Any help would be appreciated. Please let me know what details you need.
Thanks in advance!

Check out this new Medium post on working with DICOM.