Hi All
I’m participating in the Kaggle pulmonary embolism competition and am having trouble writing the test pipeline. Kaggle requires a separate inference notebook for submission, one that has no access to the training data or to an internet connection, in case that is relevant.
The idea is to train a model, save it, load it in a different file, and run inference there.
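In plain PyTorch terms, the round trip I have in mind looks roughly like the sketch below. It is not my actual pipeline: `make_model` is a hypothetical tiny stand-in for the real network (only the 14 outputs mirror my labels), and the temporary file path is just for illustration.

```python
import os
import tempfile

import torch
from torch import nn


def make_model():
    # Hypothetical stand-in for the real network; 14 outputs mirror the 14 labels.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 14))


# "Training" notebook: build the model and save only its weights.
trained = make_model()
weights_path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(trained.state_dict(), weights_path)

# "Inference" notebook: rebuild the same architecture, then load the weights.
restored = make_model()
restored.load_state_dict(torch.load(weights_path))
restored.eval()

# Both models should now give identical outputs for the same input.
x = torch.randn(4, 8)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
    out = restored(x)
print(same, tuple(out.shape))
```

The point of the sketch is that saved weights only make sense once the inference side has rebuilt the same architecture and explicitly loaded them.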
Here is my training pipeline.
df is a dataframe containing the filename columns and the encoded target variables.
get_x = lambda x:f'{source}/train/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'
vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate',
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1',  # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe',  # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe',  # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe']  # These are only informational. Maybe use them for study-level inferences
get_y = ColReader(vocab)
block = DataBlock(blocks=(ImageBlock(cls=PILDicom), MultiCategoryBlock(vocab=vocab, encoded=True)),
                  get_x=get_x,
                  get_y=get_y,
                  batch_tfms=aug_transforms(size=224))
dls = block.dataloaders(df, bs=8, num_workers=0)
head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)
learn = cnn_learner(dls, resnet34, config=config)
learn.fit_one_cycle(3, lr_max=0.05)
learn.save(file='resnet34_10epochs')
Now the inference pipeline. In this case, df is a dataframe with filenames only and no target-variable columns. Also, loss_func in the training learner was self-configured as BCEWithLogitsLoss(), so I used the same here as well.
get_x = lambda x:f'{source}/test/{x.StudyInstanceUID}/{x.SeriesInstanceUID}/{x.SOPInstanceUID}.dcm'
vocab = ['pe_present_on_image', 'negative_exam_for_pe', 'indeterminate',
         'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1',  # Only one label should be true at a time
         'chronic_pe', 'acute_and_chronic_pe',  # Only one label can be true at a time
         'leftsided_pe', 'central_pe', 'rightsided_pe',  # More than one label can be true at a time
         'qa_motion', 'qa_contrast', 'flow_artifact', 'true_filling_defect_not_pe']  # These are only informational. Maybe use them for study-level inferences
block = DataBlock(blocks=(ImageBlock(cls=PILDicom)),  # , MultiCategoryBlock(vocab=vocab, encoded=True)
                  get_x=get_x,
                  batch_tfms=aug_transforms(size=224))
dls = block.dataloaders(df[:1000], bs=64, num_workers=0)
head = create_head(nf=1024, n_out=14, lin_ftrs=[256, 64], concat_pool=True)
config = cnn_config(custom_head=head)
learn = cnn_learner(dls, resnet34, config=config, n_out=14, pretrained=False, loss_func=nn.BCEWithLogitsLoss())
test_data = dls.test_dl(df)
preds = learn.get_preds(dl=test_data)
On the get_preds call, I get the following error:
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[64, 1, 224, 224] to have 3 channels, but got 1 channels instead
I can interpret the error: the first convolution expects 3 input channels, but the batch has only 1. However, the DataBlock and dataloaders here are set up exactly as in the training process. Why did this issue not come up during training but only during inference?
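To make the shape mismatch in the traceback concrete, here is a minimal sketch in plain PyTorch rather than fastai. The layer is a stand-in with the same weight shape as in the error message ([64, 3, 7, 7]); the batch size of 2 and the channel repeat via `Tensor.repeat` are illustrative assumptions, not something taken from my pipeline, and the sketch does not explain why training behaved differently.

```python
import torch
from torch import nn

# Stand-in for the network's first convolution: weight shape [64, 3, 7, 7].
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# A single-channel batch, like the [N, 1, 224, 224] input in the traceback.
x = torch.randn(2, 1, 224, 224)

try:
    conv(x)
    failed = False
except RuntimeError as err:
    # Same kind of message as above: expected 3 channels, got 1.
    failed = True
    print(err)

# Illustrative workaround (an assumption, not a confirmed fix for my case):
# copy the single channel three times so the input matches the layer.
x3 = x.repeat(1, 3, 1, 1)
out = conv(x3)
print(tuple(x3.shape), tuple(out.shape))
```

So whatever differs between my two pipelines, it must result in the model receiving a 1-channel tensor in one case and a 3-channel tensor (or a 1-channel first layer) in the other.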
Need some help urgently!
Thanks in anticipation.