I am currently fine-tuning a vision model based on the
'swin_base_patch4_window7_224' architecture (a Swin Transformer, not a CNN), but I want it to take 2 images as input.
I have been able to fine-tune the model using only one image, which the default architecture supports.
I created the DataLoaders for 2 input images as follows:
from fastai.vision.all import *
import torch

# Functions to extract the image paths from each DataFrame row
def get_x1(r): return r['jpg_paths']
def get_x2(r): return r['rgn_paths']

# Define the DataBlock for 2 image inputs
# (get_y and Train_df are defined earlier and not shown here)
dblock = DataBlock(
    blocks=(ImageBlock, ImageBlock, CategoryBlock),
    get_x=[get_x1, get_x2],
    get_y=get_y,
    splitter=RandomSplitter(valid_pct=0.15, seed=42),
    n_inp=2,  # the first two blocks are inputs, the last one is the target
    item_tfms=[RandomResizedCrop(224, min_scale=0.35)]
)

# Define the DataLoaders
dls = dblock.dataloaders(Train_df, bs=4, device=torch.device('cuda'))
How can I then define the vision_learner with the Swin architecture so that it accepts both images?
Any help would be much appreciated, and thank you!