Customizing pretrained models

Hello everyone,
I am currently fine-tuning a model that uses the 'swin_base_patch4_window7_224' architecture (a Swin Transformer rather than a CNN), but with 2 images as input to the model.
I have been able to fine-tune the model using only one image, which the default architecture supports.
I created the DataLoaders for 2 input images as follows:

from fastai.vision.all import *
import torch

# Functions to extract image paths from each DataFrame row
def get_x1(r):
    return r['jpg_paths']

def get_x2(r):
    return r['rgn_paths']

# Define the DataBlock for 2 image inputs
dblock = DataBlock(
    blocks=(ImageBlock, ImageBlock, CategoryBlock),
    get_x=[get_x1, get_x2],
    get_y=get_y,  # label getter, defined elsewhere (not shown here)
    splitter=RandomSplitter(valid_pct=0.15, seed=42),
    n_inp=2,  # the first two blocks are the model inputs
    item_tfms=[RandomResizedCrop(224, min_scale=0.35)]
)

# Define the dataloaders
dls = dblock.dataloaders(Train_df, bs=4, device=torch.device('cuda'))

How can I then define a vision_learner with the Swin architecture so that it accepts these 2 image inputs?
Any help would be much appreciated, and thank you!
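
For reference, here is a rough sketch of one possible direction, not a confirmed solution. As far as I understand, vision_learner builds a model whose forward takes a single image, so the sketch wraps the backbone in a custom module and passes it to a plain Learner instead. The names TwoImageClassifier and build_learner, the hidden size of 512, and the dropout of 0.3 are hypothetical choices of mine. It assumes timm is installed and that both images should share one encoder, Siamese-style.

```python
import torch
from torch import nn


class TwoImageClassifier(nn.Module):
    """Hypothetical wrapper: one shared encoder applied to both images."""

    def __init__(self, encoder, n_features, n_classes):
        super().__init__()
        # encoder is assumed to map (B, 3, H, W) -> (B, n_features)
        self.encoder = encoder
        # Small head on the concatenated features (sizes are arbitrary choices)
        self.head = nn.Sequential(
            nn.Linear(2 * n_features, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(512, n_classes),
        )

    def forward(self, x1, x2):
        # With n_inp=2 the Learner calls the model with both image batches
        f1 = self.encoder(x1)
        f2 = self.encoder(x2)
        return self.head(torch.cat([f1, f2], dim=1))


def build_learner(dls, n_classes, arch="swin_base_patch4_window7_224"):
    """Sketch only: assumes timm and fastai are installed."""
    import timm
    from fastai.vision.all import Learner, CrossEntropyLossFlat, accuracy

    # num_classes=0 asks timm to drop its classifier and return pooled features
    encoder = timm.create_model(arch, pretrained=True, num_classes=0)
    model = TwoImageClassifier(encoder, encoder.num_features, n_classes)
    return Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
```

With the dls defined above, this would be used roughly as learn = build_learner(dls, n_classes=dls.c) followed by learn.fit_one_cycle(...). Since a plain Learner does not know which layers belong to the pretrained body, freezing the encoder would need a custom splitter, which is left out here.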