Different results when loading a model on different GPUs

Hi all.

I trained my model on a GTX 1080 Ti with the following code.

from fastai.vision.all import *
from fastai.callback.wandb import *

path = '/home/ubuntu/depth1'
fname = get_image_files(path)
data = ImageDataLoaders.from_name_re(path,
                                     fname,
                                     valid_pct=0.2,
                                     pat=r'.*T(.*)F.*.*',
                                     bs=16,
                                     seed=42,
                                     item_tfms=Resize(360, method=ResizeMethod.Squish))
learn = cnn_learner(data, resnet101,
                    metrics=[error_rate, accuracy, F1Score(average='micro'),
                             Precision(average='micro'), Recall(average='micro')],
                    cbs=[WandbCallback(log_dataset=True, log_model=True), SaveModelCallback()])

learn.fine_tune(5, 0.001, freeze_epochs=2, cbs=[WandbCallback(), SaveModelCallback()])
learn.save('model1')

Then I tried to load the model on a Tesla K80.

learn = cnn_learner(data, resnet101,
                    metrics=[error_rate, accuracy, F1Score(average='micro'),
                             Precision(average='micro'), Recall(average='micro')],
                    cbs=[WandbCallback(log_dataset=True, log_model=True), SaveModelCallback()])

learn.load('model1')
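
To make sure the checkpoint really made it onto the new machine, a minimal sanity check (assuming the default fastai layout, where learn.save('model1') writes to /home/ubuntu/depth1/models/model1.pth) is to compare one parameter from the saved file with the same parameter in the loaded learner:

import torch

# Compare one parameter from the checkpoint file against the freshly loaded
# learner; if they differ, the weights did not load correctly on this GPU.
ckpt = torch.load('/home/ubuntu/depth1/models/model1.pth', map_location='cpu')
state = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt
name, saved = next(iter(state.items()))
print(name, torch.allclose(saved, learn.model.state_dict()[name].cpu()))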

Surprisingly, accuracy dropped from 99% during the initial training to just 50%. The reason is that almost half of the data is being wrongly classified into one single class. This only happens when I load the model on other GPUs, not on the same GTX 1080 Ti. I am wondering if it has something to do with the floating-point operations? Thank you.
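
To confirm that the predictions really collapse into one class, a quick check (using the learner and dataloaders built as above) is to re-run validation and look at the confusion matrix:

# Re-run validation and inspect the confusion matrix; if almost everything lands
# in a single column, the model is indeed predicting one class for half the data.
print(learn.validate())   # [valid_loss, error_rate, accuracy, F1, precision, recall]
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()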

I found out that a CUDA version mismatch causes this error. I have CUDA 11.2, but PyTorch currently only supports up to CUDA 11.0. This might be the problem.
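
In case it helps anyone debugging the same thing, the two environments can be compared quickly by printing the CUDA/cuDNN versions that the installed PyTorch build was compiled against (plain torch attributes, nothing specific to this setup):

import torch

print(torch.__version__)              # PyTorch build
print(torch.version.cuda)             # CUDA version the build was compiled against
print(torch.backends.cudnn.version()) # cuDNN version
print(torch.cuda.get_device_name(0))  # e.g. GTX 1080 Ti vs Tesla K80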