The easiest solution is to repeat your grayscale data three times along the channel axis, so the input has the three channels the pretrained model expects. You could weight the copies differently per channel or, if you're adventurous, put a trainable convolutional layer with 3 filters between the grayscale input and the pretrained model (make sure the pretrained weights are frozen before training).
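As a rough sketch of the channel-repeat idea, assuming NumPy and a channels-last layout (the array shapes and the weight values here are made up for illustration, not taken from any particular model):

```python
import numpy as np

# Hypothetical batch of grayscale images, channels-last: (batch, height, width, 1)
gray = np.random.rand(4, 32, 32, 1).astype(np.float32)

# Repeat the single channel three times along the channel axis.
rgb_like = np.repeat(gray, 3, axis=-1)  # shape (4, 32, 32, 3)

# Optional per-channel weighting; these values are placeholders, not recommendations.
channel_weights = np.array([1.0, 1.0, 1.0], dtype=np.float32)
weighted = gray * channel_weights  # broadcasts to shape (4, 32, 32, 3)
```

For a channels-first framework you would repeat along axis 1 instead of the last axis.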
Alternatively, you can sum the weights of the first convolutional layer's kernel along the input channel axis, create a new model that expects single-channel input, and load the summed weights into its first layer. Because convolution is linear, this gives the same output as feeding three identical copies of the grayscale channel, while cutting the first layer's multiply-adds to roughly a third.
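To check that summing the kernel really matches the repeated-channel approach, here is a small NumPy sketch. It assumes a kernel laid out as (height, width, in_channels, out_channels), and `conv2d_valid` is a naive helper written just for this example; the random kernel is only a stand-in for real pretrained weights.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid' cross-correlation.
    x: (H, W, C_in), kernel: (kh, kw, C_in, C_out) -> (H-kh+1, W-kw+1, C_out)."""
    kh, kw, _, c_out = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Contract over kernel height, width and input channels.
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
kernel_rgb = rng.normal(size=(3, 3, 3, 8))  # stand-in for pretrained first-layer weights
gray = rng.random((16, 16, 1))              # one grayscale image

# Sum over the input channel axis, keeping it as a size-1 axis.
kernel_gray = kernel_rgb.sum(axis=2, keepdims=True)  # (3, 3, 1, 8)

out_repeated = conv2d_valid(np.repeat(gray, 3, axis=-1), kernel_rgb)
out_summed = conv2d_valid(gray, kernel_gray)
same = np.allclose(out_repeated, out_summed)
```

The bias, if the layer has one, carries over unchanged. Check which axis holds the input channels in your framework: Keras `Conv2D` kernels use the layout above, while PyTorch `Conv2d` weights are (out_channels, in_channels, height, width), so there you would sum over axis 1.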