Strategies for Reducing GPU Memory Required

I’m trying to implement some code that uses vgg16bn with some customization of my own. I’m having trouble getting it down to a size that will run on the 2 GB GPU in my laptop, and when I just tried it on my desktop, even the 11 GB GPU (1080 Ti) is struggling with it. So I’m wondering whether there are any strategies I’m missing that would let me use less memory while fitting the model. My current attempt:

# assuming the fast.ai course helpers (utils.py / vgg16bn.py) are on the path
from utils import get_batches, get_data
from vgg16bn import Vgg16BN
from sklearn.preprocessing import OneHotEncoder
from keras.layers import Flatten, Dense, BatchNormalization

trn_data = get_batches("label_train_img", target_size=(256,256))
trn_data_new = get_data("label_train_img", target_size=(256,256)) # loading the whole set into one array instead of iterating over batches; maybe I should try to figure out the batches again?

ohe = OneHotEncoder(sparse=False)
trn_output = ohe.fit_transform(trn_data.classes.reshape(-1,1)) # getting my output in one-hot encoded form

vgg = Vgg16BN(size=(256,256),include_top=False)
#vgg.model.pop()
for layer in vgg.model.layers: layer.trainable=False # freeze the pretrained conv layers so only the new head trains
vgg.model.add(Flatten())
vgg.model.add(Dense(4096, activation='relu'))
vgg.model.add(BatchNormalization())
vgg.model.add(Dense(4096, activation='relu'))
vgg.model.add(BatchNormalization())
vgg.model.add(Dense(25, activation='softmax')) # 25 possible classes
vgg.model.compile(optimizer = "rmsprop", loss = "categorical_crossentropy") #I'll probably end up using adam here
#vgg.model.global_variables_initializer()

vgg.model.fit(trn_data_new, trn_output, epochs=1, batch_size=16)
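
One idea I’m considering, based on the batches hint above: get_batches wraps Keras’ flow_from_directory, so it already yields (images, one-hot labels) one batch at a time, and I could feed it straight to fit_generator instead of loading the whole array with get_data. That way only a single batch has to sit in memory at once. This is just a rough, untested sketch; I’m assuming Keras 2 argument names (steps_per_epoch/epochs), on Keras 1 they would be samples_per_epoch/nb_epoch:

import math

batch_size = 8 # smaller batches also cut activation memory on the GPU
trn_batches = get_batches("label_train_img", target_size=(256,256), batch_size=batch_size)

# one step = one batch, so cover the whole training set once per epoch
steps = int(math.ceil(trn_batches.n / float(batch_size))) # .n is the image count on the directory iterator

vgg.model.fit_generator(trn_batches, steps_per_epoch=steps, epochs=1)

The other thing I suspect is that most of the memory goes into the two Dense(4096) layers: with a 256x256 input the conv stack ends at 8x8x512, so the first Dense layer alone is roughly 32768*4096 ≈ 134M weights (about 0.5 GB in float32, before gradients and the RMSprop accumulators). Shrinking those layers to something like 512 or 1024 units might be the quickest win.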