Is my model underfitting, or is there another problem?

I am training an image-captioning model on the COCO 2014 dataset (around 82,700 training images and 40,500 test images). After training for only one epoch, model.predict() returned the same sentence (with different probability values) for every image, so I tried increasing the number of epochs.
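
For context, model.predict() is called from a word-by-word decoding loop along these lines (a simplified greedy-decoding sketch; `tokenizer`, `photo_features`, and the `startseq`/`endseq` markers are stand-ins, not my exact inference code):

    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    def generate_caption(model, tokenizer, photo_features, max_length):
        # photo_features: array of shape (1, feature_dim), e.g. (1, 1120)
        caption = 'startseq'
        for _ in range(max_length):
            seq = tokenizer.texts_to_sequences([caption])[0]
            seq = pad_sequences([seq], maxlen=max_length)
            probs = model.predict([photo_features, seq], verbose=0)
            word = tokenizer.index_word.get(int(np.argmax(probs)))  # greedy: most likely next word
            if word is None or word == 'endseq':
                break
            caption += ' ' + word
        return caption

The model is defined as follows: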

    from tensorflow.keras.layers import Input, Dropout, Dense, Embedding, LSTM, Concatenate
    from tensorflow.keras.models import Model

    def define_model(vocab_size, max_length, curr_shape):
        # image-feature branch
        inputs1 = Input(shape=curr_shape)
        fe1 = Dropout(0.5)(inputs1)
        fe2 = Dense(256, activation='relu')(fe1)
        # caption (word-sequence) branch
        inputs2 = Input(shape=(max_length,))
        se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
        se2 = Dropout(0.5)(se1)
        se3 = LSTM(256)(se2)
        # merge both branches and predict the next word
        decoder1 = Concatenate()([fe2, se3])
        decoder2 = Dense(256, activation='relu')(decoder1)
        outputs = Dense(vocab_size, activation='softmax')(decoder2)
        model = Model(inputs=[inputs1, inputs2], outputs=outputs)
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.summary()
        return model
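
For reference, this is instantiated with the values that appear in the summary below (vocabulary of 24,358 words, captions padded to 49 tokens, 1,120-dimensional image features):

    model = define_model(vocab_size=24358, max_length=49, curr_shape=(1120,))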

The model summary is as follows:

    Layer (type)                    Output Shape         Param #     Connected to                     
    ==================================================================================================
    input_2 (InputLayer)            [(None, 49)]         0                                            
    __________________________________________________________________________________________________
    input_1 (InputLayer)            [(None, 1120)]       0                                            
    __________________________________________________________________________________________________
    embedding (Embedding)           (None, 49, 256)      6235648     input_2[0][0]                    
    __________________________________________________________________________________________________
    dropout (Dropout)               (None, 1120)         0           input_1[0][0]                    
    __________________________________________________________________________________________________
    dropout_1 (Dropout)             (None, 49, 256)      0           embedding[0][0]                  
    __________________________________________________________________________________________________
    dense (Dense)                   (None, 256)          286976      dropout[0][0]                    
    __________________________________________________________________________________________________
    lstm (LSTM)                     (None, 256)          525312      dropout_1[0][0]                  
    __________________________________________________________________________________________________
    concatenate (Concatenate)       (None, 512)          0           dense[0][0]                      
                                                                     lstm[0][0]                       
    __________________________________________________________________________________________________
    dense_1 (Dense)                 (None, 256)          131328      concatenate[0][0]                
    __________________________________________________________________________________________________
    dense_2 (Dense)                 (None, 24358)        6260006     dense_1[0][0]                    
    ==================================================================================================
    Total params: 13,439,270
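
`train_generator` and `val_generator` yield batches of the form `([image_features, partial_caption], next_word_one_hot)`, as required by the two inputs and the categorical cross-entropy loss. A minimal sketch of such a generator, with assumed `captions` and `features` dicts rather than my actual code:

    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.utils import to_categorical

    def data_generator(captions, features, tokenizer, max_length, vocab_size, batch_size):
        # captions: {image_id: [caption strings]}; features: {image_id: feature vector}
        X1, X2, y = [], [], []
        while True:
            for image_id, caption_list in captions.items():
                for caption in caption_list:
                    seq = tokenizer.texts_to_sequences([caption])[0]
                    # every prefix of a caption is an input; the next word is the target
                    for i in range(1, len(seq)):
                        X1.append(features[image_id])
                        X2.append(pad_sequences([seq[:i]], maxlen=max_length)[0])
                        y.append(to_categorical(seq[i], num_classes=vocab_size))
                        if len(X1) == batch_size:
                            yield [np.array(X1), np.array(X2)], np.array(y)
                            X1, X2, y = [], [], []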

I train with:

    history = model.fit(train_generator, epochs=100, steps_per_epoch=train_steps,
                        verbose=1, callbacks=[checkpoint],
                        validation_data=val_generator, validation_steps=val_steps,
                        batch_size=64)  # note: with a generator, batching is done by the generator, so batch_size has no effect

and got

    Epoch 00001 loss: 4.6360 - accuracy: 0.2506 - val_loss: 4.1580 - val_accuracy: 0.2970
    Epoch 00002 loss: 4.0904 - accuracy: 0.3026 - val_loss: 3.9843 - val_accuracy: 0.3134
    Epoch 00003 loss: 3.9805 - accuracy: 0.3123 - val_loss: 3.9290 - val_accuracy: 0.3192
    Epoch 00004 loss: 3.9422 - accuracy: 0.3169 - val_loss: 3.9061 - val_accuracy: 0.3223
    Epoch 00005 loss: 3.9311 - accuracy: 0.3188 - val_loss: 3.8962 - val_accuracy: 0.3242
    Epoch 00006 loss: 3.9335 - accuracy: 0.3196 - val_loss: 3.9165 - val_accuracy: 0.3229
    Epoch 00006: val_loss did not improve from 3.89620
    Epoch 00007 loss: 3.9437 - accuracy: 0.3196 - val_loss: 3.9297 - val_accuracy: 0.3241
    Epoch 00007: val_loss did not improve from 3.89620

Is there anything wrong with the model, or what exactly can I change to improve the results?
Any help is appreciated.