I am looking for suggestions on a problem I am working on:

Problem: predict the rank of a product from its features. The ranks are given as percentiles.

Data: 4500 rows with around 900 features per observation.

Model: I am using an MLP in Keras with three dense layers.

First layer: Dense(16); second: Dense(4); third: Dense(1). I am also using BatchNormalization and Dropout(0.5) after each dense layer. I am using MSE as the loss function and SGD as the optimizer. I have tried 100 iterations so far and get an RMSE on the test data of ~24.50.

Issue: the model is overfitting, as seen both in the learning curve (train vs. test loss) and in the RMSE: train is ~5–7 but test is ~24. I tried increasing the dropout probability to 0.8, but that made things worse, and getting more data is absolutely not an option. I am not sure how to reduce model complexity further, since it is only 3 layers; I could try two layers instead, though I suspect that would underfit (no harm in trying?).
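Separately from shrinking the MLP: with 4500 rows and ~900 features, a strongly regularized linear model is a useful baseline for how much signal is actually there. A minimal sketch using scikit-learn's Ridge (L2-penalized linear regression) on synthetic stand-in data — the shapes and the `alpha` value are illustrative, not tuned for this dataset:

```
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))           # stand-in for the real 4500x899 matrix
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

# Ridge = linear regression with an L2 penalty; alpha controls shrinkage,
# which is the main defense against overfitting when features >> signal.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
scores = cross_val_score(pipe, X, y, cv=5, scoring='neg_mean_squared_error')
rmse = np.sqrt(-scores.mean())
print(f"CV RMSE: {rmse:.3f}")
```

If the ridge baseline already gets close to the MLP's test RMSE, the extra network capacity is mostly fitting noise.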

Are there any other suggestions? Please let me know.

Note: the data is scaled before training with StandardScaler.
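One subtle point about that scaling: if the scaler is fit on the full dataset before cross-validation, statistics from the test folds leak into training and the CV score looks better than it should. Wrapping the scaler in a Pipeline fits it on each training fold only. A sketch on synthetic data, with a plain `LinearRegression` as a stand-in for the Keras estimator:

```
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(120, 10))
y = X @ rng.normal(size=10)

# The scaler is refit inside every training fold, so no test-fold
# statistics leak into training.
pipe = Pipeline([('scale', StandardScaler()),
                 ('reg', LinearRegression())])
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=kfold, scoring='neg_mean_squared_error')
rmse = np.sqrt(-scores.mean())
print(f"leak-free CV RMSE: {rmse:.6f}")
```

The same pattern works with `KerasRegressor` as the final pipeline step.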

Code:

```
import numpy as np
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score

def baseline_model_899():
    # create model
    sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model = Sequential()
    model.add(Dense(16, activation='relu', kernel_initializer='normal',
                    input_shape=(899,)))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='normal', activation='linear'))
    # 'accuracy' is not meaningful for regression, so only the loss is tracked
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model

np.random.seed(42)
estimator = KerasRegressor(build_fn=baseline_model_899, epochs=200,
                           batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
results = cross_val_score(estimator, X_train, y_train, cv=kfold,
                          scoring='neg_mean_squared_error')
# RMSE is the square root of the mean (not the std) of the per-fold MSEs
print("RMSE:", np.sqrt(-results.mean()))
```
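Since the train and test curves diverge, early stopping is also worth trying; in Keras that would be an `EarlyStopping(monitor='val_loss', restore_best_weights=True)` callback passed to `fit` along with some validation data. Below is a sketch of the same idea using scikit-learn's `MLPRegressor` on synthetic data, chosen only so the snippet is self-contained and runnable; the layer size and `alpha` are illustrative:

```
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 30))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.2, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# early_stopping=True holds out 10% of the training data and stops
# once the validation score fails to improve for n_iter_no_change epochs,
# which caps how far the model can overfit the training set.
mlp = MLPRegressor(hidden_layer_sizes=(16,), alpha=1e-3,
                   early_stopping=True, n_iter_no_change=10,
                   max_iter=2000, random_state=42)
mlp.fit(X_tr, y_tr)
rmse = np.sqrt(np.mean((mlp.predict(X_te) - y_te) ** 2))
print(f"test RMSE: {rmse:.3f}")
```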