Model optimization in medical diagnostics

I am a complete newbie here, so I hope I am not asking this in the wrong place (I am not an avid forum user). Now that the formalities are out of the way - I need some guidelines. I am also new to deep learning, so do bear with me if I use any incorrect terminology.

I am creating a neural network that has to pick one of about 400 classes (exactly one class is the right one for each example).
There are also about 400 attributes. The first model I had was rather simple, compiled from different tutorials and guidelines I read while researching the topic:

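from keras.models import Sequential
from keras.layers import Dense

# Two ReLU hidden layers, then a softmax over the 400 classes
model = Sequential()
model.add(Dense(750, input_dim=337, activation='relu'))
model.add(Dense(530, activation='relu'))
model.add(Dense(400, activation='softmax'))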

The current best, performance-wise, is:

model = Sequential()
model.add(Dense(430, input_dim=337,  activation='softsign'))
model.add(Dense(415, activation='softplus'))
model.add(Dense(400, activation='softmax'))

Why I came here to call upon you, dear reader, is to get some general help with how to improve the model - or, even better, to understand what each layer adds to the whole (I saw someone ask a similar question on this very forum, and the answer someone gave was so well written I just had to come and post my own query).

The current precision with 250 epochs is about 35%. I intend to train it with up to 1200 epochs, I just didn’t have time to do so yet.
I suppose the following is also relevant:

        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        model.fit(...[train], Y_train, epochs=250, batch_size=32, verbose=0)

Thank you for your time.
Yours truly, Nac.

@EneJsi, while there can be many reasons for poor model prediction performance, I noticed this:
You are using Dense layers only. Thus, your models resemble multilayer perceptrons, among the first neural networks. Such an architecture may work, but if it doesn’t, consider that the dimensionality is already fairly high. How many examples are you looking at? You might want to have at least 10x as many observations as classes, better 100x.
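As a quick sanity check on that rule of thumb, here is a minimal sketch that reports the ratio of observations to classes and the size of the rarest class. The function name `examples_per_class` and the toy labels are made up for illustration, and it assumes the labels are integer class ids from 0 to n_classes - 1.

```python
from collections import Counter

def examples_per_class(labels, n_classes):
    """Return (observations per class on average, size of the rarest class)."""
    counts = Counter(labels)
    ratio = len(labels) / n_classes
    # Classes that never appear in the labels count as 0 examples.
    rarest = min(counts.get(c, 0) for c in range(n_classes))
    return ratio, rarest

# Toy labels for 3 classes (hypothetical data, not from the real set)
labels = [0, 0, 0, 1, 1, 2]
ratio, rarest = examples_per_class(labels, n_classes=3)
print(ratio, rarest)  # 2.0 1
```

A ratio below 10, or a rarest class with only a handful of examples, would suggest collecting more data before making the model larger.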
There are architectures which do some inherent dimensionality reduction while learning useful features at the same time: convolutional neural networks. By adding convolutional layers you have the option to use a stride greater than 1 or MaxPooling to downsample your data successively. You might want to have more than 4 layers, too. Modern high-performance architectures have dozens or hundreds of layers. Again, it depends on how much data (number of observations) you have. A large (deep or wide) architecture can overfit more easily than a small one. So building up from the bottom, as you are doing, is surely a good idea. Try to get the model to overfit without dropout at first. Once it overfits, you know the model is large enough to capture the variability in the data. Then you can add dropout, augmentation or more data.
For all of the above, I highly recommend lessons 1, 2, 3, 4 and 7 of part 1 of the Deep Learning course here on
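To make the downsampling idea concrete, here is a minimal sketch of a small 1D convolutional model in Keras. The layer sizes, kernel sizes and the helper name `build_conv_model` are illustrative guesses rather than tuned values, and the sketch assumes that neighbouring attributes are related in some way, which convolutions rely on. Dropout is set to 0.0 so that you can try to overfit first.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_conv_model(n_features=337, n_classes=400, dropout_rate=0.0):
    """1D CNN sketch: a strided convolution and pooling shrink the feature axis."""
    model = keras.Sequential([
        keras.Input(shape=(n_features, 1)),  # each example as a 1-channel sequence
        layers.Conv1D(32, kernel_size=5, strides=2, activation='relu'),  # length 337 -> 167
        layers.MaxPooling1D(pool_size=2),                                # length 167 -> 83
        layers.Conv1D(64, kernel_size=3, activation='relu'),             # length 83 -> 81
        layers.GlobalMaxPooling1D(),   # keep the strongest response of each of the 64 filters
        layers.Dropout(dropout_rate),  # 0.0 at first; raise it once the model overfits
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

model = build_conv_model()
model.summary()
```

Once training accuracy pulls well ahead of validation accuracy, something like `build_conv_model(dropout_rate=0.5)` would be the next thing to try.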
Hope it helps.

First of all - thank you for the feedback.

I am going to try convolutional layers, yes. I suppose I have to make some changes to my data, though? An example in the training set currently has the dimensions (337,). I am working with about 200,000 observations. Also, there are many attributes that are not defined (set to 0).
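For what it is worth, the change to the data is probably small. Keras 1D convolutions expect input of shape (batch, steps, channels), so each (337,) example needs a trailing channel axis. A minimal sketch with NumPy, where `X_train` is a made-up stand-in for the real training matrix:

```python
import numpy as np

# Hypothetical stand-in for the real data: 8 examples with 337 attributes each.
X_train = np.zeros((8, 337), dtype="float32")

# Add a trailing channel axis of size 1: (8, 337) -> (8, 337, 1).
X_train_conv = X_train[..., np.newaxis]
print(X_train_conv.shape)  # (8, 337, 1)
```

The one-hot labels used with categorical_crossentropy should not need any change.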

I’ll gladly check out the lessons you pointed out.
Thank you again for all the help and explaining. Have a nice day!

Have fun and let us know how it goes.