(I also just noticed you replaced the default init with normal - I don’t think you should do that, since Glorot initialization is a better idea, I believe.)
I used the model you suggested.
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(512, input_dim=117, activation='relu'),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])
# nb_epoch was renamed epochs in Keras 2
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train, train_label, batch_size=64, epochs=25, validation_data=(test, test_label))
Looks like you’re not overfitting any more. So you’ll have to think about your feature engineering, since it sounds like you have structured data. In general, deep learning isn’t the best tool for structured data - or at least I haven’t seen many people make it work well.
Thanks Jeremy, that echoes my view as well: “deep learning isn’t the best tool for structured data”.
I wanted to convince myself of that. Yes, this is structured data.
I experimented with conventional machine learning methods like random forests on the same dataset, and got an F1 score of 0.8602 on train and 0.8138 on test.
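For reference, a minimal scikit-learn sketch of that random forest baseline (variable names follow the Keras snippet above; it assumes the labels were one-hot encoded for the softmax, so they are converted back to integer classes first):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Convert one-hot targets (as used for the softmax above) to integer classes.
y_train = np.argmax(train_label, axis=1)
y_test = np.argmax(test_label, axis=1)

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
rf.fit(train, y_train)

print('F1 (train):', f1_score(y_train, rf.predict(train), average='weighted'))
print('F1 (test): ', f1_score(y_test, rf.predict(test), average='weighted'))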
Can I conclude that more research has to be done on deep learning for structured data, and that it usually might not be the best tool?
Yes I think that’s a reasonable conclusion - although I also think there’s no reason DL couldn’t turn out to be just as good as random forests if more people work on DL for structured data. It’s something I’d be interested in spending time on sometime, since I’ve been a major RF fan for a long time!
Are you one-hot encoding all the categorical variables? If not - you definitely need to. If some are very high cardinality (i.e. have many levels), use an Embedding layer for them instead.
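A minimal sketch of what the embedding route might look like in Keras (the cardinality of 5000, embedding width of 16, and 100 numeric columns are made-up numbers for illustration):

from keras.models import Model
from keras.layers import Input, Embedding, Flatten, Dense, Dropout, concatenate

# One high-cardinality categorical, integer-coded 0..4999, plus numeric columns.
cat_in = Input(shape=(1,), dtype='int32')
cat_vec = Flatten()(Embedding(input_dim=5000, output_dim=16)(cat_in))

num_in = Input(shape=(100,))

x = concatenate([cat_vec, num_in])
x = Dense(512, activation='relu')(x)
x = Dropout(0.2)(x)
out = Dense(10, activation='softmax')(x)

model = Model(inputs=[cat_in, num_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit([cat_codes, numeric_feats], train_label, ...)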
It’s purely a computational/memory saving. Rather than multiplying by a one-hot encoded matrix, which if high cardinality would be huge, it’s quicker and less memory intensive to simply use an integer to index into it directly. The result is, of course, identical.
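To see the equivalence concretely, here is a tiny numpy check (the sizes are arbitrary):

import numpy as np

W = np.random.randn(5000, 16)  # embedding matrix: one 16-dim row per level
i = 42                         # integer code for one category value

onehot = np.zeros(5000)
onehot[i] = 1.0

# Multiplying by the one-hot vector just selects row i of W...
via_matmul = onehot @ W
# ...so indexing directly gives the identical result with no 5000-wide multiply.
via_lookup = W[i]

assert np.allclose(via_matmul, via_lookup)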