MultiClass Classification using Dense Layers

Looks like you’re not overfitting any more. So you’ll have to think about your feature engineering, since it sounds like you have structured data. In general, deep learning isn’t the best tool for structured data - or at least I haven’t seen many people make it work well.

1 Like

Thanks Jeremy, it echoes my view as well that “deep learning isn’t the best tool for structured data”.
I wanted to convince myself of that. Yes, this is structured data.
I experimented with standard machine learning methods like random forests and got an F1 score (train) = 0.8602 and F1 (test) = 0.8138 for the same dataset.
Can I conclude that more research has to be done on structured data, and that deep learning is usually not the best tool?

1 Like
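For reference, a minimal scikit-learn sketch of that kind of random forest baseline with train/test F1 scores; the file name and target column below are placeholders, not the actual dataset.

# A sketch only: random forest baseline on a structured CSV, reporting
# F1 on the training and test splits. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('structured_data.csv')             # hypothetical file
X, y = df.drop(columns=['target']), df['target']    # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)

# Weighted F1 copes with multi-class targets; the thread doesn't say which average was used.
print('F1 (train):', f1_score(y_train, rf.predict(X_train), average='weighted'))
print('F1 (test): ', f1_score(y_test, rf.predict(X_test), average='weighted'))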

Yes I think that’s a reasonable conclusion - although I also think there’s no reason DL couldn’t turn out to be just as good as random forests if more people work on DL for structured data. It’s something I’d be interested in spending time on sometime, since I’ve been a major RF fan for a long time!

3 Likes

BTW in your most recent snippet you didn’t show the compile step. What learning rate did you use? Have you tried decreasing it a lot?

I used
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-4), metrics=['accuracy'])

I also tried decreasing it to optimizer=Adam(lr=1e-6), but there was not much improvement:
Epoch 24/25
45377/45377 [==============================] - 12s - loss: 1.8287 - acc: 0.3665 - val_loss: 1.7992 - val_acc: 0.3839
Epoch 25/25
45377/45377 [==============================] - 12s - loss: 1.8294 - acc: 0.3665 - val_loss: 1.7990 - val_acc: 0.3839
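For context, a minimal sketch of the kind of dense multi-class model this compile step would sit on; the layer sizes, input width and number of classes are assumptions for illustration, not the poster's actual architecture.

# A sketch, assuming the Keras Sequential API: a small dense network for
# multi-class classification, compiled with Adam at an explicit learning rate.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

num_features = 100   # assumption: width of the (encoded) input
num_classes = 10     # assumption: number of target classes

model = Sequential([
    Dense(256, activation='relu', input_dim=num_features),
    Dense(128, activation='relu'),
    Dense(num_classes, activation='softmax'),
])

# Same compile call as above; it is usually worth sweeping a few rates
# (e.g. 1e-3, 1e-4, 1e-5), since something as low as 1e-6 can simply stall.
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])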

Thanks for reporting back!

Are you one-hot encoding all the categorical variables? If not - you definitely need to. If some are very high cardinality (i.e. have many levels) use an Embedding layer for them instead.

1 Like
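To make that concrete, here is a hedged sketch of the pattern described above, using the Keras functional API: an Embedding layer for one high-cardinality categorical (fed as an integer code) concatenated with the remaining one-hot/numeric features. All names and sizes are assumptions.

# Sketch only; cardinality, embedding size and feature widths are made up.
from keras.models import Model
from keras.layers import Input, Embedding, Flatten, Dense, concatenate

n_levels = 50000   # assumption: number of levels in the big categorical
emb_size = 50      # assumption: embedding width
n_other = 200      # assumption: width of the remaining one-hot/numeric features
num_classes = 10   # assumption

cat_in = Input(shape=(1,), dtype='int32')               # integer code, not one-hot
cat_emb = Flatten()(Embedding(n_levels, emb_size)(cat_in))

other_in = Input(shape=(n_other,))
x = concatenate([cat_emb, other_in])
x = Dense(128, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=[cat_in, other_in], outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])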

Yes, I am using one-hot encoding for the label and also for the categorical features.

Just wondering, what is the theory behind an embedding layer for high cardinality?

It’s purely a computational/memory saving. Rather than multiplying by a one-hot encoded matrix, which if high cardinality would be huge, it’s quicker and less memory intensive to simply use an integer to index into it directly. The result is, of course, identical.

1 Like
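A quick numpy check of that equivalence: multiplying a one-hot row vector by a weight matrix selects exactly one row of it, which is all an embedding lookup does. The sizes below are arbitrary.

# One-hot matrix multiply vs. direct row lookup: same result, far less work.
import numpy as np

vocab_size, emb_size = 1000, 8
W = np.random.randn(vocab_size, emb_size)   # the embedding / weight matrix

idx = 42                                    # integer code for one category
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

via_matmul = one_hot @ W                    # multiply by the one-hot vector
via_lookup = W[idx]                         # index into the matrix directly

assert np.allclose(via_matmul, via_lookup)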

That is a very good tip for high cardinality variables.

It is! And not just for deep learning - for regression, GlmNet, etc I’ve always used this approach instead of creating dummy variables. :slight_smile:

Another technique which might be useful for categorical features is the hashing trick, which bins items into a fixed number of indices (items that hash to the same value share an index).

1 Like
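A small sketch of that hashing trick using scikit-learn's FeatureHasher; the feature strings and bucket count below are purely illustrative.

# Each "name=value" string is hashed into one of n_features buckets, so the
# input width stays fixed no matter how many distinct categories turn up.
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=32, input_type='string')
rows = [['city=london', 'device=mobile'],
        ['city=tokyo', 'device=desktop']]
X = hasher.transform(rows)        # sparse matrix of shape (2, 32)
print(X.toarray().shape)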

Could you explain a little bit more about using an embedding layer instead of one-hot encoding?

I have always used one-hot encoding, and of course it is very slow and memory-hungry.

Have you watched the relevant lessons? Have a look at lessons 4-6 and look at the notebooks and spreadsheets, and then let us know any specific questions you have based on that material.

1 Like

@jeremy and @janardhanp22,
You have had a very interesting conversation.

"I experimented with standard machine learning methods like random forests and got an F1 score (train) = 0.8602 and F1 (test) = 0.8138 for the same dataset."
Well, with Random Forest I get F1 = 0.967 for my data (another dataset, similar to tursun_deep_p6.csv). I used RF in Orange.
For my data, what @jeremy said to @janardhanp22 seems to apply as well: “it sounds like you have structured data. In general, deep learning isn’t the best tool for structured data”.

Did I find a contradiction here?

Place one: @jeremy’s statement above, that deep learning isn’t the best tool for structured data.

Place two:
@sibnick, in this webpage:

said this about my dataset:
“CNN, RNN can be effectively used when parameters have structure (e.g. time series, geometric distribution like pixels in image)”

My data “tursun_deep_p6.csv” is in this webpage:

I believe Jeremy means structured data like a user profile or something like that. CNN, RNN work with correlated data - e.g. pixels are correlated with neighboring pixels in real images.

1 Like

Switch Softmax to Sigmoid
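For anyone trying that suggestion, a minimal sketch of what the change looks like in Keras, with a made-up layer stack; sigmoid outputs are the usual choice for multi-label targets (paired with binary cross-entropy), while softmax is the standard for single-label multi-class.

# Sketch only: the same hypothetical network with a softmax vs. a sigmoid head.
from keras.models import Sequential
from keras.layers import Dense

num_features, num_classes = 100, 10   # assumptions

# Single-label multi-class: softmax output with categorical_crossentropy.
softmax_model = Sequential([
    Dense(128, activation='relu', input_dim=num_features),
    Dense(num_classes, activation='softmax'),
])
softmax_model.compile(loss='categorical_crossentropy', optimizer='adam')

# Sigmoid output: each class is scored independently, usually paired with
# binary_crossentropy (the standard multi-label setup).
sigmoid_model = Sequential([
    Dense(128, activation='relu', input_dim=num_features),
    Dense(num_classes, activation='sigmoid'),
])
sigmoid_model.compile(loss='binary_crossentropy', optimizer='adam')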

@jeremy (or anyone else familiar with the topic): why does DL seem to struggle with structured data compared to other methods (like RFs)? Is there any interesting published research on the topic? I work on the analytics team for an ecommerce company, so I can think of many interesting problems that use structured data. I’d love to find a way to get strong performance using neural networks, to minimize the “guesswork” around feature engineering.

From my point of view it is the nature of neural networks. They are universal approximators for continuous functions, but if the underlying model or the loss function is not continuous, then NNs are not strong.