Output size is the number of neurons in your final layer, and it's determined by the nature of your problem. For a classification problem with just two classes, you could have a single final neuron or do what I did. However, to work with a single output you must change two things: the learner's loss function and the final activation function.
In your case, I'm guessing the problem is that you're not using the right activation function. You need a sigmoid on the final layer to constrain your network's output to (0, 1).
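A minimal sketch of the single-output setup in plain Python (hypothetical values, not tied to any particular framework): the sigmoid squashes the raw logit into (0, 1), and binary cross-entropy is the matching loss. A two-neuron head would instead pair softmax with categorical cross-entropy.

```python
import math

def sigmoid(z):
    # Squash a raw logit into (0, 1) so it can be read as P(class = 1)
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y):
    # Matching loss for a single sigmoid output; y is 0 or 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

logit = 2.0                        # hypothetical raw output of the final neuron
p = sigmoid(logit)                 # probability of the positive class
loss = binary_cross_entropy(p, 1)  # small when p is close to the true label 1
```

This is why both things have to change together: a sigmoid output only makes sense with a binary loss, and a binary loss assumes the output is already a probability.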
Hi! Quite frankly, I lost interest in this competition and got dragged into other non-ML hobby stuff. However, if you or anyone else manages to catch up with the tree models using NNs, I'd definitely go back to it.
I've gotten it to .768 with an NN model, vs. .796 with my best LightGBM submission. Simple stacking of these two models ended up at .788.
Feeding the Keras scores into my LightGBM model produced a decent local AUC (.795), but the LB AUC was a terrible .676. It isn't immediately clear to me why this made it perform so poorly.
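For what it's worth, that kind of local/LB gap usually points at leakage: if the Keras scores for the training rows came from a model that had already seen those rows' labels, LightGBM learns to trust the score column far more than it deserves. The standard fix is out-of-fold (OOF) predictions. A minimal sketch in plain Python, with a toy mean predictor standing in for the real network (the function name and fold count are illustrative):

```python
def oof_predictions(X, y, n_folds=5):
    """Build a leakage-free score column: each row is predicted by a model
    that never saw that row's label. A toy mean predictor stands in for
    the Keras network; the OOF column would then be fed to LightGBM."""
    n = len(X)
    oof = [None] * n
    fold_size = n // n_folds  # assumes n divisible by n_folds, for brevity
    for k in range(n_folds):
        val_idx = set(range(k * fold_size, (k + 1) * fold_size))
        train_idx = [i for i in range(n) if i not in val_idx]
        # "Fit" on the training folds: here just the mean of their labels
        mean_y = sum(y[i] for i in train_idx) / len(train_idx)
        # Predict only the held-out fold
        for i in val_idx:
            oof[i] = mean_y
    return oof
```

For the test set, you'd average the predictions of the per-fold models (or refit on the full training set) rather than use OOF.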
Going to try some better NA-filling methods and possibly try oversampling to see if I can improve the standalone model at all. Oversampling hasn't been very useful with gradient boosting models, but it may still be worth a shot.
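If I go the oversampling route, the simplest variant is random oversampling of the minority class, applied only to the training split so the validation AUC stays honest. A sketch in plain Python (the function name and seed are my own):

```python
import random

def oversample_minority(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same count. Apply to the training split only."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]
```

Class weights in the loss are the usual alternative and avoid duplicating rows, which is partly why plain oversampling rarely helps gradient boosting much.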
I tried to replicate the Kaggle kernel by @davidsalazarvergara on my Paperspace machine, and everything works, including learn.predict(), except learn.predict(is_test=True) on the full test data, which fails with a “cuda runtime error (59)”.
However, if I set test_df in the model data setup to df_train.iloc[:10], it works.
The train and test data frames have the same shapes (y removed from train) and the same dtypes per column, and nvidia-smi doesn't show the GPU memory as full.
Any suggestions on how to solve this strange behavior?
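In case it helps narrow things down: error 59 is a device-side assert, and with tabular models that use embedding layers a common trigger is a categorical code in the test frame that falls outside the embedding size built from the training frame. A sketch of a host-side check per categorical column, assuming the categoricals are already integer-encoded (the function name is mine):

```python
def find_out_of_range(train_codes, test_codes):
    """Return the integer codes that appear in a test column but never in
    the corresponding train column. Any such code would index past the
    end of an embedding table sized from the training data."""
    seen = set(train_codes)
    return sorted({c for c in test_codes if c not in seen})

# Hypothetical usage: codes 0-2 were seen in training, so code 3 in the
# test column would be out of range for a 3-row embedding table.
unseen = find_out_of_range([0, 1, 2, 1, 0], [1, 3, 2])
```

Running the same predict on CPU is another way to surface the real error, since CUDA asserts tend to hide the offending line.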
Thank you very much; I've learned a lot through this forum so far!