Huge train_loss and valid_loss. How do I fix this?

Hello.

I really need the community's help here! I have been stuck for more than a week!

I’m working on my own tabular data set.

My problem is in some ways similar to Rossmann (except that instead of Sales, I am trying to predict air pollution).

My training data contains 12,711 samples, and my validation set has 1,700 samples.

When I try to train my model, I get a huge train_loss and valid_loss:

[screenshot of training output showing very large train_loss and valid_loss values]

and the losses don’t get much better as training continues.

  1. After a few training sessions I ran predictions on my target variable and got values around 1,000,000 when I expected roughly 30-50 (in my units).
  2. If you need any more details, I can provide them.

Thank you very much!
Offir Inbar

There is not enough info here to tell what you are doing right or wrong.

I had a similar case, and the situation improved drastically when I took out all the columns that did not provide value. (Whatever is not in cat_names or cont_names is not used.)
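
As a rough sketch of what that looks like (assuming the fastai v1 tabular API; the column names here are hypothetical, not from your data set), only the columns listed in cat_names and cont_names ever reach the model:

# Minimal sketch, fastai v1 tabular API assumed; column names are hypothetical
from fastai.tabular import *

cat_names = ['station_id', 'weekday']       # categorical features to keep
cont_names = ['temperature', 'wind_speed']  # continuous features to keep
procs = [FillMissing, Categorify, Normalize]

data = (TabularList.from_df(df, cat_names=cat_names, cont_names=cont_names, procs=procs)
        .split_by_rand_pct(0.1)
        .label_from_df(cols='pollution')    # hypothetical dependent variable
        .databunch())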

A very quick way to analyze them is to plot their values; look up “feature importance” for a more rigorous analysis.

# Visualize all vars to decide what to remove
import matplotlib.pyplot as plt

for c in df.columns:
    print(f"\n----- {c} -----")   # '/n' was a typo for '\n'
    if df[c].dtype == 'object':   # astype(int) fails on non-numeric columns
        df[c].value_counts().plot(kind='bar', figsize=(15, 5))
    else:
        df[c].plot(figsize=(15, 5))
    plt.show()
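
For the “real analysis”, here is a minimal permutation feature importance sketch with scikit-learn (model, X_valid, and y_valid are assumed, hypothetical names for a fitted estimator and a held-out validation set):

# Minimal sketch: permutation feature importance with scikit-learn
# `model`, `X_valid`, `y_valid` are hypothetical names, not from this thread
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=0)

# Rank features by mean importance, highest first
for name, imp in sorted(zip(X_valid.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f'{name}: {imp:.4f}')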

If you throw garbage in, you get garbage out.