Why is my random forest outperforming my neural net? [Tabular]

I’m working on a tabular dataset, and have tried to apply the relatively standard fastai methods:
Neural net:
splits = RandomSplitter()(range_of(df_short))

cont, cat = cont_cat_split(df_short, 1, dep_var=dep_var)
dls = TabularDataLoaders.from_df(df_short, y_names=dep_var,
    cat_names=cat,
    cont_names=cont,
    procs=[Categorify, FillMissing, Normalize],
    splits=splits)

# splits already defines the validation set, so no separate valid_idx is needed
learn = tabular_learner(dls, metrics=mse, loss_func=MSELossFlat())
learn.fit_one_cycle(4, lr=0.009120108559727669)

The results are not great:

epoch  train_loss  valid_loss  mse        time
0      16.669508   12.575871   12.575871  00:27
1      11.834690   11.773903   11.773903  00:29
2       9.304846   11.929216   11.929216  00:29
3       6.391390   12.667510   12.667510  00:28

However! When I try a random forest, the results are much better:

splits = RandomSplitter()(range_of(df_short))

procs = [Categorify, FillMissing, Normalize]
cont, cat = cont_cat_split(df_short, 1, dep_var=dep_var)
to = TabularPandas(df_short, procs, cat, cont, y_names=dep_var, splits=splits)
xs,y = to.train.xs,to.train.y
valid_xs,valid_y = to.valid.xs,to.valid.y

import math
from sklearn.ensemble import RandomForestRegressor

def r_mse(pred, y): return round(math.sqrt(((pred-y)**2).mean()), 6)
def m_rmse(m, xs, y): return r_mse(m.predict(xs), y)

def rf(xs, y, n_estimators=40, max_samples=31664,
       max_features=0.5, min_samples_leaf=5, **kwargs):
    return RandomForestRegressor(n_jobs=-1, n_estimators=n_estimators,
        max_samples=max_samples, max_features=max_features,
        min_samples_leaf=min_samples_leaf, oob_score=True).fit(xs, y)

m = rf(xs, y)
m_rmse(m, xs, y), m_rmse(m, valid_xs, valid_y)

Resulting in:
(2.070548, 3.202993)
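One thing worth spelling out when comparing the two: the neural net's valid_loss is an MSE, while m_rmse reports an RMSE, so the numbers aren't directly comparable. Converting the best NN epoch to RMSE:

```python
import math

# Best valid_loss (an MSE) from the fit_one_cycle table above
nn_valid_mse = 11.773903
nn_valid_rmse = math.sqrt(nn_valid_mse)   # ~3.43

# Validation RMSE reported by m_rmse for the random forest
rf_valid_rmse = 3.202993

print(nn_valid_rmse, rf_valid_rmse)
```

So the forest is still ahead (~3.20 vs ~3.43), but the gap is smaller than the raw losses make it look.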

From what I've read, the first method should at least be competitive with the second. The complete notebook is here. I can't share the dataset, so the output is pretty bare; I'm just wondering whether there are any obvious mistakes I'm making.

How many categorical variables are there? I've seen that with fewer categorical variables the model may not be as good. Also, did you do any feature pruning? Generally, NNs require some form of hyperparameter tuning: mostly the features used (selected via feature importance), how long you train for (sometimes many epochs), or attributes such as weight decay and dropout on your embeddings.
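To make the weight decay / embedding dropout part concrete, here's a rough sketch of how those knobs attach to tabular_learner in fastai v2. The dls, loss, and metric are the ones from your first snippet; the embed_p, ps, and wd values are untuned starting points, not recommendations:

```python
from fastai.tabular.all import *

# Embedding dropout and per-layer dropout go through tabular_config;
# weight decay is passed to the learner. All three values are illustrative.
cfg = tabular_config(embed_p=0.04, ps=[0.001, 0.01])
learn = tabular_learner(dls, metrics=mse, loss_func=MSELossFlat(),
                        config=cfg, wd=0.1)
learn.fit_one_cycle(20, lr=1e-3)   # also try training well past 4 epochs
```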

Thanks for the quick reply.
len(cont), len(cat)
(19, 9)
As you can see in the link, I did no feature pruning (I plan to do that once I understand why I'm getting the current results). I've tried a little hyperparameter tuning (excluding some features, or including only one or two) but haven't trained for more than 6 epochs because the valid loss/MSE was relatively stable around 12. I will try the weight decay and dropout.
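When I do get to pruning, my plan is roughly the fastai-book approach: rank columns by the fitted forest's feature_importances_ and drop the long tail. A sketch of just the filtering logic, with made-up feature names and importance values since I can't share the dataset (in practice the dict would come from dict(zip(xs.columns, m.feature_importances_))):

```python
# Hypothetical importances standing in for the real forest's output
importances = {
    "feature_a": 0.41, "feature_b": 0.22, "feature_c": 0.18,
    "feature_d": 0.09, "feature_e": 0.06, "feature_f": 0.03,
    "feature_g": 0.01,
}

# Keep columns whose importance clears a small threshold, then retrain
# on xs[to_keep] and compare m_rmse before and after.
threshold = 0.05
to_keep = [c for c, imp in sorted(importances.items(),
                                  key=lambda kv: kv[1], reverse=True)
           if imp > threshold]
print(to_keep)  # ['feature_a', 'feature_b', 'feature_c', 'feature_d', 'feature_e']
```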

I was just surprised that the out-of-the-box difference between the models was so large, but I take from your reply that there's nothing obvious I did wrong and that I'll just have to spend more time figuring out how to improve my model.