Hello people.

I trained a tabular model on a large dataset of around 10 million rows (6 million for training, the rest for validation) and used a batch size of 10k to reduce training time, which seemed reasonable.

- highly unbalanced binary class data (10% positive)
- just 6 categorical features with around 10 categories each, no continuous features

Because of the class imbalance, I was careful to make the ratio 50:50 while training. I got an accuracy of 84%, but precision, recall and F1 all came out as zero, with this warning:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
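For anyone who has not seen this warning: as far as I understand it, sklearn raises it when not a single sample is predicted as positive, so precision becomes 0/0. Here is a minimal sketch that reproduces the situation with made-up labels (not my actual data; `y_true` and `y_pred` are just illustrative names):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels: 10% positives, and "predictions" that are all negative.
y_true = np.array([1] + [0] * 9)
y_pred = np.zeros_like(y_true)

# With no predicted positives, precision is 0/0. zero_division=0 makes
# sklearn return 0.0 without emitting UndefinedMetricWarning.
precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

# Accuracy still looks fine because most samples are negative.
accuracy = accuracy_score(y_true, y_pred)

print(precision, recall, f1, accuracy)
```

So a decent-looking accuracy together with zero precision, recall and F1 is at least consistent with a model that predicts the negative class for everything in the set being scored.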

I then got predictions with `test_dl` and scored them using sklearn's metric functions directly, which gave these numbers:

```
precision 0.23889707467515692
recall 0.19972771038394624
f1 0.2175634727794161
accuracy 0.7811655568491515
```
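In case it helps to reproduce the scoring, something along these lines should work. This is only a sketch: `score_binary`, the toy arrays, and the 0.5 threshold are placeholders I am assuming for illustration, not my exact code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def score_binary(y_true, y_prob, threshold=0.5):
    """Threshold positive-class probabilities and compute the four metrics."""
    y_true = np.asarray(y_true)
    # Assumption: y_prob holds the predicted probability of the positive class.
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
    }

# Toy example with hand-made values, only to show the call.
scores = score_binary([0, 0, 0, 1, 1], [0.1, 0.6, 0.2, 0.7, 0.4])
for name, value in scores.items():
    print(name, value)
```

Changing `threshold` trades precision against recall, so it is worth checking which threshold each set of numbers was computed with before comparing them.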

These numbers are much lower than I got with LGBM.

So my questions are:

- why did I get the precision error? Why did the accuracy change when I reran it with sklearn?
- What am I doing wrong? How do I make my model perform better?

TIA!