Tabular Data Loader Question - Cleaned Data

learningML89 · October 20, 2022, 3:21am

I have a question about using the tabular data loader. I have a dataset that I’ve already cleaned, and a number of its features have been one-hot encoded. Aside from those features, everything else is a float (continuous).

How should I treat the dummy features that are 0 or 1? Before one-hot encoding, I assume those columns would have gone in “cat_names”.
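One possible approach is to collapse each group of dummies back into a single categorical column so it can be listed in `cat_names`, with the float columns going in `cont_names`. This is a minimal sketch that assumes pandas and a hypothetical naming scheme where each original column's dummies are called `<prefix>_<level>` (the `pd.get_dummies` default). It also assumes exactly one dummy is 1 per row, meaning no reference level was dropped. The helper `collapse_one_hot` and the example columns are illustrative and are not part of fastai.

```python
import pandas as pd


def collapse_one_hot(df: pd.DataFrame, prefix: str, sep: str = "_") -> pd.DataFrame:
    """Collapse dummy columns named f"{prefix}{sep}<level>" into one column named `prefix`.

    Hypothetical helper. Assumes exactly one dummy is 1 in every row
    (i.e. the encoding did not use drop_first=True).
    """
    dummy_cols = [c for c in df.columns if c.startswith(prefix + sep)]
    out = df.drop(columns=dummy_cols)
    # idxmax(axis=1) gives the name of the column holding the 1 in each row;
    # slicing off "<prefix><sep>" leaves just the original category level.
    out[prefix] = df[dummy_cols].idxmax(axis=1).str[len(prefix) + len(sep):]
    return out


# Illustrative data: "color" was one-hot encoded, "income" is continuous.
df = pd.DataFrame({
    "color_red": [1, 0, 0],
    "color_blue": [0, 1, 0],
    "color_green": [0, 0, 1],
    "income": [1.5, 2.0, 3.2],
})
collapsed = collapse_one_hot(df, "color")
cat_names = ["color"]   # categorical inputs for the tabular model
cont_names = ["income"]  # continuous inputs
```

You would then pass these two lists as `cat_names` and `cont_names` to fastai's `TabularPandas`. With the `Categorify` proc, the collapsed column gets one embedding instead of several separate 0/1 inputs. The other obvious option is to leave the dummies as 0/1 columns in `cont_names`. Which option works better for this dataset is an empirical question.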

Also, does it make more sense to let fastai handle the normalization, or can the data be passed in mostly preprocessed? For example, this dataset was very imbalanced, so I used SMOTE to oversample it and a scaler to scale the features.
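On normalization: fastai's `Normalize` proc standardizes the `cont_names` columns using the means and standard deviations computed on the training split, then reuses those statistics for the validation split. If you do the scaling outside fastai, the same rule applies. Here is a minimal pandas sketch of that idea. The function name is hypothetical, and the 0/1 dummy columns are deliberately left untouched.

```python
import pandas as pd


def standardize_with_train_stats(train: pd.DataFrame, valid: pd.DataFrame,
                                 cont_cols: list, eps: float = 1e-7):
    """Hypothetical helper: z-score cont_cols using statistics from `train` only."""
    means = train[cont_cols].mean()
    # Population std plus a small eps so constant columns don't divide by zero.
    stds = train[cont_cols].std(ddof=0) + eps
    train, valid = train.copy(), valid.copy()  # don't mutate the caller's frames
    train[cont_cols] = (train[cont_cols] - means) / stds
    # The validation rows reuse the *training* statistics, never their own.
    valid[cont_cols] = (valid[cont_cols] - means) / stds
    return train, valid


train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "d": [0, 1, 0]})  # "d" is a 0/1 dummy
valid = pd.DataFrame({"x": [2.0], "d": [1]})
train_scaled, valid_scaled = standardize_with_train_stats(train, valid, ["x"])
```

The same reasoning generally applies to SMOTE. If you oversample only the training split, no synthetic rows end up in the validation set that the AUC is measured on.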

Using my scoring metric (AUC), I get roughly 0.82 with logistic regression, random forest, and XGBoost models, but I want to see whether the fastai tabular learner can do better.
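For comparing against the ~0.82 baselines, AUC can be read as the probability that a randomly chosen positive row gets a higher score than a randomly chosen negative row, with ties counting as half. Below is a small stdlib-only sketch of that definition. It can be used to sanity-check predictions from any of the models on the same held-out rows. It runs in O(n_pos × n_neg) time, so in practice scikit-learn's `roc_auc_score` or fastai's `RocAucBinary` metric is the faster way to compute the same quantity.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```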