That is basically it.
For an imbalanced dataset, can we penalize incorrect predictions on the minority class more heavily, roughly in proportion to the ratio between the majority and minority classes in the dataset?
For example, if the minority class = 1 and 95% of the dataset is the majority class (class = 0), is there a way to say, “If the actual is 1 and you predicted 0, then increase the penalty by penalty * X”?
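To make the question concrete, here is a rough sketch of the kind of penalty I mean, assuming the loss is binary cross-entropy. The names `weighted_bce` and `minority_weight` are made up for illustration and do not come from any library, and X is taken to be the majority/minority count ratio (19 for a 95/5 split).

```python
import math

def weighted_bce(y_true, p_pred, minority_weight):
    """Mean binary cross-entropy; the loss on class-1 samples is scaled by minority_weight."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -minority_weight * math.log(p)  # missed minority samples cost X times more
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical 95/5 dataset, so X = n_majority / n_minority
y_true = [0] * 95 + [1] * 5
x = y_true.count(0) / y_true.count(1)

# Assumed predictions from a model that leans towards class 0 everywhere
p_pred = [0.1] * 100

plain = weighted_bce(y_true, p_pred, 1.0)
weighted = weighted_bce(y_true, p_pred, x)
print(x, plain, weighted)
```

With the weight applied, the five minority samples contribute about as much to the total loss as the 95 majority samples would if they were equally wrong.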
The easiest approach is to repeat the minority-class entries so that the number of samples per class becomes balanced.
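Something along these lines, as a minimal sketch in plain Python. The `oversample` helper and the toy data are hypothetical; in practice you would do this inside whatever data loading code you already have.

```python
import random
from collections import defaultdict

def oversample(samples, labels, seed=0):
    """Repeat randomly chosen entries of the smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    target = max(len(items) for items in by_class.values())
    out = []
    for y, items in by_class.items():
        # Draw duplicates with replacement to fill the gap to the largest class
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((s, y) for s in items + extra)
    rng.shuffle(out)
    return out

# Toy 95/5 dataset: the samples are just integer ids here
samples = list(range(100))
labels = [0] * 95 + [1] * 5
balanced = oversample(samples, labels)
counts = {y: sum(1 for _, lab in balanced if lab == y) for y in (0, 1)}
print(counts)
```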
I assume this is just for the training set, correct?
Correct. Otherwise, duplicated data can leak into your validation and test sets, which is not what we want.
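In other words, split first and duplicate afterwards. A rough sketch of that ordering, assuming binary labels with 1 as the minority class; `split_then_oversample` and the 20% test fraction are illustrative choices, not a recommendation.

```python
import random

def split_then_oversample(data, test_fraction=0.2, seed=0):
    """Hold out a test set first, then duplicate minority rows in the training part only."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    minority = [row for row in train if row[1] == 1]
    majority = [row for row in train if row[1] == 0]
    extra = []
    if minority:  # nothing to duplicate if no minority rows landed in train
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    train = train + extra
    rng.shuffle(train)
    return train, test

# Toy data: (id, label) pairs with a 95/5 class split
data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]
train, test = split_then_oversample(data)
print(len(train), len(test))
```

Because the duplicates are drawn only from the training rows, no copy of a test row can end up in the training set.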
BTW, any thoughts on the pros/cons of oversampling vs undersampling?
Not really. Experiment with both. When oversampling, it may help to add some perturbations to the duplicated samples, if that makes sense for the dataset.
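For numeric features, one possible form of perturbation is small Gaussian noise on each duplicated row. This is only a sketch under that assumption: `jittered_copies` and the `sigma` value are hypothetical, and the right kind and amount of perturbation depends entirely on the data (for images it would more likely be flips or crops).

```python
import random

def jittered_copies(minority_rows, n_new, sigma=0.01, seed=0):
    """Create n_new rows by copying random minority rows and adding Gaussian noise to each feature."""
    rng = random.Random(seed)
    new_rows = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        new_rows.append([v + rng.gauss(0.0, sigma) for v in base])
    return new_rows

# Toy minority class with two numeric features per row
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1]]
synthetic = jittered_copies(minority, n_new=10)
print(len(synthetic))
```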