TabularDataLoaders keeps incorrectly typing my target column as "continuous" instead of "binary", leading to a "ValueError: Can't Handle mix of binary and continuous target"

danhrg · February 12, 2023, 8:59pm

I’m trying to train a classifier on tabular data (specifically, a pandas DataFrame) using f1_score as the metric. My target column, i.e. the quantity I am trying to model, is everywhere 1 or 0 (I’ve also tried coercing that column from float to bool, int, or str, but it doesn’t help). Even though this is a classification problem (i.e. NOT a regression), the learner or data loader wrongly assumes the target column is continuous, or, if I convert the column to bool or int, continuous-multioutput. As a result, I get an error like this when I reach the learning stage:

 ValueError: Exception occured in `Recorder` when calling event `after_batch`:
 	Classification metrics can't handle a mix of continuous-multioutput and binary targets

When stepping into the debugger, I can determine that the forecast values being generated by the model have a type of “binary”, which is as it should be, but the target values that they are being compared with are determined to be either continuous or continuous-multioutput, even though they are also binary – i.e. either binary ints equal to one or zero, or else bools, or else strings of the form ‘0’ or ‘1’. Again, I’ve tried all those approaches, and get more or less the same error.

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)

    1346         raise ValueError("average has to be one of " + str(average_options))
    1347 
 -> 1348     y_type, y_true, y_pred = _check_targets(y_true, y_pred)

Again, y_true is wrongly listed as “continuous” (while y_pred is correctly listed as ‘binary’).
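
One possible explanation, offered as a sketch rather than something confirmed in this thread: fastai calls a plain metric function as func(pred, targ), so a raw scikit-learn metric would receive the model's raw outputs in the y_true slot and the labels in the y_pred slot. The snippet below uses only NumPy and scikit-learn to show how that argument order produces this kind of error. The arrays raw_outputs and labels are made-up stand-ins, not data from the post.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-ins: two-column raw model outputs and 0/1 labels.
raw_outputs = np.array([[0.2, 1.3], [2.1, -0.4], [0.1, 0.9], [1.5, 0.3]])
labels = np.array([1, 0, 1, 1])

# scikit-learn infers a "type" for each argument before scoring.
print(type_of_target(raw_outputs))  # continuous-multioutput
print(type_of_target(labels))       # binary

# Passing (raw outputs, labels) in that order mimics func(pred, targ).
try:
    f1_score(raw_outputs, labels)
    failed = False
except ValueError as err:
    failed = True
    print(err)

# Decoding the outputs to class indices and using (y_true, y_pred) order works.
decoded = raw_outputs.argmax(axis=1)
score = f1_score(labels, decoded)
print(score)
```

If this reading is right, the target column was typed correctly all along, and the "continuous" array reported as y_true is actually the model output.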

Here is the relevant code just before the error:

dls = TabularDataLoaders.from_df(x, path='.',
                                 cat_names=cat_names,
                                 y_block=CategoryBlock(),
                                 cont_names=cont_names, y_names=y_names, y_range=(0,1),
                                 valid_idx=valid_idx,
                                 procs=procs)

learn = tabular_learner(dls, metrics=f1_score)

learn.fit_one_cycle(3)  # this is where the error happens

For what it’s worth, cat_names is empty and the inputs are all continuous (but I was getting the same error in an earlier version when all the inputs were themselves binary). I’ve also tried getting rid of the y_block and y_range arguments, because that was suggested in a similar situation, and that didn’t help either. (I’ve also tried closely following along with the steps in Tabular Binary Classification (Beginner) | walkwithfastai, but that too ultimately leads to the same error.)

I realize the above error is typically encountered when someone wrongly uses a classification metric (like f1_score) on a regression model, but as far as I can tell, that should not be happening here. Any ideas on what I’m doing wrong? (Note that if I change the metric from ‘f1_score’ to ‘accuracy’, it works fine, though if I instead go with metrics.accuracy_score or metrics.precision_score I get the same errors, which makes it more likely that this might actually be an issue with fast.ai.)

danhrg · February 13, 2023, 7:21pm

Was finally able to get something that didn’t crash by getting rid of the y_block argument and adding a “bs=64” parameter (which I doubt makes a lot of difference), so that it now looks like the following:

x["correct"] = x["correct"].astype(int)
dls = TabularDataLoaders.from_df(x, procs=procs, cat_names=cat_names, cont_names=cont_names, 
                                 y_names="correct", valid_idx=valid_idx, bs=64, metrics=f1_score)

I have no idea why that works whereas the earlier version doesn’t, but regardless, perhaps this may help someone else.