Hi,
I have created a tabular model for binary classification with only continuous variables and no data preprocessing. I reach 83% accuracy on the validation set during training. Then I export the model to ONNX format and load it back with OpenCV, but I get only 50% accuracy on the same data.
Fastai code:
import pandas as pd
import torch
from fastai.tabular.all import *

data = pd.read_csv('database.csv', low_memory=False)
dep_var = 'label'
y_block = CategoryBlock()
cont, cat = cont_cat_split(data, 1, dep_var=dep_var)
cont = cont + cat  # treat every feature as continuous
splits = RandomSplitter(valid_pct=0.2)(range_of(data))
to = TabularPandas(data,
                   cont_names=cont,
                   y_names=y_block if False else dep_var,
                   y_block=y_block,
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, layers=[1, 1, 1], metrics=accuracy)
dummy_input = next(iter(learn.dls[0]))[:-1]  # (x_cat, x_cont) from one training batch
learn.fit_one_cycle(10, 1e-3)
torch.onnx.export(learn.model, dummy_input, 'tabular_model.onnx')
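To check whether the export itself is faithful, here is a minimal sketch (assuming onnxruntime is installed; which inputs survive the export is not guaranteed, so the feed dict may need adjusting) that compares the ONNX output with the PyTorch output on the same batch:

import numpy as np
import onnxruntime as ort

learn.model.eval()  # put BatchNorm/Dropout in inference mode
sess = ort.InferenceSession('tabular_model.onnx')
print([i.name for i in sess.get_inputs()])  # unused inputs (e.g. an empty categorical tensor) may have been dropped

x_cat, x_cont = dummy_input
feed = {sess.get_inputs()[0].name: x_cont.cpu().numpy()}  # adjust to the names printed above
onnx_out = sess.run(None, feed)[0]
torch_out = learn.model(*dummy_input).detach().cpu().numpy()
print(np.abs(onnx_out - torch_out).max())  # should be ~1e-6 if the export is faithful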
OpenCV code (the last column of the CSV file is the label to predict):
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/dnn.hpp>

cv::dnn::Net model = cv::dnn::readNetFromONNX("tabular_model.onnx");
std::ifstream infile("database.csv");
std::string line;
std::getline(infile, line);  // skip the header row
double count = 0;
double count_false = 0;
while (std::getline(infile, line)) {
    std::vector<std::string> tokens = tenevia::common::txt::split(line, ',');
    // All columns except the last one are the continuous features.
    cv::Mat input(1, static_cast<int>(tokens.size()) - 1, CV_32FC1);
    for (size_t i = 0; i + 1 < tokens.size(); ++i) {
        input.at<float>(0, static_cast<int>(i)) = std::stof(tokens[i]);
    }
    model.setInput(input);
    cv::Mat res = model.forward();
    double confidence;
    cv::Point classIdPoint;
    cv::minMaxLoc(res.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    int classId = classIdPoint.x;
    ++count;
    if (classId != std::stoi(tokens[tokens.size() - 1]))
        ++count_false;
}
std::cout << "Accuracy " << (count - count_false) / count * 100.0 << std::endl;
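For comparison, the same evaluation can be reproduced in Python through OpenCV's dnn module (a sketch; it assumes the CSV labels are already 0/1 in the same order as fastai's vocab, and it reuses the cont list from the training script so the column order matches):

import cv2
import numpy as np
import pandas as pd

net = cv2.dnn.readNetFromONNX('tabular_model.onnx')
df = pd.read_csv('database.csv', low_memory=False)
X = df[cont].to_numpy(np.float32)  # same column order as during training
y = df['label'].to_numpy()

correct = 0
for row, lab in zip(X, y):
    net.setInput(row.reshape(1, -1))    # one sample, shape (1, n_features)
    pred = int(net.forward().argmax())  # index of the largest logit
    correct += int(pred == lab)
print('Accuracy', 100.0 * correct / len(y))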
Is there some normalization that happens under the hood in TabularPandas or tabular_learner that I need to apply to the data before doing inference with OpenCV?
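If so, where are the statistics stored? I would expect something like this to work (a sketch, assuming fastai v2 exposes the fitted proc through to.procs.normalize when Normalize is passed in procs):

norm = to.procs.normalize  # only exists if procs=[Normalize, ...] was used
print(norm.means)  # dict: column name -> training mean
print(norm.stds)   # dict: column name -> training std
# The C++ side would then need x = (x - mean) / std per column before setInput.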