Hi,
I have created a tabular model for binary classification with only continuous variables and no data preprocessing. I reach 83% accuracy on the validation set during training. Then I export the model to ONNX format and load it back with OpenCV, but I get only 50% accuracy on the same data.
Fastai code:
import pandas as pd
import torch
from fastai.tabular.all import *

data = pd.read_csv('database.csv', low_memory=False)
dep_var = 'label'
y_block = CategoryBlock()
cont, cat = cont_cat_split(data, 1, dep_var=dep_var)
cont = cont + cat  # treat every feature as continuous
splits = RandomSplitter(valid_pct=0.2)(range_of(data))
to = TabularPandas(data,
                   cont_names=cont,
                   y_names=y_block if False else dep_var,
                   y_block=y_block,
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, layers=[1, 1, 1], metrics=accuracy)
dummy_input = next(iter(learn.dls[0]))[:-1]  # (x_cat, x_cont) from one training batch
learn.fit_one_cycle(10, 1e-3)
torch.onnx.export(learn.model, dummy_input, 'tabular_model.onnx')
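To check whether the export itself is faithful, here is a minimal sketch (assuming onnxruntime is installed; which inputs survive the export is not guaranteed, so the feed dict may need adjusting) that compares the ONNX output with the PyTorch output on the same batch:

import numpy as np
import onnxruntime as ort

learn.model.eval()  # put BatchNorm/Dropout in inference mode
sess = ort.InferenceSession('tabular_model.onnx')
print([i.name for i in sess.get_inputs()])  # unused inputs (e.g. an empty categorical tensor) may have been dropped

x_cat, x_cont = dummy_input
feed = {sess.get_inputs()[0].name: x_cont.cpu().numpy()}  # adjust to the names printed above
onnx_out = sess.run(None, feed)[0]
torch_out = learn.model(*dummy_input).detach().cpu().numpy()
print(np.abs(onnx_out - torch_out).max())  # should be ~1e-6 if the export is faithful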
OpenCV code (the last column of the CSV file is the label to predict):
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/dnn.hpp>

cv::dnn::Net model = cv::dnn::readNetFromONNX("tabular_model.onnx");
std::ifstream infile("database.csv");
std::string line;
std::getline(infile, line);  // skip the header row
double count = 0;
double count_false = 0;
while (std::getline(infile, line)) {
    std::vector<std::string> tokens = tenevia::common::txt::split(line, ',');
    // All columns except the last one are the continuous features.
    cv::Mat input(1, static_cast<int>(tokens.size()) - 1, CV_32FC1);
    for (size_t i = 0; i + 1 < tokens.size(); ++i) {
        input.at<float>(0, static_cast<int>(i)) = std::stof(tokens[i]);
    }
    model.setInput(input);
    cv::Mat res = model.forward();
    double confidence;
    cv::Point classIdPoint;
    cv::minMaxLoc(res.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    int classId = classIdPoint.x;
    ++count;
    if (classId != std::stoi(tokens[tokens.size() - 1]))
        ++count_false;
}
std::cout << "Accuracy " << (count - count_false) / count * 100.0 << std::endl;
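For comparison, the same evaluation can be reproduced in Python through OpenCV's dnn module (a sketch; it assumes the CSV labels are already 0/1 in the same order as fastai's vocab, and it reuses the cont list from the training script so the column order matches):

import cv2
import numpy as np
import pandas as pd

net = cv2.dnn.readNetFromONNX('tabular_model.onnx')
df = pd.read_csv('database.csv', low_memory=False)
X = df[cont].to_numpy(np.float32)  # same column order as during training
y = df['label'].to_numpy()

correct = 0
for row, lab in zip(X, y):
    net.setInput(row.reshape(1, -1))    # one sample, shape (1, n_features)
    pred = int(net.forward().argmax())  # index of the largest logit
    correct += int(pred == lab)
print('Accuracy', 100.0 * correct / len(y))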
Is there some normalization that happens under the hood in TabularPandas or tabular_learner that I need to apply to the data before doing inference with OpenCV?
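If so, where are the statistics stored? I would expect something like this to work (a sketch, assuming fastai v2 exposes the fitted proc through to.procs.normalize when Normalize is passed in procs):

norm = to.procs.normalize  # only exists if procs=[Normalize, ...] was used
print(norm.means)  # dict: column name -> training mean
print(norm.stds)   # dict: column name -> training std
# The C++ side would then need x = (x - mean) / std per column before setInput.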