Constraining the prediction range on a tabular learner?

codename5281 · April 16, 2020, 1:44pm

Hello !

I’m discovering the joys of ML through the fastai course, and learning a bit more every day thanks to this board, so first of all : thank you !

I am trying to train a tabular learner to predict biophysical variables from remote sensing datasets. I am using a training dataset of limited size (~600 “ground truth” points, acquisition is time consuming) covering the expected range of values for predicted data. The model fit using a standard tabular learner 3 hidden layers fully connected network is fine, reaching R²>0,9.

However, as my ground truth training dataset is sparse and cannot cover all real life situations, i often end up with predicted values that skyrocket, and do not have any real world meaning. Thus, i am thinking about a strategy to overcome this issue, but so far didn’t read/understood anything that could help me.

I was thinking that maybe there is a method to constraint the value range of the model output, maybe through a clever use of activation functions such as sigmoid, but i am still learning and doesn’t know where to start. That is why i am posting for the first time, asking for some guidance regarding this issue !

Is anyone having an idea how to overcome this ?

Thanks in advance !
J.

muellerzr · April 16, 2020, 3:22pm

You can use a sigmoid, pass in a y_range to tabular_learner (like (0,n)

codename5281 · April 17, 2020, 9:34am

Thanks for the suggestion, i tried that and got interesting results.
I also performed data augmentation by simulating noise on my input variables corresponding to the known sensor accuacy levels, and by drawing ground truth target values from observed distributions. So far so good !

codename5281 · August 6, 2020, 10:17am

Hello, it’s me again !
So i’ve been playing with this and it appears that often when i want to get predictions from real world data, my predicted values are jumping from one side to the other side of the y_lim range.
This puzzles me as my training set is supposed to cover a very wide range of values that can be expected in real world situations.
Do you have any recommandation about how to tackle this situation ?
Thanks !
J.