I’m discovering the joys of ML through the fastai course and learning a bit more every day thanks to this board, so first of all: thank you!
I am trying to train a tabular learner to predict biophysical variables from remote sensing data. My training dataset is small (~600 “ground truth” points, because acquisition is time-consuming), but it covers the expected range of values for the predicted variables. Fitting a standard tabular learner (a fully connected network with 3 hidden layers) works well and reaches R² > 0.9.
However, because my ground truth dataset is sparse and cannot cover every real-life situation, I often end up with predicted values that skyrocket and have no real-world meaning. I am therefore looking for a strategy to overcome this, but so far I haven’t read or understood anything that could help.
I was thinking there might be a way to constrain the range of the model’s output, perhaps through a clever use of an activation function such as sigmoid, but I am still learning and don’t know where to start. That is why I am posting for the first time, asking for some guidance on this issue!
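Concretely, this is roughly what I have in mind, as an untested pure-Python sketch: pass the raw output of the last linear layer through a scaled sigmoid, low + (high - low) * sigmoid(x), so the prediction can never leave the interval (low, high). The bounds 0 and 8 below are hypothetical placeholders, not values from my data. I have also seen a y_range argument mentioned for tabular_learner, which I believe wraps the output in a similar SigmoidRange layer, but I’m not sure it is the right tool here.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function; output lies in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) to avoid overflow in exp(-x)
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_range(x: float, low: float, high: float) -> float:
    """Squash a raw, unbounded network output x into [low, high].

    Mathematically the result is strictly inside (low, high); in floating
    point it can saturate to exactly low or high for extreme inputs.
    """
    return low + (high - low) * sigmoid(x)

# Hypothetical physical bounds for the predicted variable (placeholders)
LOW, HIGH = 0.0, 8.0

# Raw outputs of the final linear layer, including "skyrocketing" ones
raw_outputs = [-50.0, -2.0, 0.0, 2.0, 50.0, 1e6]
for raw in raw_outputs:
    print(f"raw={raw:>10} -> bounded={sigmoid_range(raw, LOW, HIGH):.4f}")
```

Because the sigmoid only approaches its bounds asymptotically, I guess high should be set slightly above the largest plausible value, otherwise the network can never actually predict the true maximum.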
Does anyone have an idea how to overcome this?
Thanks in advance!