tabularGP: Gaussian processes with fastai!

nestorDemeure · March 21, 2020, 1:11pm

I just finished writing tabularGP, an implementation of Gaussian processes built on top of fastai V1 and PyTorch.

It was designed so that you can trivially take a fastai tabular deep learning model and turn it into a tabular Gaussian process model (the example notebooks try to be exhaustive).

For those unfamiliar with Gaussian processes, they offer:

very good performance on small datasets (a few dozen to a few thousand points)
few hyperparameters to tune
free uncertainty estimation on the outputs
free feature importance computation
easy transfer learning
very little overfitting, thanks to built-in Bayesian regularisation
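To make the "free uncertainty estimation" point concrete, here is a minimal sketch of exact Gaussian process regression in plain PyTorch. It is not tabularGP's API or implementation: the kernel (an RBF kernel with one lengthscale per feature), the fixed hyperparameters, and the names rbf_kernel and gp_predict are assumptions for illustration, and hyperparameter fitting is skipped entirely.

```python
import torch

def rbf_kernel(a, b, lengthscales, outputscale=1.0):
    # Squared-exponential (RBF) kernel with one lengthscale per feature
    diff = (a[:, None, :] - b[None, :, :]) / lengthscales
    return outputscale * torch.exp(-0.5 * (diff ** 2).sum(-1))

def gp_predict(x_train, y_train, x_test, lengthscales, noise=1e-2):
    # Posterior mean and variance of an exact GP with fixed (not fitted) hyperparameters
    n = x_train.shape[0]
    K = rbf_kernel(x_train, x_train, lengthscales)
    K = K + noise * torch.eye(n, dtype=x_train.dtype)  # observation noise on the diagonal
    L = torch.linalg.cholesky(K)                         # K = L @ L.T
    K_s = rbf_kernel(x_train, x_test, lengthscales)      # shape (n_train, n_test)
    alpha = torch.cholesky_solve(y_train[:, None], L)    # K^-1 y
    mean = (K_s.T @ alpha).squeeze(-1)
    v = torch.cholesky_solve(K_s, L)                     # K^-1 K_s
    prior_var = 1.0                                      # k(x, x) with the default outputscale=1.0
    var = prior_var - (K_s * v).sum(0)
    return mean, var.clamp_min(0.0)

# Toy data: the target only depends on the first feature
torch.manual_seed(0)
x_train = torch.rand(20, 2, dtype=torch.float64)
y_train = torch.sin(6 * x_train[:, 0])
x_test = torch.tensor([[0.5, 0.5], [5.0, 5.0]], dtype=torch.float64)  # inside vs far outside the data
lengthscales = torch.tensor([0.2, 1.0], dtype=torch.float64)
mean, var = gp_predict(x_train, y_train, x_test, lengthscales)
print(mean, var)
```

The predictive variance is small near the training data and goes back to the prior variance (1.0 here) far away from it, which is where the uncertainty estimate comes from at no extra cost.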

In practice, I have found Gaussian processes to be better than deep neural networks when you have little data (anywhere from 10 to 5000 training points) and a relatively simple space (no complex interactions between features).

While working on an upcoming paper, I got a 33% RMSE improvement over a carefully tuned neural network with minimal effort (on a 4000-point training set), so it is definitely worth testing.

There is a catch, however: on a large dataset the algorithm will be both slow and beaten by neural networks due to its high computational complexity. But if you have a smallish dataset, it is worth knowing about and using.
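To give a sense of why exact Gaussian processes slow down on large datasets, here is some back-of-the-envelope arithmetic. It assumes an exact GP that stores a dense n x n float64 covariance matrix and factorizes it with a Cholesky decomposition (roughly n³/3 floating-point operations); this is the textbook cost of an exact GP, not a measurement of tabularGP, whose scaling may differ. The function name exact_gp_cost is hypothetical.

```python
def exact_gp_cost(n, bytes_per_float=8):
    # Memory needed for the dense n x n covariance matrix, in GiB
    memory_gib = n * n * bytes_per_float / 2**30
    # A Cholesky factorization costs roughly n^3 / 3 floating-point operations
    cholesky_flops = n ** 3 / 3
    return memory_gib, cholesky_flops

for n in (1_000, 5_000, 100_000):
    mem, flops = exact_gp_cost(n)
    print(f"n={n:>7}: covariance matrix {mem:8.2f} GiB, Cholesky ~{flops:.1e} flops")
```

Going from 5000 to 100,000 points multiplies the memory by 400 and the factorization cost by 8000.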

nestorDemeure · October 4, 2020, 2:25pm

I just updated the code for fastai V2.

There is a single error left that I have not been able to solve yet: I get a TypeError: list indices must be integers or slices, not list when using the predict method, which does not happen with the fastai tabularLearner (see example 1). Any help would be appreciated.
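For readers who hit the same message: it is what Python raises when a plain list is indexed with another list. The minimal sketch below reproduces the message with hypothetical variables (it does not reproduce the tabularGP bug itself) and shows two ways of selecting several elements that do work.

```python
import numpy as np

values = [10, 20, 30]
idx = [0, 2]

message = None
try:
    values[idx]  # a plain Python list cannot be indexed by a list
except TypeError as err:
    message = str(err)

picked = [values[i] for i in idx]    # explicit per-element indexing works
picked_np = np.asarray(values)[idx]  # NumPy arrays support integer-array indexing
print(message, picked, picked_np)
```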

nestorDemeure · October 9, 2020, 6:49pm

It took some time, but all the bugs are now squashed and you can use Gaussian processes with fastai V2.

harikrishnanrajeev · October 22, 2020, 6:31pm

Thank you so much for this wonderful library. Is there a version that will work with fastai V1?

nestorDemeure · October 22, 2020, 6:57pm

Thank you

The code was first developed with fastai V1; you can find the latest V1 version in the commit history (checking out the corresponding commit should let you install it without too many problems).

harikrishnanrajeev · October 23, 2020, 2:55am

cholesky_cpu: U(1,1) is zero, singular U

Have you faced this error? (I am not able to share more details.) Please let me know if you have seen it.

Thanks much

nestorDemeure · October 23, 2020, 5:16am

It is mentioned in the README: this is usually solved by using a lower learning rate and/or more nb_training_points.

The root cause is that the points and the covariance function produce a covariance matrix that is (numerically) not positive definite, so the linear system cannot be solved with a Cholesky decomposition.
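Here is a minimal sketch of that failure mode, assuming a plain RBF kernel (not tabularGP's own kernel) and hypothetical names: two identical input rows make the covariance matrix exactly singular, so the Cholesky factorization fails. It then shows "jitter", i.e. adding a small constant to the diagonal. Jitter is a common generic workaround for this error, not the fix recommended above for tabularGP (lower learning rate and/or more nb_training_points).

```python
import torch

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel computed from explicit pairwise differences
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return torch.exp(-0.5 * sq_dist / lengthscale ** 2)

# Two identical rows make the covariance matrix exactly singular
x = torch.tensor([[0.3, 0.7], [0.3, 0.7], [0.9, 0.1]], dtype=torch.float64)
K = rbf_kernel(x, x)

# cholesky_ex reports failure through `info` instead of raising an error
_, info_raw = torch.linalg.cholesky_ex(K)
print("without jitter, info =", info_raw.item())  # > 0 means the factorization failed

# Jitter: a small constant on the diagonal makes the matrix positive definite
jitter = 1e-6
K_jittered = K + jitter * torch.eye(len(x), dtype=x.dtype)
L, info_jitter = torch.linalg.cholesky_ex(K_jittered)
print("with jitter, info =", info_jitter.item())  # 0 means success
```

The trade-off is that jitter slightly changes the model (it acts like extra observation noise), so it should stay small relative to the kernel's scale.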