Hello to the amazing fast.ai community! I love helping out here, but now I'm in the position of needing to ask some of the more advanced members who have worked on similar tabular problems.
For my job, I have successfully developed tabular models with fast.ai that use over 60 categorical features (country, device, etc.) and around 20 continuous features (e.g. date parts). These models are now in production, but we are looking to go further: several new values are added each day, requiring us to retrain the models.
I was really inspired by the performance and progress that transfer learning has brought to vision (ResNet) and text (ULMFiT), but I have not seen any comparable research on tabular data.
Similarly to work done by Pinterest and Instacart, I would like to reuse the fast.ai categorical embeddings to train new models with fewer datapoints, or on similar problems. Exporting the PKL and extracting the weights is simple…
But how do I prune the model and load the embeddings into a new one, while keeping the categorical cat_codes in the same order and just as efficient?
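To make the question concrete, here is a minimal sketch of the kind of transfer I have in mind, in plain PyTorch (the variable names like `old_classes` and `new_classes` are illustrative, not fastai API): copy pretrained embedding rows into a new embedding layer, matching categories by *value* rather than by cat_code position, so the new dataset's ordering does not matter.

```python
import torch
import torch.nn as nn

# Pretrained category list and its embedding, e.g. pulled out of an
# exported learner's embedding layers (illustrative data, not real weights)
old_classes = ["#na#", "US", "FR", "DE"]
old_emb = nn.Embedding(len(old_classes), 8)

# New dataset: overlapping but reordered vocab, plus one unseen category
new_classes = ["#na#", "DE", "US", "BR"]
new_emb = nn.Embedding(len(new_classes), 8)

# Copy rows for categories present in both vocabs; unseen categories
# ("BR") keep their fresh random initialisation
old_idx = {c: i for i, c in enumerate(old_classes)}
with torch.no_grad():
    for i, c in enumerate(new_classes):
        if c in old_idx:
            new_emb.weight[i] = old_emb.weight[old_idx[c]]

# "DE" now reuses its pretrained vector even though its cat_code changed
assert torch.equal(new_emb.weight[1], old_emb.weight[3])
```

Is something along these lines the intended approach, and is there a supported way to do this mapping inside fastai itself?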
Alternatively, we could simply retrain the models from scratch every time, but that feels like a waste of compute. Or we could load the .pth file, but that does not seem efficient to store on AWS, and it still does not tell me how to plug in the new DataBunch.
I've followed the 2018 DL course (parts 1 & 2) and DL 2019, and I have searched the forums several times with different keywords, as well as Google and GitHub, without finding a clear way to do this.
I would greatly appreciate some help!