Deep Teacher, Shallow Student - is it possible?

Is it possible to borrow ideas from knowledge distillation and post-hoc explainability techniques to train a shallow student model (xgboost) from a complex teacher model (GNN)?

Beyond the performance considerations, I'm asking because adoption is easier on platforms that are more comfortable deploying shallow models.

If anyone has tried this, can you please point me to some relevant work? Thank you.

Techniques like knowledge distillation require modifying the loss function so that it also incorporates the teacher's outputs (typically its soft predictions). You can apply this with any training method that optimizes a loss function and lets you supply a custom version of it. I'm no expert on xgboost, so I'm not sure whether it supports that.
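For what it's worth, xgboost's native API does accept a custom objective via the `obj` argument of `xgb.train`, so one way to approximate distillation is to blend the hard labels with the teacher's soft predictions and fit the student against that blend. Below is a minimal sketch under those assumptions; `X`, `y_true`, `teacher_probs`, and `alpha` are hypothetical placeholders (in practice `teacher_probs` would be your GNN's predicted probabilities), and the synthetic data is only there to make the example runnable.

```python
import numpy as np
import xgboost as xgb

# --- Placeholder data; replace with your features, labels, and GNN outputs ---
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                # stand-in feature matrix
y_true = (X[:, 0] > 0).astype(float)           # stand-in hard labels
teacher_probs = np.clip(                       # stand-in for GNN soft predictions
    0.8 * y_true + rng.uniform(0.0, 0.2, size=1000), 0.0, 1.0
)

# Blend hard labels with the teacher's soft labels (alpha = weight on hard labels).
alpha = 0.5
soft_targets = alpha * y_true + (1 - alpha) * teacher_probs

dtrain = xgb.DMatrix(X, label=soft_targets)

def distill_logloss(preds, dtrain):
    """Binary log-loss against the blended soft targets.

    With a custom objective, xgboost passes raw margin scores, so apply
    the sigmoid before computing gradients and hessians.
    """
    targets = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))
    grad = probs - targets           # first derivative w.r.t. the margin
    hess = probs * (1.0 - probs)     # second derivative w.r.t. the margin
    return grad, hess

booster = xgb.train(
    {"max_depth": 4, "eta": 0.1},
    dtrain,
    num_boost_round=200,
    obj=distill_logloss,
)

# Note: with a custom objective, booster.predict returns raw margins;
# apply the sigmoid to recover probabilities.
student_probs = 1.0 / (1.0 + np.exp(-booster.predict(dtrain)))
```

A simpler variant of the same idea is to skip the custom objective entirely and just regress the student on the teacher's logits (or blended probabilities) with a standard squared-error or logistic objective; whether that loses much depends on how well the tabular features can express what the GNN learned from the graph structure.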