Hi fellow Alumni!
Thanks to Soumith Chintala I just came across this paper: https://arxiv.org/abs/1806.03723
It gives a clean and simple solution to learning network size by using SwitchLayers (layers that learn a parameter beta for each neuron, reflecting its contribution to performance) after Dense/Conv layers (after BatchNorm, according to the paper), explicitly pruning the network during training and ending up with a smaller (and less sparse) net at the end of the process.
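To make the idea concrete, here is a minimal PyTorch sketch of how I read the SwitchLayer concept. This is not the code from the paper or the repo; the class name, the `l1_penalty`/`prune_mask` helpers, and the pruning threshold are my own placeholders, and the actual training/pruning schedule in the paper is more involved.

```python
import torch
import torch.nn as nn

class SwitchLayer(nn.Module):
    """My rough reading of a SwitchLayer: a learnable per-neuron scale beta.

    An L1 penalty on beta pushes unimportant neurons toward zero so they can
    be pruned while training, shrinking the network as it learns.
    """
    def __init__(self, num_features):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(num_features))

    def forward(self, x):
        # Broadcast beta over the batch (and any spatial dims for conv activations).
        shape = [1, -1] + [1] * (x.dim() - 2)
        return x * self.beta.view(*shape)

    def l1_penalty(self):
        # Added to the loss (scaled by some lambda) to encourage switches to shrink.
        return self.beta.abs().sum()

    def prune_mask(self, threshold=1e-3):
        # Neurons whose switch has collapsed below the threshold are candidates for removal.
        return self.beta.abs() > threshold

# Example placement, following the paper's suggestion of Dense -> BatchNorm -> Switch:
block = nn.Sequential(
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),
    SwitchLayer(128),
    nn.ReLU(),
)
```

During training you would add `lambda * switch.l1_penalty()` to the loss for each SwitchLayer, and periodically drop the neurons flagged by `prune_mask`, which is what yields the smaller dense network at the end.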
The GitHub page seems to be this one: https://github.com/mitdbg/fastdeepnets, produced by Guillaume Leclerc for his master's thesis (amazing work).
What do you think? On paper, it seems like a really smart and simple way to tackle this problem!