Model Pruning in fast.ai

Hello! Glad to meet another pruning/compression enthusiast; I was personally quite surprised to see that it worked so well! As for the formula, I took it from the "To Prune, or Not to Prune" paper, and I believe it is intended to behave exactly that way: you might notice the sparsity goes up very quickly after epoch 1, but then fewer and fewer weights get pruned in later epochs (see the sketch after the quote below).

The authors say this about the formula: "The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network, as illustrated in Figure 1."
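For reference, the schedule they describe is s_t = s_f + (s_i - s_f)(1 - (t - t_0)/(nΔt))^3, where s_i and s_f are the initial and final sparsity levels and the network is pruned every Δt steps over n pruning steps. Here is a minimal Python sketch of that schedule (the function name and default values are my own, purely for illustration); the cubic term is what makes sparsity shoot up early and flatten out later:

```python
import numpy as np

def gradual_sparsity(t, s_i=0.0, s_f=0.90, t0=0, n=10, dt=1):
    """Gradual pruning schedule from equation (1) of
    "To Prune, or Not to Prune" (Zhu & Gupta, 2017):
        s_t = s_f + (s_i - s_f) * (1 - (t - t0) / (n * dt)) ** 3
    Defaults (90% final sparsity over 10 steps) are illustrative only.
    """
    frac = np.clip((t - t0) / (n * dt), 0.0, 1.0)
    return s_f + (s_i - s_f) * (1.0 - frac) ** 3

print([round(float(gradual_sparsity(t)), 3) for t in range(11)])
# 0.0, 0.244, 0.439, ..., 0.899, 0.9
```

With these defaults, nearly half of the final sparsity is reached within the first two pruning steps, which matches the behaviour you observed after epoch 1.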

Kernel-level pruning is fascinating; however, I have read that sparsification results in smaller accuracy drops than kernel-level pruning. A paper with a good set of experiments on this is linked here.

I was thinking of using sparsification, modifying the block-sparse kernels that OpenAI open-sourced to play well with PyTorch, and then measuring the speedups; right now, of course, the speedups are only theoretical. If someone manages to do that, this method could be useful today. Sadly, PyTorch currently doesn't support quantisation (though work is ongoing!).
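To make "sparsification" concrete, here is a minimal PyTorch sketch of unstructured magnitude pruning: zero out the smallest-magnitude weights of a layer until a target sparsity is reached (the helper `magnitude_prune_` is my own name, not a PyTorch or fast.ai API). Note the zeros are just a mask; without sparse kernels like OpenAI's, the matrix multiplies still run dense, which is exactly why the speedups remain theoretical for now:

```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float):
    """Zero out the smallest-magnitude weights of `layer` in place so that
    roughly `sparsity` fraction of them become zero. A toy illustration of
    unstructured sparsification, not an optimised implementation."""
    w = layer.weight.data
    k = int(sparsity * w.numel())
    if k == 0:
        return
    # the k-th smallest absolute value becomes the pruning threshold
    threshold = w.abs().flatten().kthvalue(k).values
    mask = (w.abs() > threshold).to(w.dtype)
    w.mul_(mask)

layer = nn.Linear(128, 64)
magnitude_prune_(layer, sparsity=0.5)
print((layer.weight == 0).float().mean())  # ~0.5
```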