IDEA - Changing the Avg/Max pooling layer at the end to something that could give the network more useful information for better predictions

I am thinking about a point Jeremy mentioned in one of the lectures about the design of the last FC layer: by taking an avg pool we lose a lot of information, but if we flatten out all the channels we blow up the dimensionality of the network. Instead, I am thinking of a two-step approach, and I just wanted the views of others before trying it out:

Logic/intuition: The base idea is that if the avg/max value of a channel is very useful to the network for classification, then providing all the information from that channel by flattening it out, while keeping the max/avg pool for the channels that are less important, may help the network do even better, with at worst the same performance.

Steps:
a) Build a network with a max pool/avg pool layer at the end followed by a single FC layer mapping to the size of the output (tune it for the problem at hand; I am assuming we are trying to solve a binary classification problem). After training, sort the weights connecting each channel in the pooled layer to each output node (here there is only one output node, so just sort the weights for that node). A rough sketch of this step is below.

b) Now select the top 20 channels (chosen for example purposes; this can be changed to a smaller number) for each output node. Instead of taking the max pool/avg pool of these top 20 channels, flatten out only those 20 channels, keep the max/avg pool of the other channels intact, and retune the network with additional non-linear layers, as in the second sketch below.
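
To make step (a) concrete, here is a minimal PyTorch sketch. The backbone, the channel count of 512, and the `top_k_channels` helper are my assumptions for illustration, not a fixed part of the idea:

```python
import torch
import torch.nn as nn

class PooledHead(nn.Module):
    """Step (a): conv backbone -> avg pool -> single FC output node."""
    def __init__(self, backbone: nn.Module, n_channels: int = 512):
        super().__init__()
        self.backbone = backbone             # any conv feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)  # one value per channel
        self.fc = nn.Linear(n_channels, 1)   # single node for binary output

    def forward(self, x):
        feats = self.backbone(x)              # (B, C, H, W)
        pooled = self.pool(feats).flatten(1)  # (B, C)
        return self.fc(pooled)                # (B, 1) logit

def top_k_channels(fc: nn.Linear, k: int = 20) -> torch.Tensor:
    # Rank channels by the magnitude of their weight into the output node;
    # fc.weight has shape (1, C) when there is a single output node.
    return torch.topk(fc.weight.detach().abs().squeeze(0), k).indices
```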
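And a sketch of the hybrid head for step (b), again under assumed sizes (7x7 feature maps, one hidden non-linear layer). It consumes the backbone features directly, flattening the selected channels and pooling the rest:

```python
import torch
import torch.nn as nn

class HybridPoolHead(nn.Module):
    """Step (b): flatten the top-k channels, avg pool the remaining ones."""
    def __init__(self, top_idx: torch.Tensor, n_channels: int = 512,
                 fmap_hw: int = 7, hidden: int = 256):
        super().__init__()
        keep = set(top_idx.tolist())
        rest_idx = torch.tensor([c for c in range(n_channels) if c not in keep])
        self.register_buffer("top_idx", top_idx)
        self.register_buffer("rest_idx", rest_idx)
        self.pool = nn.AdaptiveAvgPool2d(1)
        in_dim = len(top_idx) * fmap_hw * fmap_hw + len(rest_idx)
        self.head = nn.Sequential(            # the extra non-linear layers
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):  # feats: (B, C, H, W) from the backbone
        flat = feats[:, self.top_idx].flatten(1)                # full spatial info
        pooled = self.pool(feats[:, self.rest_idx]).flatten(1)  # one value each
        return self.head(torch.cat([flat, pooled], dim=1))      # (B, 1) logit
```

One thing this makes visible: with 7x7 feature maps, flattening 20 channels contributes 20 * 49 = 980 inputs instead of 20 pooled values, which is still far smaller than flattening all 512 channels (512 * 49 = 25088).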

Curious to hear others' thoughts on this approach.