Stumbled upon this paper recently. Any thoughts on its effectiveness, etc.?
"Parameter-Efficient Transfer Learning for NLP"