ULMFiT for Proteomics

(Nils) #1

I am very excited to share our recent work on protein classification that appeared on bioRxiv today.

We pretrain a language model on Swiss-Prot, a large protein database, and finetune a classifier on three different protein classification tasks. It turns out that the model is quite competitive with state-of-the-art models that rely on precomputed PSSM (position-specific scoring matrix) features from expensive database similarity searches. There seem to be many possible applications for NLP methods in the domain of proteomics.
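To make the pretraining step a bit more concrete, here is a minimal sketch of how protein sequences could be turned into language-model training data. It assumes that each amino acid is treated as one token and that the model learns to predict the next residue. The vocabulary, the special tokens, the window length `bptt`, and the helpers `build_vocab`, `encode` and `lm_pairs` are all hypothetical illustrations and are not taken from the paper's code. The actual preprocessing may well differ.

```python
from __future__ import annotations

# Hypothetical sketch: amino-acid-level tokenization plus next-token LM targets.
# Assumption: one token per residue; the paper's actual preprocessing may differ.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes
SPECIAL = ["<pad>", "<unk>", "<bos>"]  # hypothetical special tokens


def build_vocab() -> dict[str, int]:
    """Map special tokens and residues to integer ids."""
    tokens = SPECIAL + list(AMINO_ACIDS)
    return {tok: i for i, tok in enumerate(tokens)}


def encode(seq: str, vocab: dict[str, int]) -> list[int]:
    """Encode a sequence; non-standard residues (e.g. 'X') map to <unk>."""
    unk = vocab["<unk>"]
    return [vocab["<bos>"]] + [vocab.get(ch, unk) for ch in seq.upper()]


def lm_pairs(ids: list[int], bptt: int) -> list[tuple[list[int], list[int]]]:
    """Split ids into (input, target) windows of at most `bptt` tokens,
    where each target is its input shifted one position to the right."""
    pairs = []
    for start in range(0, len(ids) - 1, bptt):
        y = ids[start + 1 : start + 1 + bptt]
        x = ids[start : start + len(y)]  # keep input and target the same length
        pairs.append((x, y))
    return pairs
```

For example, `encode("MKT", build_vocab())` yields `[2, 13, 11, 19]`, and `lm_pairs` with `bptt=2` splits it into `([2, 13], [13, 11])` and `([11], [19])`. Fine-tuning a classifier would then reuse the pretrained encoder on the same token ids.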

Happy to hear your thoughts on this…


(Thundering Typhoons) #2

Is it possible to share the pretrained embeddings?


(Nils) #3

Thanks for your interest. We will try to release the code and some pretrained models as soon as possible.


(Nils) #4

Sorry for keeping you waiting so long. An updated version of our preprint is now available on bioRxiv. We also set up a GitHub repository with source code and links to pretrained models. Happy finetuning!