I am very excited to share our recent work on protein classification that appeared on bioRxiv today.
We pretrain a language model on Swiss-Prot, a large protein database, and finetune a classifier on three different protein classification tasks. It turns out that the model is quite competitive compared to state-of-the-art models that make use of precomputed PSSM features from expensive database similarity searches. There seem to be a lot of possible applications for NLP-methods in the domain of proteomics.
Happy to hear your thoughts on this…