Protein Language Models, Transformers, Fine-tune, ESM2_t33_650M_UR50D, Error analysis
Key Takeaways
I fine-tuned antibody-specific and general protein language models to predict the binding affinity (Kd) between single-chain variable fragments (scFv) and the target SARS-CoV-2 peptide.
The ESM2_t33_650M_UR50D model demonstrated superior performance compared to the antibody-specific language models antiberta2-cssp, antiberta2, and ablang-H. While ablang-H lagged significantly behind, the other three models produced relatively comparable results. Note, however, that ESM2_t33_650M_UR50D is the largest of the group, with about 650 million parameters, whereas antiberta2 and antiberta2-cssp each have around 202 million.
My approach exceeded the original study’s Spearman rho (0.64 vs ~0.50) on the hold-out set using just a single model, as opposed to the original study’s ensemble of 16 models.
OpenAI’s CLIP model is quite impressive at connecting text with images, efficiently learning visual concepts from natural language supervision. Can we do the same for protein sequences and structure? Instead of text and images, we’re integrating protein sequences with their structural information.
In a series of four accessible notebooks, we develop a multimodal training approach that integrates antibody sequence data with structural data using a contrastive learning framework inspired by OpenAI’s CLIP. We use the ESM2 model from Facebook’s Evolutionary Scale Modeling (ESM) suite as our base architecture and fine-tune it with a custom head for contrastive learning, projecting sequence and structural embeddings into a common latent space. Training minimizes a contrastive loss so that sequences and their corresponding structures end up closely aligned in this space. We call this model ESM2-Ab-CLIP.
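For intuition, here is a minimal sketch of a CLIP-style contrastive head pairing sequence and structure embeddings. The projection dimensions, the structure-encoder output size, and the temperature initialization are assumptions for illustration, not the exact training code from the notebooks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveProjectionHead(nn.Module):
    """Projects sequence and structure embeddings into a shared latent space."""
    def __init__(self, seq_dim=1280, struct_dim=1280, proj_dim=256):  # 1280 = ESM2-650M hidden size; struct_dim is assumed
        super().__init__()
        self.seq_proj = nn.Linear(seq_dim, proj_dim)
        self.struct_proj = nn.Linear(struct_dim, proj_dim)
        # Learnable temperature, initialized to log(1/0.07) as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.6592))

    def forward(self, seq_emb, struct_emb):
        # L2-normalize both modalities before computing pairwise similarities.
        z_seq = F.normalize(self.seq_proj(seq_emb), dim=-1)
        z_struct = F.normalize(self.struct_proj(struct_emb), dim=-1)
        return self.logit_scale.exp() * z_seq @ z_struct.t()

def clip_loss(logits):
    # Symmetric cross-entropy: each sequence should match its own structure, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings standing in for ESM2 and structure-encoder outputs.
seq_emb, struct_emb = torch.randn(8, 1280), torch.randn(8, 1280)
head = ContrastiveProjectionHead()
loss = clip_loss(head(seq_emb, struct_emb))
```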
We also evaluate the ESM2-Ab-CLIP model on the antibody binding affinity prediction task relative to the base ESM2 model. With ample data for fine-tuning, multimodal training provided no additional benefit. Its true strength emerged in the low-data regime, where ESM2-Ab-CLIP outperformed the base ESM2 model on the binding affinity task as measured by Spearman rho and Top 10% recall.
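For reference, the two reported metrics can be computed roughly as below. The exact thresholding and label orientation are assumptions: the sketch treats larger label values as stronger binding (e.g. pKd); flip the sort direction if working with raw Kd.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(y_true, y_pred, k=0.10):
    """Spearman rho plus recall of the true top-k% binders among the predicted top-k%."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rho, _ = spearmanr(y_true, y_pred)
    n_top = max(1, int(round(len(y_true) * k)))
    true_top = set(np.argsort(y_true)[-n_top:])   # assumes higher value = stronger binding
    pred_top = set(np.argsort(y_pred)[-n_top:])
    recall = len(true_top & pred_top) / n_top
    return {"spearman_rho": rho, "top10_recall": recall}
```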
I fine-tuned both antibody-specific and general protein language models to predict the binding affinity (Kd) between single-chain variable fragments (scFv) and the SARS-CoV-2 peptide. This involved customizing the models to better fit the specific characteristics of antibody-peptide interactions. The ESM2_t33_650M_UR50D model outperformed the antibody-specific language models, although it is also larger: it has approximately 650 million parameters, compared with around 202 million each for antiberta2-cssp and antiberta2, while ablang-H lagged significantly behind the rest. My approach surpassed the original study’s Spearman correlation coefficient (rho) of ~0.50, achieving a rho of 0.64 on the hold-out set with a single model. This is a notable improvement, considering the original study employed an ensemble of 16 models to reach its score.
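A minimal sketch of this fine-tuning setup is shown below, assuming a Hugging Face ESM2 checkpoint with a mean-pooled regression head; the head architecture, pooling choice, and loss target (raw vs. log-transformed Kd) are illustrative assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, EsmModel

class ESM2AffinityRegressor(nn.Module):
    def __init__(self, checkpoint="facebook/esm2_t33_650M_UR50D"):
        super().__init__()
        self.encoder = EsmModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size  # 1280 for the 650M checkpoint
        self.head = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool residue embeddings over non-padded positions.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return self.head(pooled).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = ESM2AffinityRegressor()
# Placeholder scFv fragment; in practice, feed the full scFv sequences from the dataset.
batch = tokenizer(["EVQLVESGGGLVQPGGSLRLSCAAS"], return_tensors="pt", padding=True)
preds = model(**batch)
loss = nn.MSELoss()(preds, torch.tensor([1.23]))  # illustrative affinity target
```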