A review of protein language models
Protein “language” is a lot like human language: both are sequences of discrete tokens (amino acids instead of words) whose order determines meaning (structure and function instead of semantics). Given the similarities, researchers have been building and training language models on protein sequence data, replicating the success seen in other domains, with profound implications. In this post, I will explore how transformer models have been applied to protein data and what this means for protein design and engineering.
A lot of the initial work on pLMs is based on the encoder-only Transformer architecture, since the aim is to obtain embedded representations of proteins in a vector space for downstream tasks. Sequences generated by pLMs closely match natural proteins in amino acid types and frequencies, and they exhibit a comparable balance between ordered (stable) and disordered (flexible) regions.

These models can also guide protein engineering directly. In one study, researchers compared antibody sequences against a pLM's per-position predictions and identified divergent sites, positions where the model's preferred amino acid differed from the wild-type residue. By selectively substituting the amino acids at these divergent sites with those predicted by the model, the researchers significantly enhanced the binding affinity, thermostability, and in vitro potency of the antibodies, achieving improvements of up to sevenfold for mature antibodies and an astonishing 160-fold(!!) for unmatured antibodies.
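To make this concrete, here is a minimal sketch of both ideas, assuming the Hugging Face transformers library and the public ESM-2 checkpoint facebook/esm2_t33_650M_UR50D: it extracts per-residue embeddings from the encoder and uses the masked-LM head's per-position likelihoods to flag divergent sites. The input sequence is a toy placeholder, not an antibody from the study.

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

# Load a pretrained protein language model (ESM-2, 650M parameters).
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t33_650M_UR50D")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy placeholder sequence

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Per-residue embeddings from the final encoder layer, dropping the
# special <cls>/<eos> tokens; shape: (sequence_length, hidden_size).
# These vectors feed downstream tasks such as function or stability prediction.
embeddings = outputs.hidden_states[-1][0, 1:-1]

# Per-position amino-acid likelihoods from the masked-LM head.
logits = outputs.logits[0, 1:-1]            # (sequence_length, vocab_size)
probs = torch.softmax(logits, dim=-1)

# "Divergent sites": positions where the model's preferred amino acid
# differs from the wild-type residue in the input sequence.
for i, wt in enumerate(sequence):
    top_id = probs[i].argmax().item()
    top_aa = tokenizer.convert_ids_to_tokens(top_id)
    if top_aa != wt:
        print(f"position {i + 1}: wild type {wt} -> model prefers {top_aa} "
              f"(p = {probs[i, top_id]:.2f})")
```

In practice, substitutions flagged this way are only candidates; in the antibody work described above, they were still synthesized and validated experimentally.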