Nucleotide Transformer: building robust foundation models for human genomics
Nucleotide Transformer (NT) is a family of genomics foundation models, varying in parameter count and training dataset, that can be fine-tuned for a range of downstream tasks.
For example, deep learning (DL) models have been used to predict gene expression from DNA sequences [13-18], with recent advances that combine convolutional neural networks and transformer architectures enabling the encoding of regulatory elements located up to 100 kilobases (kb) upstream [19].

We observed that the NT models, without any supervision, learned to distinguish genomic sequences that were uniquely annotated as intergenic, intronic, coding and untranslated regions (UTRs), albeit with varying degrees of proficiency across different layers (Fig. …).

The dataset contained 3,202 high-coverage human genomes, originating from 27 geographically structured populations of African, American, East Asian and European ancestry as detailed in Supplementary Table 2, making up a total of 20.5 trillion nucleotides.
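As a rough sanity check on the dataset figures quoted above, the short sketch below divides the stated total by the stated number of genomes. The haploid reference length of about 3.1 billion bases is an assumption added here for comparison and is not taken from the text.

```python
# Figures quoted in the text above.
num_genomes = 3_202
total_nucleotides = 20.5e12  # 20.5 trillion

# Assumption (not from the text): approximate haploid human genome length.
approx_haploid_length = 3.1e9

per_genome = total_nucleotides / num_genomes
copies_per_genome = per_genome / approx_haploid_length

print(f"{per_genome / 1e9:.2f} billion nucleotides per genome")
print(f"about {copies_per_genome:.1f} haploid copies per genome")
```

Under that assumption the total works out to roughly two haploid copies per individual, which is what one would expect if both copies of each genome were counted.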