Nucleotide Transformer: building robust foundation models for human genomics


Nucleotide Transformer (NT) is a series of genomics foundation models, spanning different parameter sizes and pretraining datasets, that can be fine-tuned for a range of downstream tasks.
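To make the "foundation model plus downstream task" idea concrete, here is a minimal sketch of a lightweight relative of fine-tuning: training only a small logistic-regression head on fixed (frozen) sequence embeddings. It does not reproduce the NT authors' fine-tuning procedure. The embeddings are synthetic stand-ins for model outputs, and `make_toy_embeddings`, `train_logistic_head` and `predict` are hypothetical helpers, not part of any Nucleotide Transformer API.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_embeddings(n_per_class=50, dim=8, seed=0):
    """Hypothetical stand-in for frozen model embeddings: two Gaussian
    clusters (label 0 centred at -1, label 1 centred at +1 in every dim)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 0.5) for _ in range(dim)])
            ys.append(label)
    return xs, ys


def train_logistic_head(embeddings, labels, lr=0.1, epochs=200):
    """Fit a binary logistic-regression head on fixed embeddings with
    full-batch gradient descent; the embeddings themselves never change."""
    dim = len(embeddings[0])
    w = [0.0] * dim
    b = 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - y  # d(binary cross-entropy)/dz
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one embedding vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


if __name__ == "__main__":
    xs, ys = make_toy_embeddings()
    w, b = train_logistic_head(xs, ys)
    acc = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(ys)
    print(f"training accuracy: {acc:.2f}")
```

Keeping the backbone frozen makes each downstream task cheap to train; full fine-tuning would instead also update the model's own weights.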

Deep learning (DL) models have, for example, been used to predict gene expression from DNA sequences (refs. 13–18), and recent approaches that combine convolutional neural networks with transformer architectures can encode regulatory elements located up to 100 kilobases (kb) upstream (ref. 19).

We observed that the NT models, without any supervision, learned to distinguish genomic sequences uniquely annotated as intergenic, intronic, coding or untranslated regions (UTRs), although how well they did so varied across layers (Fig. …).

One of the pretraining datasets contained 3,202 high-coverage human genomes from 27 geographically structured populations of African, American, East Asian and European ancestry (detailed in Supplementary Table 2), totaling 20.5 trillion nucleotides.
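As a quick sanity check on the dataset size, the arithmetic below divides the reported total by the number of genomes. It assumes (our assumption, not stated above) that each diploid genome is counted as two haplotypes, in which case the per-haplotype figure lands close to the roughly 3 billion base pairs of a haploid human genome.

```python
GENOMES = 3_202
TOTAL_NUCLEOTIDES = 20.5e12  # reported total, rounded to 0.1 trillion

per_genome = TOTAL_NUCLEOTIDES / GENOMES
# Assumption: each diploid genome contributes two haplotype sequences.
per_haplotype = per_genome / 2

print(f"~{per_genome / 1e9:.2f} Gb per genome, ~{per_haplotype / 1e9:.2f} Gb per haplotype")
```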
