The human genome sequence, first published in 2001, is missing some important information. Its latest version, called GRCh38, contains a monstrous 3.1 gigabases of sequence, but that’s still not enough. A letter published in Nature Genetics this week finds that the reference genome lacks a colossal 10 percent of the genetic information found in the genomes of hundreds of people with African ancestry. That missing information also appears in other human populations.
Get the reference
The “human genome” is in fact assembled from the genomes of only a handful of people, and the majority of GRCh38 comes from a single person. It’s less a snapshot of what’s in human DNA than a kind of template and roadmap: it gives a sense of what’s in there and provides a baseline for comparing individual genomes against the “reference genome.”
Scientists have long known this is a limitation and have steadily added to the reference genome, improving its ability to represent the huge range of variation present in modern humans. But because its sources are so limited, its usefulness is limited too. As the authors of this week’s letter write, “In recent years, a growing number of researchers have emphasized the importance of capturing and representing sequencing data from diverse populations.”