Author: Brooke Wolford
Editors: Alex Taylor, Jimmy Brancho, Bryan Moyers
Imagine the year is 1856 and you are toiling in a quarry in the Neander Valley, a few kilometers from Düsseldorf, Germany. Suddenly, you notice something sticking out of the landscape. You dig around and find ribs, a skull, and other bones; your best guess is that you have stumbled upon the final resting place of a bear. However, what you have actually found are the first remains to be recognized as belonging to an ancient hominin, a species later named Homo neanderthalensis.
These were the humble beginnings of the field of paleoanthropology. This area of science studies fossilized skeletal remains, stone tools, and ancient settlements to answer profound questions about the origin of modern-day humans. For example, how are Neanderthals related to modern-day humans: did we interbreed with them or evolve from them? To answer this question, paleoanthropologists construct family trees of ancient hominins using the physical features of skeletal remains along with evidence about when and where they lived. However, DNA from these skeletal remains can tell us even more about the connection between Neanderthals and modern-day humans. Paleogenetics is the study of preserved DNA from the fossils of ancient organisms like Neanderthals. Unlike paleoanthropologists, paleogeneticists piece together ancient lineages by quantifying how similar the DNA of ancient hominins is to the DNA of other hominins, including modern-day humans. For Neanderthals, these comparisons rest on the most notable paleogenetic milestone to date: the sequence of the entire Neanderthal genome.
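To make "quantifying how similar DNA is" concrete, here is a minimal Python sketch that computes percent identity: the share of positions at which two already-aligned, equal-length sequences carry the same nucleotide. The function name and the toy sequences are hypothetical illustrations, not real Neanderthal or human data. Published comparisons also have to handle gaps, sequencing errors, and ancient DNA damage, all of which this sketch ignores.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences agree.

    Assumes the sequences are already aligned with no gaps (a simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where both sequences carry the same nucleotide.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences, not real data.
neanderthal_like = "ACGTTGCAAGCT"
human_like = "ACGTTGCAAGTT"
print(f"{percent_identity(neanderthal_like, human_like):.1f}% identical")
```

The two toy sequences differ at one of 12 positions, so the sketch reports about 91.7% identity.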
Obtaining the complete genome of any living organism is a gargantuan undertaking—reading the sometimes billions of nucleotides that make up a lifeform’s DNA is much like reading through a set of encyclopedias. Each encyclopedia is a chromosome, each subject entry a gene, and each typed letter a DNA nucleotide. Now imagine years of wear and tear on those encyclopedias; water stains and ripped pages make it difficult to read each individual letter. In the same way, the DNA in a fossil is degraded over time by exposure to high temperature, water, or substances that modify the nucleotides. This, among other challenges, makes ancient DNA difficult to accurately read, and special protocols and technologies had to be developed or implemented to 1) efficiently extract DNA from ancient samples, 2) sequence the DNA, and 3) accurately align the genome.
A technique called the silica extraction method was used to retrieve DNA from ancient bones. This method uses fine particles of silica, the main component of glass, to bind DNA so that it can be washed and separated from unwanted chemicals in the bone. DNA extracted from the bones of ancient individuals is often contaminated by DNA from microbes, such as bacteria, that live on and in the fossil, and by human DNA introduced in the laboratory during DNA extraction and sequencing. The goal of the molecular techniques used for Neanderthal DNA extraction is to increase the signal-to-noise ratio or, in other words, to sequence more Neanderthal DNA and less contaminating DNA. One way to enrich for Neanderthal DNA is to use restriction enzymes, which act like microscopic scissors, to preferentially cut bacterial DNA. The cut fragments are less likely to be sequenced, which increases the ratio of Neanderthal to microbial DNA. To avoid human contamination, meticulous care is taken in clean rooms, with precautions such as bleach, ultraviolet light, hair nets, and special lab coats.
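The effect of an enrichment step on the signal-to-noise ratio can be sketched with simple arithmetic. The Python below assumes, for illustration only, a sample in which 5 of every 100 DNA molecules are Neanderthal. It also assumes an enrichment step that keeps a chosen fraction of microbial molecules from being sequenced while sparing every Neanderthal molecule. Both the numbers and the function name are hypothetical. In practice restriction enzymes cut some endogenous DNA too, so the real gain depends on how much more often they cut microbial DNA than Neanderthal DNA.

```python
def endogenous_fraction(endogenous: float, microbial: float,
                        microbial_removed: float = 0.0) -> float:
    """Fraction of sequenceable DNA that is endogenous (here, Neanderthal).

    endogenous, microbial: starting amounts in any consistent unit.
    microbial_removed: hypothetical fraction (0 to 1) of microbial molecules
    that an enrichment step keeps from being sequenced. This idealized model
    assumes no Neanderthal molecules are lost.
    """
    if not 0.0 <= microbial_removed <= 1.0:
        raise ValueError("microbial_removed must be between 0 and 1")
    remaining_microbial = microbial * (1.0 - microbial_removed)
    return endogenous / (endogenous + remaining_microbial)


# Hypothetical sample: 5 parts Neanderthal DNA to 95 parts microbial DNA.
before = endogenous_fraction(5, 95)
after = endogenous_fraction(5, 95, microbial_removed=0.5)
print(f"before: {before:.1%}, after: {after:.1%}")
```

In this toy example, removing half of the microbial molecules roughly doubles the Neanderthal share, from 5% to about 9.5%.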
In the mid-2000s, advances in DNA sequencing made it feasible to sequence the complete genome from Neanderthal samples. DNA sequencing is the process of taking the DNA strands from a sample, like ancient DNA extracted from a fossilized bone, and using chemical reactions to identify each nucleotide in order. A sequencing machine does this for each of millions of individual DNA strands. Earlier ancient DNA studies relied on Sanger sequencing, a method that reads long strands of DNA, but only a small number at a time. The advent of next-generation sequencing, beginning with massively parallel pyrosequencing, enabled hundreds of thousands to millions of shorter reads per run of the machine. The scientists sequencing the Neanderthal genome adopted these new methods because they were better suited to efficiently sequencing the short strands of ancient DNA. As a result, the complete Neanderthal genome was sequenced in a shorter time frame than would have been possible with earlier methods.
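A back-of-the-envelope calculation shows why, for ancient DNA, the number of reads matters more than read length. The Python sketch below hypothetically assumes that every ancient DNA molecule is 50 nucleotides long. It then compares two made-up instruments: one producing 1,000 reads of up to 800 nucleotides per run, and one producing 1,000,000 reads of up to 100 nucleotides. A read cannot extend past the end of the molecule it is reading, so each read yields at most the molecule's length in useful sequence. Real instruments, read lengths, and fragment lengths vary; these figures only illustrate the reasoning.

```python
def informative_bases(num_reads: int, max_read_length: int, fragment_length: int) -> int:
    """Nucleotides of sample DNA read in one run if every molecule has the same length.

    Each read yields at most min(max_read_length, fragment_length) useful
    nucleotides, because it cannot read past the end of its molecule.
    """
    return num_reads * min(max_read_length, fragment_length)


FRAGMENT_LENGTH = 50  # hypothetical ancient DNA fragment length, in nucleotides

# Hypothetical instruments: few long reads versus many short reads per run.
few_long = informative_bases(num_reads=1_000, max_read_length=800,
                             fragment_length=FRAGMENT_LENGTH)
many_short = informative_bases(num_reads=1_000_000, max_read_length=100,
                               fragment_length=FRAGMENT_LENGTH)
print(few_long, many_short)
```

Under these assumptions the short-read instrument recovers 1,000 times more ancient sequence per run, even though each of its reads is shorter.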
Aligning the genome
The final step in reconstructing a genome is alignment, in which the short, individual sequences are joined together, using the human and chimpanzee genomes as a guide, to form a complete genome with billions of nucleotides. Because ancient DNA often exists only in short fragments, this poses a problem for genome alignment. Imagine rebuilding a whole set of encyclopedias from weatherworn individual lines of text. Traditionally, alignment methods use long sequence reads as a scaffold to be built upon with shorter reads. In the case of ancient DNA, bioinformaticians (scientists who use computer programming to analyze biological data) had to create clever algorithms to piece together the individual lines without larger paragraphs to serve as scaffolds. Bioinformaticians also found ways to computationally correct for the chemical modifications that changed nucleotides over time. The most common is the conversion of cytosine (C) into uracil, which sequencing machines read as thymine (T).
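A toy version of damage-aware alignment fits in a few lines of Python. It slides a short read along a reference sequence and counts mismatches at each position. It does not count a T in the read opposite a C in the reference, because ancient DNA damage commonly converts cytosine into uracil, which sequencing reads as thymine. This is a simplified brute-force sketch with hypothetical names and sequences, not the software Pääbo’s group used. Real aligners index the reference to handle billions of nucleotides, model damage as concentrated near fragment ends, and also consider the complementary strand.

```python
from typing import Optional


def count_mismatches(read: str, ref_window: str, allow_deamination: bool = True) -> int:
    """Count mismatches between a read and an equal-length reference window.

    If allow_deamination is True, a T in the read opposite a C in the
    reference is not counted, since it may be C-to-T damage rather than a
    real difference.
    """
    mismatches = 0
    for r, g in zip(read, ref_window):
        if r == g:
            continue
        if allow_deamination and g == "C" and r == "T":
            continue  # possible ancient DNA damage, not penalized
        mismatches += 1
    return mismatches


def map_read(read: str, reference: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the leftmost position with the fewest mismatches, or None if none qualify."""
    best_pos, best_score = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        score = count_mismatches(read, reference[pos:pos + len(read)])
        if score < best_score:
            best_pos, best_score = pos, score
    return best_pos


# Hypothetical reference and a damaged read: the reference has "CATGCAG" at
# position 5, but the read's first C was read as a T.
reference = "GGATCCATGCAGTTACGA"
read = "TATGCAG"
print(map_read(read, reference))
```

The damaged read maps to position 5 with no counted mismatches; calling count_mismatches on the same window with allow_deamination=False counts one mismatch instead.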
The final product
It took decades to perfect these methods, and Dr. Svante Pääbo, a Swedish scientist and one of the founders of the field of paleogenetics, contributed immensely to this effort and to the field of ancient DNA as a whole. Pääbo started in paleogenetics as a graduate student, when he showed that DNA was still present in cells from Egyptian mummies. Later, he and his students developed many of the methods described above for ancient DNA extraction and analysis using samples from extinct mammals, such as woolly mammoths and cave bears, before turning to the precious ancient hominin samples. In 1997, Pääbo and collaborators sequenced almost 400 nucleotides of mitochondrial DNA from the 1856 Neanderthal fossil, DNA almost 40,000 years old! By 2006, Pääbo was setting his sights on sequencing the complete Neanderthal genome.
Over the next few years, Pääbo’s group employed ingenious methods to extract and sequence DNA from Neanderthal bones excavated between 1974 and 1986 in Vindija Cave, Croatia. Bioinformatic approaches accurately aligned the sequences to the human genome, creating a draft Neanderthal genome with 1.3x coverage, published in 2010. This measure of coverage indicates that the ~3 billion nucleotides are sequenced an average of 1.3 times each: some nucleotides are sequenced more than once, while others are not sequenced at all. To obtain a more accurate Neanderthal genome, Pääbo’s group needed to sequence each nucleotide many times. In 2014 they published a complete, high-quality Neanderthal genome with 52x coverage, from a toe bone found in 2010 in Denisova Cave in the Altai Mountains.
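The claim that some nucleotides are never sequenced at 1.3x coverage can be made quantitative with a standard idealized assumption, the Lander–Waterman model. It assumes reads land at random positions, so the number of times a given nucleotide is read follows a Poisson distribution whose mean equals the coverage. The Python sketch below relies on that assumption, and its function names are hypothetical. Real coverage is uneven, so actual fractions will differ.

```python
import math


def fraction_uncovered(mean_coverage: float) -> float:
    """Expected fraction of positions read zero times under a Poisson model."""
    return math.exp(-mean_coverage)


def fraction_covered_at_least(mean_coverage: float, k: int) -> float:
    """Expected fraction of positions read at least k times under a Poisson model."""
    # Sum the Poisson probabilities of being read 0, 1, ..., k-1 times.
    below_k = sum(math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
                  for i in range(k))
    return 1.0 - below_k


for depth in (1.3, 52):
    print(f"{depth}x: {fraction_uncovered(depth):.1%} never read, "
          f"{fraction_covered_at_least(depth, 2):.1%} read at least twice")
```

Under this model, about 27% of positions (exp(-1.3) ≈ 0.27) are expected to be missed at 1.3x coverage, while at 52x essentially every position is read many times.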
In his 2014 book, Neanderthal Man: In Search of Lost Genomes, Pääbo details his scientific odyssey and touches on questions that paleoanthropologists and paleogeneticists have been asking for decades. Why are we still around but Neanderthals are not? What genes, and therefore traits, are unique to us? What genetic signatures of Neanderthals affect the health of people alive today?
Because Neanderthals are among the closest evolutionary relatives of modern-day humans, their DNA holds many answers to questions we have about our origins. Analysis of the genomes of multiple Neanderthal individuals begins to answer the question of how Neanderthals are related to modern-day humans: did our ancestors mate with Neanderthals? Pääbo’s work shows that present-day Eurasian genomes contain between 1.5% and 2.1% Neanderthal DNA. This suggests that the ancestors of modern-day humans who settled in Eurasia interbred with Neanderthals, while those who remained in Africa did not.
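To put the percentage in perspective, a short Python calculation converts it into nucleotides using an approximate genome size of 3 billion nucleotides. The function name is hypothetical, and the result is only an order-of-magnitude illustration.

```python
GENOME_SIZE = 3_000_000_000  # approximate number of nucleotides in the human genome


def neanderthal_bases(percent: float, genome_size: int = GENOME_SIZE) -> float:
    """Approximate number of nucleotides corresponding to a percentage of the genome."""
    return genome_size * percent / 100


for percent in (1.5, 2.1):
    print(f"{percent}% is about {neanderthal_bases(percent) / 1e6:.0f} million nucleotides")
```

This works out to roughly 45 to 63 million nucleotides of Neanderthal origin per genome.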
We have come a long way since the Neander Valley in 1856, but we will be asking questions about human origins and our relationship with Neanderthals for years to come. Thanks to the efforts of Dr. Pääbo and his collaborators, we now have the whole set of encyclopedias, the Neanderthal genome, in our hands.
This is the first in the three-part series “Ancient DNA, Electronic Health Records, and Neanderthal Phenotypic Legacy.” In Part II, we will explore electronic health records (EHRs), which allow clinicians and scientists to harness medical information for research in the digital age. Part III will tie Parts I and II together, highlighting a recent paper in which the Neanderthal genome and EHRs were used to show that Neanderthal genetic variants affect the risk of depression in modern-day humans.
About the author
Brooke is a PhD student in the Department of Computational Medicine and Bioinformatics. Her research focuses on understanding the genetic causes of complex diseases, specifically type 2 diabetes and cardiovascular traits. She is broadly interested in how genomics research can be applied to clinical sequencing and patient care to fulfill the promise of precision medicine. Originally from North Carolina, Brooke holds a Bachelor of Science in Quantitative Biology from the University of North Carolina at Chapel Hill. When not staring at lines of code and sequence data, she enjoys reading, a good Netflix binge, and baking.
Image Credit: By Photaro-Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30468684