Cambridge is an emblematic city in the history of science. Nestled among its faculties and laboratories is the Biochemistry Department, where illustrious researchers such as Hans Kornberg and César Milstein once worked. It was also where a young Fred Sanger completed his PhD in 1943.
After tragically losing his parents to cancer, the British scientist decided to dedicate his life to his work. Shortly after completing his doctoral thesis, Sanger began to take an interest in protein sequencing, and his research earned him his first Nobel Prize in Chemistry in 1958, after he deciphered the complete sequence of insulin. Far from curbing his curiosity, the scientific community's acknowledgement spurred him on to tackle new challenges. In 1975, he began developing the Sanger method, the pioneering DNA sequencing technique. Two years later, he published in the journal Nature the complete genome sequence of the bacteriophage phi-X174, which has only 5,375 nucleotides. This methodology brought him his second Nobel Prize in Chemistry in 1980, this time shared with Paul Berg and Walter Gilbert. Sanger thus joined the short list of scientists who have won two Nobel Prizes, alongside Marie Curie, Linus Pauling and John Bardeen.
Forty years after the birth of DNA sequencing, and following historic accomplishments such as the completion of the Human Genome Project, research in biology has progressed by leaps and bounds. As Ivo Gut, director of the Spanish Genomic Analysis Centre (CNAG), explains, “sequencing has evolved more than computers in the last ten years in terms of increase of output and decrease of cost”. Moreover, “the rate of progress has been staggering and many new applications that use 2nd generation sequencing have been developed”, according to Gut.
Figure: Typical cost of sequencing a human-sized genome, on a logarithmic scale. Note the sharp drop, faster than Moore's law, beginning in January 2008 as post-Sanger sequencing came online at sequencing centers. Source: National Human Genome Research Institute.
The CNAG, the reference infrastructure in the field of sequencing and large-scale DNA analysis, “has increased its daily output 20-fold and the rate is still increasing”, says Gut. The centre uses much more advanced methods than the one proposed by Sanger in the late 1970s. Nowadays, different sequencing techniques are used depending on the final application. In the words of the CNAG's director, “exome sequencing is very effective for diagnosis in rare diseases, while cancer genomes are covered best using whole genome sequencing”. Because of the enormous amount of information generated, organisations such as the CNAG are facing a new challenge: how to produce, analyse, store and manage big data in genomics.
When Bill Clinton and Tony Blair, accompanied by Francis Collins and Craig Venter, presented the first draft of the human genome at the White House, many people foresaw that sequencing could be applied to study the genomics of cancer. Today, according to Gut, “sequencing is moving further towards clinical application in rare diseases as well”. The scientist explains that “the applicability of sequencing is very broad, from basic research for clinically applied work, study of populations, fundamental and translation”. However, the reading of DNA is not only used in biomedical research. The CNAG is also working on projects related to the food industry, helping to improve production “through the development of markers that are indicative of resistance to certain environmental strains (infections, heat or water requirements) or the commercial protection of varieties, in which sequencing-based approaches provide the ultimate power”.
Sequencing has changed not only in its applications: its cost and the staff time it requires have also fallen. As Ivo Gut explains, “consumables cost for a human genome at 30x coverage has gone down to 1000 euros”. However, there are many applications that require more than 30x coverage. According to the CNAG's director, “as output increases the weight does shift to the effort that needs to be made in data analysis and interpretation”. For this reason, the centre has paid great attention to balancing data production with data analysis capability, while also making great efforts to keep computational developments in line with its sequencing capacity. Its technology services now support more than 300 research projects with 120 different collaborators, a scientific effort that was unimaginable 40 years ago, when DNA sequencing was born.
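To give a sense of the data volumes behind a figure such as “30x coverage”, the short sketch below works through the arithmetic. Average coverage is commonly defined as the total number of sequenced bases divided by the genome size. The genome size (about 3.1 billion base pairs) and the read length (150 bases) are illustrative assumptions that do not come from the CNAG, and the function names are hypothetical.

```python
# Rough coverage arithmetic; all input values below are illustrative assumptions.

def bases_required(genome_size_bp: int, coverage: int) -> int:
    """Total sequenced bases needed to reach a given average coverage."""
    return genome_size_bp * coverage


def reads_required(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads of a fixed length needed for that coverage (rounded up)."""
    total = bases_required(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # ceiling division on integers


HUMAN_GENOME_BP = 3_100_000_000  # assumed approximate human genome size
READ_LENGTH_BP = 150             # assumed short-read length

total_bases = bases_required(HUMAN_GENOME_BP, 30)
total_reads = reads_required(HUMAN_GENOME_BP, 30, READ_LENGTH_BP)

print(f"Bases to sequence at 30x: {total_bases:,}")
print(f"Reads of {READ_LENGTH_BP} bp needed: {total_reads:,}")
```

Under these assumptions, a single human genome at 30x corresponds to roughly 93 billion sequenced bases, which helps explain why the effort shifts towards data analysis and storage as output grows.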