Seven years ago, the launch of the International Cancer Genome Consortium (ICGC) ushered in a new era in the understanding of cancer, with the goal of improving clinical diagnosis and treatment of patients.
The Hospital Clínic de Barcelona and Universidad de Oviedo completed the first genome sequencing project of 506 people affected by chronic lymphocytic leukaemia. Their results demonstrated the importance of DNA analysis in understanding the role of non-coding regions of the genome in the development of tumours. In particular, their study found an average of 3,000 mutations per patient that differentiated cancer cells from healthy ones. They also discovered 12 new genes linked to the disease.
Since the completion of the Human Genome Project, genome sequencing has proved to be an essential tool in the fight against cancer. Thanks to genetic big data, for example, it has been possible to establish the first consensus molecular classification of colorectal cancer. However, all that glitters is not gold in DNA analysis.
A new project led by the Centro Nacional de Análisis Genómico (CNAG-CRG) and the German Cancer Research Center (DKFZ), and published in Nature Communications, has revealed a high degree of heterogeneity in how cancer genome sequencing is performed. The research, which has been conducted by ICGC scientists, argues that we need to understand the variables affecting somatic mutation analysis since genomic sequencing has become an essential clinical tool.
The study analysed the sequencing methods, analysis pipelines and validation approaches used by the Consortium’s groups. The aim was to create “standards” to be used when sequencing different cancer genomes. Currently, 74 ICGC projects are analysing over 25,000 patient cases. They are doing so on premises established less than a decade ago on the basis of costs, skills and analytical expertise: that comprehensive identification of the somatic mutations in a tumour requires whole genome sequencing (WGS) with a minimum coverage of 30x for both the tumour and the normal genome, with reads of 100-250 base pairs.
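To give a sense of scale for the 30x figure, the usual back-of-the-envelope relation is coverage = number of reads × read length / genome size. The short Python sketch below applies it; the genome size of roughly 3.1 billion bases and the function names are illustrative assumptions, not values taken from the study.

```python
# Assumption: an approximate human genome size, used only for illustration.
HUMAN_GENOME_BASES = 3_100_000_000


def mean_coverage(n_reads: int, read_length: int,
                  genome_size: int = HUMAN_GENOME_BASES) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(coverage: int, read_length: int,
                 genome_size: int = HUMAN_GENOME_BASES) -> int:
    """Smallest number of reads that reaches the requested mean coverage."""
    total_bases = coverage * genome_size
    # Integer ceiling division, so the target is never undershot.
    return (total_bases + read_length - 1) // read_length


if __name__ == "__main__":
    for length in (100, 250):
        print(f"30x with {length} bp reads: {reads_needed(30, length):,} reads")
```

Under these assumptions, 30x corresponds to several hundred million reads per genome, and that effort is doubled when tumour and normal samples are both sequenced.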
The reality is different. The scientists found that the coverage of the tumour and normal samples, and the sequencing reads themselves, vary depending on how the sample is prepared. As a result, the paper published in Nature Communications has identified substantial methodological differences in how the genomes of the various tumour types are sequenced. In other words, the different technical approaches led to major discrepancies in results. For example, of more than 1,000 confirmed somatic single-base mutations in the cancer genome analysed, only 40% were identified unanimously by all participating teams. The percentage was even lower for small insertions or deletions: only a single mutation out of 337 was identified by all centres, a rate of 0.3%.
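The agreement figures quoted here are simple ratios: the number of confirmed mutations reported by every centre, divided by the number of confirmed mutations. A minimal Python sketch of that calculation follows; the centre names and mutation labels are made-up toy data, and the function is an illustration rather than the procedure used in the paper.

```python
from typing import Dict, Set


def unanimous_fraction(confirmed: Set[str],
                       calls_by_centre: Dict[str, Set[str]]) -> float:
    """Fraction of confirmed mutations that every centre reported."""
    if not confirmed:
        raise ValueError("the confirmed mutation set is empty")
    shared = set(confirmed)
    for calls in calls_by_centre.values():
        shared &= calls  # keep only mutations this centre also called
    return len(shared) / len(confirmed)


# Toy data (hypothetical): five confirmed mutations, three centres.
confirmed = {"m1", "m2", "m3", "m4", "m5"}
calls = {
    "centre_a": {"m1", "m2", "m3", "m4"},
    "centre_b": {"m1", "m2", "m5"},
    "centre_c": {"m1", "m2", "m3", "m9"},  # m9 is a false positive
}
toy_rate = unanimous_fraction(confirmed, calls)  # only m1 and m2 are shared

# The 1-in-337 figure reported in the article, as a percentage.
one_in_337_percent = 100 * 1 / 337

if __name__ == "__main__":
    print(f"toy agreement: {toy_rate:.0%}")
    print(f"1 of 337: {one_in_337_percent:.1f}%")
```

A mutation missed by even one centre drops out of the shared set, which is why unanimous agreement falls quickly as more pipelines are compared.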
These widely differing outcomes led the Consortium to propose a reference mutation dataset against which analytical procedures can be evaluated. This “standard database” is intended to improve procedures for identifying somatic mutations associated with cancer by reducing false negatives and false positives. In particular, the scientists make a number of recommendations: avoid PCR in library preparation, use a coverage greater than 100x, keep the coverage of the control (normal) sample similar to that of the tumour (± 10%), and filter results based on the quality of the DNA mapping.
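The recommendations listed above can be read as a checklist for a sequencing run. The Python sketch below encodes them under two stated assumptions: that the 100x threshold applies to the tumour sample, and that "± 10%" means the coverage of the matched normal sample lies within 10% of the tumour coverage. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequencingRun:
    pcr_free_library: bool          # no PCR step in library preparation
    tumour_coverage: float          # mean depth of the tumour sample
    normal_coverage: float          # mean depth of the matched normal sample
    mapping_quality_filter: bool    # calls filtered on read mapping quality


def unmet_recommendations(run: SequencingRun) -> List[str]:
    """Return the recommendations this run does not satisfy."""
    problems = []
    if not run.pcr_free_library:
        problems.append("library preparation uses PCR")
    if run.tumour_coverage <= 100:  # assumption: threshold applies to tumour
        problems.append("coverage is not greater than 100x")
    # Assumption: +/- 10% is taken relative to the tumour coverage.
    if abs(run.normal_coverage - run.tumour_coverage) > 0.10 * run.tumour_coverage:
        problems.append("normal coverage differs from tumour by more than 10%")
    if not run.mapping_quality_filter:
        problems.append("results are not filtered on mapping quality")
    return problems


if __name__ == "__main__":
    run = SequencingRun(True, 120.0, 60.0, True)
    for problem in unmet_recommendations(run):
        print(problem)
```

Returning the list of unmet items, rather than a single pass or fail flag, makes it clear which part of a protocol departs from the recommendations.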
“The findings of our study have far-reaching implications for cancer genome analysis,” says Ivo Gut, Director of the CNAG-CRG in Barcelona. The research found many inconsistencies in both the sequencing and the data analysis, so this paper will help to improve systems and hence “generate more standardised and consistent results”, he argues. The paper is thus intended as a reference for all scientific groups that use DNA analysis to study cancer, with the aim of opening up a new era in genomics and its usefulness as a clinical tool.