The mutation spectrum revealed by paired genome sequences from a lung cancer patient


Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease [1,2]. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum [3-8]. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines [9-13]. Here we present the complete sequences of a primary lung tumour (60× coverage) and adjacent normal tissue (46× coverage). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated genome-wide somatic mutation rate of 17.7 per megabase. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes, and against mutations in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
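As a rough illustration of what a per-megabase mutation rate means, the arithmetic can be sketched as below. This is a minimal sketch, not the authors' method: the function name `mutations_per_megabase` and both input values are hypothetical, chosen only to show the unit conversion. The abstract does not state the exact variant count or the number of callable bases behind the 17.7 per megabase estimate, so the example is not expected to reproduce that figure.

```python
def mutations_per_megabase(n_mutations: int, callable_bases: int) -> float:
    """Return the number of mutations per megabase (1 Mb = 1,000,000 bases).

    Both arguments are illustrative inputs; the name and signature are
    assumptions made for this sketch, not taken from the paper.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    # Convert the base count to megabases, then divide.
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical inputs for illustration only; NOT the paper's actual figures.
example_rate = mutations_per_megabase(50_000, 3_000_000_000)
print(f"{example_rate:.1f} mutations per Mb")
```

A published estimate would normally also depend on how many bases were adequately covered in both tumour and normal samples and on variant-calling sensitivity, neither of which is modelled here.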


Figure 1: The genomic landscape of somatic alterations.
Figure 2: Somatic single-nucleotide mutation trends and patterns.
Figure 3: A model for how the multiplicity of mutations within the MAPK cascade may act together to drive constitutive pro-growth signalling.

Accession codes


Data deposits

Sequence data have been submitted to the NCBI Short Read Archive under accession number SRA012097. Microarray data have been submitted to the NCBI Gene Expression Omnibus under accession number GSE20585.


References

1. Parkin, D. M., Bray, F., Ferlay, J. & Pisani, P. Global cancer statistics, 2002. CA Cancer J. Clin. 55, 74–108 (2005)
2. Herbst, R. S., Heymach, J. V. & Lippman, S. M. Lung cancer. N. Engl. J. Med. 359, 1367–1380 (2008)
3. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nature Genet. 40, 722–729 (2008)
4. Davies, H. et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595 (2005)
5. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008)
6. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007)
7. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009)
8. Weir, B. A. et al. Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898 (2007)
9. Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361, 1058–1066 (2009)
10. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
11. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009)
12. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2009)
13. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010)
14. Hecht, S. S. Tobacco smoke carcinogens and lung cancer. J. Natl Cancer Inst. 91, 1194–1210 (1999)
15. Chu, P. G. & Weiss, L. M. Expression of cytokeratin 5/6 in epithelial neoplasms: an immunohistochemical study of 509 cases. Mod. Pathol. 15, 6–10 (2002)
16. Tan, D. et al. Thyroid transcription factor-1 expression prevalence and its clinical implications in non-small cell lung cancer: a high-throughput tissue microarray and immunohistochemistry study. Hum. Pathol. 34, 597–604 (2003)
17. Wistuba, I. I. & Gazdar, A. F. Lung cancer preneoplasia. Annu. Rev. Pathol. 1, 331–348 (2006)
18. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
19. Forbes, S. A. et al. The catalogue of somatic mutations in cancer (COSMIC). Curr. Protoc. Hum. Genet. doi:10.1002/0471142905.hg1011s57 (2008)
20. Stenson, P. D. et al. The human gene mutation database: 2008 update. Genome Med. 1, 13 (2009)
21. Hicks, J. et al. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 16, 1465–1479 (2006)
22. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007)
23. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007)
24. Soda, M. et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 448, 561–566 (2007)
25. Lin, E. et al. Exon array profiling detects EML4-ALK fusion in breast, colorectal, and non-small cell lung cancers. Mol. Cancer Res. 7, 1466–1476 (2009)
26. Rowley, J. D. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 243, 290–293 (1973)
27. Tomlins, S. A. et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648 (2005)
28. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009)
29. Dhillon, A. S., Hagan, S., Rath, O. & Kolch, W. MAP kinase signalling pathways in cancer. Oncogene 26, 3279–3290 (2007)



Acknowledgements

We thank T. Wu for critical reading of the manuscript, C. Santos for sample handling, M. Vasser and the DNA Synthesis Group for oligonucleotide synthesis, J. Turcotte and G. Cavet for coordination, G. Nilsen for data submission, J. Fitzgerald and A. Baucom for data storage, J. Lee for laboratory support, A. Bruce for graphical assistance, and T. Bhangale, S. Jhunhunwala and A. Halpern for discussion.

Author information

W.L., project coordination, SNV and overall data analysis and preparation of manuscript; Z.J., structural variation analysis and preparation of manuscript; J.L., mutation pattern and trend analysis, loss of heterozygosity analysis, expression analysis and preparation of manuscript; P.M.H., copy number/loss of heterozygosity analysis, pathway analysis, expression analysis and preparation of manuscript; P.Y., mutation analysis and preparation of manuscript; Y.G. and Z.M., PCR validation of structural variations; J.S., D.B. and S.S., MassArray mutation validation; Y.Z., bioinformatic prediction of mutations and data processing; K.P.P., M.I.K., I.N. and A.B.S., DNA nanoball preparation and sequencing, base calling, quality control and structural variation mapping; C.H. and Z.M., microarray data production; S.J. and H.S., sample handling and pathology analysis; C.W., structural variation breakpoint mapping; D.S.S., pathway analysis and data interpretation; R.G., manuscript critiques and statistical analysis; F.J.d.S., project coordination and manuscript commenting; A.P. and S.M., FISH analysis; R.D. and D.G.B., project coordination, data interpretation and manuscript commenting; Z.Z., project design, data interpretation and preparation of manuscript.

Correspondence to Zemin Zhang.

Ethics declarations

Competing interests

Authors are employees of either Genentech Inc. or Complete Genomics Inc. Employees of Complete Genomics have stock options in the company.

Supplementary information

Supplementary Information

This file contains Supplementary Sections S1-S10, Supplementary References, legends for Supplementary Tables 1-7 and Supplementary Figures 1-17 with legends. (PDF 4981 kb)

Supplementary Tables

This file contains Supplementary Tables 1-7, including column descriptions. See Supplementary Information file for legends. (XLS 11059 kb)


About this article

Cite this article

Lee, W., Jiang, Z., Liu, J. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473–477 (2010).
