Next Generation Sequencing Technology and applications
10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven

Landmarks in DNA sequencing
• 1953 Discovery of DNA double helix structure
• 1977
  – A Maxam and W Gilbert "DNA seq by chemical degradation"
  – F Sanger "DNA sequencing with chain-terminating inhibitors"
• 1984 DNA sequence of the Epstein-Barr virus, 170 kb
• 1987 Applied Biosystems - first automated sequencer
• 1991 Sequencing of human expressed sequence tags (ESTs) begins in Venter's lab
• 1996 P. Nyrén and M Ronaghi - pyrosequencing
• 2001 A draft sequence of the human genome
• 2003 Human genome completed
• 2004 454 Life Sciences markets first NGS machine

[Slides: Massive parallel sequencing]

Landmarks in NGS
• 2005 Roche 454: 200 K reads, 120 bp
• 2006 Solexa/Illumina: 30M reads, 35 bp
• 2007 SOLiD: 100M reads, 35 bp
• Added on the 2005-2010 timeline: Ion Torrent, PacBio RS
• Genomes shown: E. coli (5 Mb), Arabidopsis thaliana (157 Mb)

DNA Sequencing – the next generation
• NGS refers to non-Sanger-based high-throughput DNA sequencing technologies.
• Millions or billions of DNA strands can be sequenced in parallel.
• NGS technologies constitute various strategies that rely on a combination of
  – Library/template preparation
  – Parallel sequencing

DNA Sequencing – the next generation
• Sample prep
• Clonal amplification
• Parallel sequencing

454 SEQUENCING
Roche GS FLX 454 & Roche Junior
[Slides: 454 sequencing]

SOLID SEQUENCING
Life Technologies SOLiD 5500 Genetic Analyzer
[Slides: SOLiD sequencing]

ION TORRENT SEQUENCING
Life Technologies: Ion Proton & Ion PGM
[Slides: Ion Torrent sequencing]

ILLUMINA (SOLEXA) SEQUENCING
Illumina HiSeq & NextSeq & MiSeq

Illumina sequencing library
• All sample preparation protocols, regardless of the application, end with the same product:
  – Double-stranded DNA with the insert to be sequenced flanked by adapters

[Slides: Illumina library prep, Illumina sequencing]

HELISCOPE SEQUENCING
Helicos BioSciences: bankrupt November 15, 2012

DNA Sequencing – the next generation
• Sample prep
• Clonal amplification
• Parallel sequencing

[Slides: Heliscope sequencing]

NANOPORE SEQUENCING
Oxford Nanopore Technologies: GridION & MinION

SMRT SEQUENCING
Pacific Biosciences PacBio RS II

PacBio history
• 2010 - PacBio seduced investors with a promise of technology revolution
  – A whole human genome for $100
  – in about 15 minutes
• 2011 - GC applies for funding for third generation sequencer
• 2012 - None of those predictions came true
  – Few scientists bought the one-ton instrument.
  – PacBio
    • market valuation of less than $70 million
    • technology value of $0.
    • $600 million of cash down the toilet.
• 2012 - GC gets funding for PacBio!
• Oxford Nanopore announced at AGBT
• 2012 - New CEO Mike Hunkapiller @ PacBio
• 2013 - GC installs PacBio
  – PacBio improved and has a niche
    • ability to detect structural genetic variations
    • creating high-quality genomes of small organisms like bacteria, viruses, and worms.
  – PacBio’s deal with Roche to develop technology for the diagnostic market

Single Molecule, Real-Time (SMRT®) DNA Sequencing
• Components: SMRTbell™, SMRT® Cells, PacBio® RS II
• Workflow: Template Preparation, Run Design, Polymerase Binding, Instrument Run, Primary Analysis, Secondary Analysis, Tertiary Analysis
• Template preparation steps: DNA Sample, Fragment DNA, Damage Repair/End Repair, Ligate adapters, Purify DNA
• SMRTbell™ template preparation can be used to create libraries of various insert sizes from 250 bp to 20,000 bp depending on the needs of the application.

Advantages of SMRTbell™ Templates
Key Advantages:
• Structurally linear
• Topologically circular
• Provides sequences of both forward and reverse strands in the same trace

Base Modification: Discover the Epigenome
• Directly observe base modifications using the kinetics of the polymerization reaction during normal sequencing

Signal Processing and Base Calling
• Converting pulses of light into DNA bases and kinetic measures

Understanding Accuracy in SMRT® Sequencing
• Single-pass error rate ~11% (predominantly deletions or insertions)
• Single Molecule, Real-Time (SMRT®) DNA sequencing achieves highly accurate sequencing results, exceeding 99.999% (Q50)
• How is this possible, given that single-pass sequence has about 1 mistake every 10 nucleotides?
• Single-pass errors are distributed randomly, which means that they wash out very rapidly upon building consensus.
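
The consensus argument above can be illustrated with a small calculation. This is a minimal sketch under simplifying assumptions that are not from the slides: each pass over a position errs independently with probability 0.11, and the consensus is a plain majority vote in which all wrong passes agree with each other (a pessimistic case). Production consensus callers such as Quiver use richer probabilistic models, so the numbers are only indicative. The function names are hypothetical.

```python
import math


def majority_error(passes: int, p: float) -> float:
    """Probability that more than half of `passes` independent passes are wrong.

    Assumption: each pass errs independently with probability p, and all
    wrong passes agree with each other (pessimistic for a majority vote).
    """
    first_losing_count = passes // 2 + 1
    return sum(
        math.comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(first_losing_count, passes + 1)
    )


def phred(error: float) -> float:
    """Phred quality score: Q = -10 * log10(error probability)."""
    return -10 * math.log10(error)


def passes_needed(target_q: float, p: float, max_passes: int = 501) -> int:
    """Smallest odd number of passes whose majority vote reaches target_q."""
    for passes in range(1, max_passes + 1, 2):
        if phred(majority_error(passes, p)) >= target_q:
            return passes
    raise ValueError("target quality not reached within max_passes")


if __name__ == "__main__":
    # Show how quickly random single-pass errors wash out in the consensus.
    for passes in (1, 3, 5, 11, 21, 31):
        err = majority_error(passes, 0.11)
        print(f"{passes:>2} passes: consensus error {err:.3e}, Q{phred(err):.1f}")
    print("odd passes needed for Q50:", passes_needed(50, 0.11))
```

Only odd pass counts are used so that a majority vote never ties. The model ignores alignment of insertions and deletions, which is where real consensus algorithms do most of their work.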

SMRT® Sequencing Accuracy
[Figure: Perspective: Understanding SMRT Sequencing Accuracy. Data generated with P4-C2 chemistry on PacBio® RS II; analyzed using Quiver with SMRT® Analysis 2.0.1]

The PacBio® RS Helps Resolve Genetically Complex Problems
• Targeted Sequencing: Comprehensively Characterize Genomic Variation
• De Novo Assembly: Generate Finished Assemblies
• Base Modification Detection: Automatically detect DNA base modifications

NGS time line
• 2005 Roche 454: 200 K reads, 120 bp
• 2006 Illumina: 30M reads, 35 bp
• 2007 SOLiD: 100M reads, 35 bp
• Added on the 2008-2016 timeline: Ion Torrent, PacBio RS, HiSeq 2500, HiSeq X Ten, HiSeq 4000, PacBio Sequel
• Genomes shown: E. coli (5 Mb), Arabidopsis thaliana (157 Mb)

[Slides: NGS Technology: conclusions; Summary; NGS terminology]

NGS APPLICATIONS
NGS as a tool for studying genome variation and regulation

DNA SEQUENCING
WHOLE GENOME SEQUENCING
[Slides: Copy Number Variations, Structural Variations]

Whole genome sequencing
• Copy number variation analysis
  – Sequencing a genome at 0.1-0.3x
  – Sequencing a genome at 1-3x
• Structural variation analysis
  – Sequencing a genome at 5-10x
• Whole genome re-sequencing
  – Sequencing a genome at >30x
  – yeast, fruit fly, bacterial genomes, human…

DNA SEQUENCING
TARGETED RE-SEQUENCING

Sequencing - the beginning
• Random genome sequencing
• ???
• ???
• Sanger sequencing
  – Targeted
  – 700-1000 bp

Target enrichment strategies
• Random genome sequencing
• Hybrid capture
• PCR based
• Sanger sequencing

RNA SEQUENCING
Rapid expression profiling, transcriptome sequencing and small RNAs

[Slides: RNA-seq]

RNAseq: Gene Expression through sequencing
• Supports discovery, screening, and profiling
• Does not require prior gene knowledge or annotation
• Unique combination of qualitative and quantitative measurement
• Digital counts vs analog intensities
• Increased dynamic range and sensitivity
• No probes or primers
• Any species - even when reference genome not available
• Analyze gene expression

RNAseq: summary
• Counting or Profiling
  – 10 million total reads of 35 bp length from poly-A selected RNA will give performance better than any microarray
• Studying Alternative Splicing or quantifying cSNPs for most transcripts
  – Deeper profiling of 50 to 100 million reads, with read lengths of 50 to 100 bp, from poly-A selected RNA using mRNA-Seq assay
• Complete Annotation of an entirely New Transcriptome
  – ~500 million reads of 100 bp read length from multiple tissues
  – Normalized stranded mRNA-Seq & ncRNAs
  – Small RNA-Seq for microRNAs
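
To compare the three RNAseq designs in the summary, it can help to convert each read budget into total sequenced bases (reads times read length). The sketch below does only that arithmetic; the read counts and lengths come from the bullets above, while the function and dictionary names are hypothetical, and the low and high ends of the 50 to 100 million read design are listed as separate entries for illustration.

```python
def total_bases(reads: int, read_length_bp: int) -> int:
    """Total sequenced bases for a run: number of reads times read length."""
    return reads * read_length_bp


# (reads, read length in bp), taken from the RNAseq summary bullets.
DESIGNS = {
    "counting or profiling": (10_000_000, 35),
    "alternative splicing / cSNPs, low end": (50_000_000, 50),
    "alternative splicing / cSNPs, high end": (100_000_000, 100),
    "new transcriptome annotation": (500_000_000, 100),
}


def yields_in_gigabases(designs: dict) -> dict:
    """Map each design name to its total yield in gigabases (1e9 bases)."""
    return {
        name: total_bases(reads, length) / 1e9
        for name, (reads, length) in designs.items()
    }


if __name__ == "__main__":
    for name, gigabases in yields_in_gigabases(DESIGNS).items():
        print(f"{name}: {gigabases:g} Gb")
```

The spread from a fraction of a gigabase for simple counting to tens of gigabases for a new transcriptome is the practical reason these designs differ so much in cost.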