The Power and Promise of Long-Read Sequencing Technologies

Long-read sequencing has transformed genomics, transcriptomics, and epigenomics by illuminating the "dark" regions of the genome that short-read sequencing approaches previously missed. Highly repetitive regions such as telomeres, centromeres, and transposable elements, along with clinically relevant genetic variants and structural variants, are no longer cryptic and unreadable: they can now be sequenced with unparalleled accuracy and efficiency (1).

The profound implications of long-read sequencing for large-scale population studies, clinical studies, drug development, and fundamental research are steadily being realized, with an unprecedented number of discoveries ranging from the first complete telomere-to-telomere human genome (2) to exciting applications in single cells (3).

So, as long-read sequencing technologies come of age, let's take a look at what these innovations are and what they hold for the future.

Let’s dive in.

What Is Long-Read Sequencing?

While next-generation sequencing technologies based on short reads of 50 to 300 base pairs have accelerated biological research immeasurably, these approaches rely heavily on powerful algorithms to piece together complex jigsaws of reads, either by aligning them to a reference genome or by assembling them from sequence overlaps.

Accurate and robust mapping is possible where reads are unique, but this approach struggles in repetitive regions, where a short read could have come from many places, and in long, complex genomes. As a result, some pieces of the puzzle (and all of the biological information they contain) are simply missed, leaving genomes or transcriptomes incomplete.

Enter long-read sequencing or so-called “third-generation sequencing.”

Long-read sequencing produces reads, each derived from a single molecule, that are thousands to hundreds of thousands of base pairs long. Because a longer read is far more likely to be unique within the genome, researchers can map even the most troublesome regions with high confidence. Unlike most short-read technologies, long-read approaches also do not require extracted nucleic acids to be amplified before sequencing, which reduces the errors and biases associated with excessive amplification.
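To see why read length matters for repeats, here is a minimal toy sketch, assuming a made-up genome (two random 2 kb flanks around a 1 kb tandem repeat of a 50 bp unit) and exact, forward-strand matching with no sequencing errors. The helpers `make_toy_genome` and `fraction_ambiguous` are hypothetical names for illustration, not part of any real tool; they simply count how many possible reads of a given length occur more than once.

```python
import random
from collections import Counter


def make_toy_genome(seed: int = 0) -> str:
    """Hypothetical toy genome: unique random flanks around a 1 kb tandem repeat."""
    rng = random.Random(seed)

    def random_seq(n: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(n))

    repeat_unit = random_seq(50)
    # 2,000 bp flank + 20 copies of a 50 bp unit + 2,000 bp flank = 5,000 bp
    return random_seq(2000) + repeat_unit * 20 + random_seq(2000)


def fraction_ambiguous(genome: str, read_len: int) -> float:
    """Fraction of all possible reads (forward strand, exact match) seen more than once."""
    starts = range(len(genome) - read_len + 1)
    counts = Counter(genome[i:i + read_len] for i in starts)
    ambiguous = sum(1 for i in starts if counts[genome[i:i + read_len]] > 1)
    return ambiguous / len(starts)


genome = make_toy_genome()
for read_len in (150, 1500):
    print(f"{read_len} bp reads: {fraction_ambiguous(genome, read_len):.1%} ambiguous")
```

Short reads that fall entirely inside the repeat match several positions, while reads longer than the repeat always include some unique flanking sequence and therefore place uniquely.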

The two most widely used long-read sequencing approaches, HiFi sequencing based on Single Molecule, Real-Time (SMRT) technology from Pacific Biosciences and nanopore sequencing from Oxford Nanopore Technologies, differ in their sequencing principles, instrument implementation, and data outputs, but were jointly named Method of the Year 2022 by Nature Methods (4).

Check out our blog series on spatial transcriptomics to learn about another revolutionary omics-based Method of the Year winner.

HiFi Sequencing from Pacific Biosciences

HiFi sequencing produces reads averaging around 20 kb with over 99.9% per-base accuracy. This accuracy comes from circular consensus sequencing. Double-stranded DNA fragments are first circularized by ligating hairpin adapters to both ends, and each circular template, bound to a DNA polymerase, is loaded into one of millions of tiny wells known as zero-mode waveguides, where the polymerase is anchored to the bottom of the well. Fluorescently labeled nucleotides are then added, and the polymerase races around the circular template, synthesizing complementary copies. Each nucleotide emits a specific pulse of light as it is incorporated, and detectors record these pulses to determine the sequence of bases in the fragment. Because the timing of each incorporation is recorded at the same time, differences in incorporation speed can also reveal whether a nucleotide is methylated, with no additional preparatory steps required.
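Per-base accuracy figures like these are usually expressed on the Phred scale, Q = -10 log10(error rate), so 99.9% accuracy corresponds to Q30 and the 99% quoted for nanopore sequencing corresponds to Q20. A minimal sketch of the conversion in both directions; the function names are illustrative only:

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(error rate), where error rate = 1 - accuracy."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Inverse conversion: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


for acc in (0.99, 0.999):
    print(f"{acc:.1%} accuracy is roughly Q{accuracy_to_phred(acc):.0f}")
```

Each additional 10 points on the Phred scale means a tenfold drop in the error rate, which is why the gap between 99% and 99.9% matters so much in practice.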

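Because the DNA polymerase passes around the circularized molecule many times, the system reads the same insert repeatedly and can pinpoint its correct sequence by cross-referencing these copies: random errors in individual passes are unlikely to recur at the same position, so the consensus base calls are far more accurate than any single pass.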
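The cross-referencing idea can be illustrated with a simple column-wise majority vote. This is a deliberately simplified sketch that assumes the passes are already aligned, equal in length, and on the same strand; PacBio's own consensus software aligns the passes and uses a more sophisticated, quality-aware model. `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Column-wise majority vote across pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    consensus = []
    for column in zip(*passes):
        # most_common(1) picks the most frequent base; ties go to the base seen first
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same insert, each with at most one random error
passes = ["ACGTACGT", "ACGAACGT", "ACGTACTT", "TCGTACGT", "ACGTACGT"]
print(majority_consensus(passes))
```

Because each error appears in only one pass, every column is outvoted back to the true base, which is the intuition behind HiFi's high consensus accuracy.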

This accuracy was identified as one of the most appealing benefits of HiFi sequencing for the All of Us Research Program, which aims to sequence the genomes of over one million Americans from diverse backgrounds to improve personalized medical care (5).

Nanopore Sequencing from Oxford Nanopore Technologies

Oxford Nanopore Technologies takes a fundamentally different approach to long-read sequencing.

Nanopore sequencing reads individual molecules of DNA or RNA in their native form. A motor enzyme threads each molecule through a protein pore roughly one billionth of a meter (one nanometer) across, called a nanopore. As the strand passes through, the bases occupying the pore disrupt an electrical current flowing across it in a characteristic way, and basecalling software decodes these disruptions into sequence. Variations in this electrical signal are also sufficient to detect nucleotide methylation for long-read epigenomic studies. Read length largely reflects the length of the input molecules, commonly reaching tens to hundreds of kilobases, with over 99% accuracy, and there is essentially no fixed upper limit: the longest reported reads exceed four million bases.
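Because read lengths within a long-read run are highly skewed, they are usually summarized by the N50 rather than the mean: the length of the read at which the cumulative sum of read lengths, taken longest first, reaches half of all sequenced bases. A minimal sketch, with hypothetical read lengths:

```python
def n50(read_lengths: list[int]) -> int:
    """Length of the read at which the cumulative length (longest first) reaches half the total."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    half_total = sum(read_lengths) / 2
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


# Hypothetical read lengths (bp) from a small run
lengths = [2_000, 3_000, 4_000, 5_000, 6_000, 8_000, 10_000]
print(f"N50 = {n50(lengths):,} bp, mean = {sum(lengths) / len(lengths):,.0f} bp")
```

The N50 weights each read by the bases it contributes, so a handful of ultra-long reads can dominate it even when most reads are shorter.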

The range of nanopore sequencing options, from portable plug-and-play devices to high-throughput benchtop sequencers, means researchers can use nanopore sequencing both in remote environments and in sequencing facilities.

The Promise of Long-Read Sequencing

Because HiFi and nanopore sequencing rely on distinct technological principles, they yield reads with different lengths, error rates, and throughputs. Depending on their research goals, researchers may find that one technology suits them better than the other, or that combining both is the most powerful option.

For instance, the Telomere-to-Telomere (T2T) Consortium leveraged the complementary strengths of both technologies to unlock the roughly eight percent of the human genome missing from the previous reference assembly (2). The resulting 3.055-billion-base-pair assembly resolved highly repetitive centromeric satellite arrays and closely related segmental duplications, and provided gapless sequences for all 22 autosomes and chromosome X (the CHM13 cell line that was sequenced carries no Y chromosome). Accurate HiFi reads formed the backbone of the assembly, ultra-long nanopore reads helped resolve its most complex regions, and each data type was used to validate the other. The newly resolved sequence contained almost 2,000 novel gene predictions with potential implications for human health and disease (2).
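"Gapless" means no unresolved stretches remain in a chromosome's sequence. Assemblies conventionally mark unresolved gaps as runs of N in the FASTA sequence, so, assuming that convention, a minimal check is simply to scan for such runs. `find_gaps` is a hypothetical helper, not part of any assembly toolkit.

```python
import re


def find_gaps(sequence: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of every run of N/n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


contig = "ACGTNNNNACGTNACG"  # hypothetical sequence with two gaps
gaps = find_gaps(contig)
print("gapless" if not gaps else f"{len(gaps)} gaps at {gaps}")
```

Real assembly evaluation goes much further (checking for misassemblies and collapsed repeats, for example), but a chromosome reported as gapless should at minimum contain no N runs.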

Similarly, researchers have used both technologies to identify pathogenic mutations in disorders such as Huntington's disease, to rapidly diagnose rare genetic conditions, and to monitor infectious diseases to guide patient management (6).
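Huntington's disease is caused by an expanded CAG repeat in the HTT gene (36 or more repeats), and a single long read can span the entire repeat, including large expansions. The sketch below counts the longest uninterrupted CAG run in a read, assuming the read is already oriented to the HTT coding strand. It ignores interruptions such as the CAA codon that often follows the run, and it is an illustration only, not a genotyping or diagnostic tool. `longest_cag_run` is a hypothetical name.

```python
import re


def longest_cag_run(read: str) -> int:
    """Number of CAG codons in the longest uninterrupted CAG run in the read."""
    # CAG cannot overlap itself, so non-overlapping regex matches find each run exactly
    runs = re.findall(r"(?:CAG)+", read.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: 42 pure CAG repeats followed by an interruption and flanking sequence
read = "GCC" + "CAG" * 42 + "CAACAGCCGCCG"
print(longest_cag_run(read))
```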

In oncology, long-read sequencing is invaluable for identifying complex structural variants that drive tumorigenesis and epigenetic changes that modify disease-associated biological functions (6). Long-read RNA sequencing has also identified numerous novel gene isoforms, some associated with tumor progression and others crucial to healthy developmental processes (6).
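Because a long read can cover a transcript from end to end, one common simplified way to define isoforms is to group full-length reads by their intron chain, the ordered list of splice junctions, and keep chains supported by enough reads. This is a minimal sketch under that assumption, with hypothetical read coordinates and function names; real isoform tools also correct splice sites, handle strand, and compare against existing annotations.

```python
from collections import Counter

Intron = tuple[int, int]


def intron_chain(exons: list[tuple[int, int]]) -> tuple[Intron, ...]:
    """Introns implied by a read's exon blocks (0-based, half-open coordinates)."""
    blocks = sorted(exons)
    return tuple((blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1))


def collapse_isoforms(
    reads: dict[str, list[tuple[int, int]]], min_support: int = 2
) -> dict[tuple[Intron, ...], int]:
    """Count reads per intron chain and keep chains with at least min_support reads."""
    counts = Counter(intron_chain(exons) for exons in reads.values())
    return {chain: n for chain, n in counts.items() if n >= min_support}


# Hypothetical aligned reads: r1, r2, r4 share splice junctions; r3 skips the middle exon
reads = {
    "r1": [(100, 200), (300, 400), (500, 600)],
    "r2": [(110, 200), (300, 400), (500, 650)],
    "r3": [(100, 200), (500, 600)],
    "r4": [(95, 200), (300, 400), (500, 580)],
}
print(collapse_isoforms(reads))
```

Read ends vary naturally, so grouping on junctions rather than exact exon boundaries keeps reads from the same isoform together; here the exon-skipping read r3 falls below the support threshold.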

Long-read sequencing is also entering the single-cell age and transforming single-cell multi-omics: because each read can capture a full-length molecule, sequences are measured directly rather than reconstructed algorithmically from short fragments. Studies are already uncovering the cellular diversity of tumors by using long-read sequencing methods that capture both the genotype and the phenotype of individual cells (3). Studies of single-cell alternative splicing and fusion transcripts are also increasing rapidly, driven by advances in both technology and data analysis (7).
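In single-cell protocols, each molecule typically carries a cell barcode, and since long reads can contain errors in the barcode itself, observed barcodes are matched against the list of expected barcodes with a small mismatch tolerance. A minimal sketch, assuming equal-length barcodes and a Hamming-distance tolerance of one; the 8-base barcodes and function names are hypothetical, and real barcodes are longer and matched with more elaborate, indel-aware methods.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str, whitelist: list[str], max_mismatches: int = 1) -> Optional[str]:
    """Exact match if present; otherwise the unique whitelisted barcode within tolerance, else None."""
    if observed in whitelist:
        return observed
    hits = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatches
    ]
    # Reject ambiguous matches rather than guessing between candidate cells
    return hits[0] if len(hits) == 1 else None


whitelist = ["AAACCCGG", "TTTGGGCC", "AAACCCGT"]
for observed in ("TTTGGGCA", "AAACCCGA", "GGGGGGGG"):
    print(observed, "->", assign_barcode(observed, whitelist))
```

Rejecting reads that sit equally close to two barcodes trades a little data loss for fewer molecules assigned to the wrong cell.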

Key goals for the future of long-read sequencing technologies are to increase sample throughput and further improve accuracy, especially for single-cell applications, all while reducing costs for users (4). HiFi and nanopore sequencing differ in throughput, accuracy, and price, so the best choice will ultimately depend on each researcher's biological question.

Driven by Data Analytics

The great strides made with long-read sequencing instruments could not have occurred without the parallel development of advanced data analysis pipelines, which are crucial for accurate, reproducible, and biologically meaningful discoveries (8).

Generating complete telomere-to-telomere genome assemblies and annotations, discovering novel disease-associated single nucleotide variants, mapping alternatively spliced transcripts, or discovering and annotating novel transcripts, among countless other analyses, requires extensive, up-to-date expertise that is often unavailable in-house.

At Nexco Analytics, we leverage our extensive knowledge of cutting-edge analysis approaches to ensure your long-read sequencing data is analyzed as efficiently and appropriately as possible. As experts in the analysis of the dark genome, we know how to illuminate the hidden gems within your data to help you truly get the most out of your long-read sequencing study.

Please get in touch with us to find out how we can help you with your next data analysis conundrum.

References

1. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nature Reviews Genetics. 2020 Oct;21(10):597–614.

2. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, et al. The complete sequence of a human genome. Science. 2022 Apr 1;376(6588):44–53.

3. Shiau CK, Lu L, Kieser R, Fukumura K, Pan T, Lin HY, Yang J, Tong EL, Lee G, Yan Y, Huse JT. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nature Communications. 2023 Jul 11;14(1):4124.

4. Marx V. Method of the year: long-read sequencing. Nature Methods. 2023 Jan;20(1):6–11.

5. Mahmoud M, Huang Y, Garimella K, et al. Utility of long-read sequencing for All of Us. Nature Communications. 2024 Jan 29;15(1):837.

6. Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Human Genomics. 2023 Aug 8;17(1):73.

7. Shi ZX, Chen ZC, Zhong JY, Hu KH, Zheng YF, Chen Y, Xie SQ, Bo XC, Luo F, Tang C, Xiao CL. High-throughput and high-accuracy single-cell RNA isoform analysis using PacBio circular consensus sequencing. Nature Communications. 2023 May 6;14(1):2631.

8. Kovaka S, Ou S, Jenike KM, Schatz MC. Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing. Nature Methods. 2023 Jan;20(1):12–16.

Wednesday, Mar 27, 2024, 9:19 PM

long-reads, oxford-nanopore, genomics, dna-sequencing, genome-sequencing
