The Third Revolution in Sequencing Technology

Abstract

Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time.

A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster.

However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality.

Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology.

Here we review and compare the various long-read methods.

We discuss their applications and their respective strengths and weaknesses and provide future perspectives.

Keywords

next-generation sequencing

third-generation sequencing

long-read sequencing

single-molecule real-time sequencing

nanopore sequencing

synthetic long-read sequencing

Highlights

Long-read/third-generation sequencing technologies are causing a new revolution in genomics as they provide a way to study genomes, transcriptomes, and metagenomes at an unprecedented resolution.

SMRT and nanopore sequencing allow for the first time the direct study of different types of DNA base modifications.

Moreover, nanopore technology can directly sequence RNA and identify RNA base modifications.

Owing to the portability of the MinION and the existence of extremely simple library preparation methods, nanopore technology makes high-throughput sequencing possible, for the first time, in the field and in remote places. This is of tremendous importance for the surveillance of outbreaks in developing countries.


The Advent of Third-Generation Sequencing (TGS)/Long-Read Sequencing

Shortly after the appearance of NGS, TGS technologies emerged. Distinguishing features of TGS are single-molecule sequencing (SMS) and sequencing in real time (as opposed to NGS, where sequencing is paused after each base incorporation) [11]. The first SMS technology, commercialized by Helicos Biosciences, resembled Illumina sequencing but without any bridge amplification [12]. As the method was relatively slow, expensive, and produced short reads (32 bp), it did not prove viable.

The first ‘true’ TGS technology was released on the market in 2011 by Pacific Biosciences (PacBio) and is termed ‘single-molecule real-time’ (SMRT) sequencing [13]. More recently (2014), Oxford Nanopore Technologies (ONT) introduced nanopore sequencing [14]. Besides the absence of PCR amplification and the real-time sequencing process, an important feature of SMRT and nanopore sequencing is the production of long reads.

As an alternative approach, Illumina introduced a library preparation kit for ‘synthetic long reads’ (SLRs) in 2014 (formerly Moleculo [15]). One year later 10X Genomics introduced a microfluidics variant of SLR with much higher partitioning capacity [16]. Note that SLR technologies are not TGS methods as they are based on classical Illumina sequencing.

These long-read technologies are now revolutionizing genomics research as they enable researchers to explore genomes at an unprecedented resolution. In the subsequent sections we examine these new methodologies in more detail. Due to length limitations we do not discuss the analysis of long-read sequence data in detail. Excellent recent reviews focusing on long-read bioinformatics tools can be found elsewhere [17,18].

Long-Read Technologies

SMRT Sequencing: PacBio

In early 2011, PacBio released their PacBio RS sequencer, which uses SMRT technology (Box 1). While initially average read lengths were relatively short (1.5 kb) and average error rates were high (13%) [19], the technology has strongly improved over recent years.
Average read lengths have increased more than tenfold and the throughput per run has increased by about 100-fold owing to the development of improved sequencing chemistries and the release of a new sequencer, the Sequel.
This machine generates about tenfold more sequence data than the upgraded PacBio RS (RSII) and is twofold less expensive (Table 1).
The ‘single-pass’ error rate has remained roughly the same since the beginning (13%), but molecules of up to 1–2 kb can now be sequenced many times owing to the circular templates [20] and increased polymerase processivity, strongly improving overall accuracy (see Figure ID in Box 1).
Moreover, increased throughput has led to a sharp reduction in cost per base ([19]; http://allseq/knowledge-bank/sequencing-platforms/pacific-biosciences/).
For genomic DNA library preparation, PacBio commercialized a ‘SMRTbell template prep kit’ and an ‘express’ variant thereof for rapid library preparation with an approximately 3-h workflow. For transcriptome analysis an ‘isoform sequencing’ protocol is available (https://www.pacb/wp-content/uploads/Procedure-Checklist-20-kb-Template-PreparationUsing-BluePippin-Size-Selection-System-15-20-kb-Cutoff-Sequel-Systems.pdf).
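
The effect of reading the same circular template many times can be illustrated with a toy calculation. The sketch below is not PacBio's consensus algorithm; it assumes that every pass miscalls a base independently with the same probability (the 13% single-pass rate quoted above) and that the consensus is a simple majority vote in which all miscalls agree, which is the worst case for the vote. The function names `consensus_error` and `phred` are illustrative only.

```python
from math import comb, log10


def consensus_error(single_pass_error: float, passes: int) -> float:
    """Probability that a strict majority of passes miscall a base.

    Toy model (an assumption, not PacBio's algorithm): each pass miscalls
    the base independently with the same probability, and all miscalls
    agree with each other, which is the worst case for a majority vote.
    """
    if not 0.0 <= single_pass_error <= 1.0:
        raise ValueError("single_pass_error must be a probability")
    if passes < 1 or passes % 2 == 0:
        raise ValueError("passes must be a positive odd number (no ties)")
    p = single_pass_error
    majority = passes // 2 + 1
    # Binomial tail: sum over all outcomes where at least `majority`
    # of the passes are wrong.
    return sum(
        comb(passes, k) * p**k * (1.0 - p) ** (passes - k)
        for k in range(majority, passes + 1)
    )


def phred(error: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * log10(error)


if __name__ == "__main__":
    # 0.13 is the single-pass error rate quoted in the text.
    for n in (1, 3, 5, 9, 15):
        err = consensus_error(0.13, n)
        print(f"{n:2d} passes: error {err:.2e}, Q{phred(err):.1f}")
```

Under these assumptions the consensus error falls quickly as the number of passes grows, which is the qualitative point made in the text. Real consensus accuracy also depends on the error type (mostly insertions and deletions) and on the consensus caller, so the numbers should be read only as an illustration.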


Concluding Remarks and Future Perspectives
Over recent years, long-read sequencing methods have strongly improved.

These technologies now enable the study of genomes and transcriptomes at an unprecedented resolution.
Also, metagenomics analyses benefit from long-read sequencing, which allows for the first time the resolution of microbial communities at the species level [68–70].
Long-read sequencing is likely to become a standard medical diagnostic tool in the near future, as exemplified by a recent SMRT sequencing study of a patient’s genome revealing an SV that could not be detected despite extensive genetic testing with other methods [71].
In particular, nanopore sequencing has improved rapidly.
A theoretical 1× coverage of the Escherichia coli genome was obtained with just seven ultralong reads (http://lab.loman/2017/03/09/ultrareads-for-nanopore/), and a human genome has been assembled using nanopore reads alone [24].
Ultralong nanopore reads may allow complete, gapless assembly of human genomes in the near future, which will further boost human genetics research and personalized medicine.
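
To make the arithmetic behind ‘theoretical coverage’ concrete, the sketch below computes coverage as total sequenced bases divided by genome size. The genome size of about 4.64 Mb for E. coli K-12 is an assumption added here, not a figure from the text, and the read lengths are hypothetical. Theoretical coverage only says that the summed read length equals the genome length; it does not mean that every position is actually covered.

```python
from math import ceil

# Assumption: approximate size of the E. coli K-12 genome in base pairs.
ECOLI_GENOME_BP = 4_640_000


def theoretical_coverage(read_lengths, genome_bp):
    """Total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_bp


def mean_length_for_coverage(n_reads, genome_bp, target=1.0):
    """Mean read length needed for `n_reads` to reach the target coverage."""
    return target * genome_bp / n_reads


def reads_needed(read_length, genome_bp, target=1.0):
    """Number of equal-length reads needed to reach the target coverage."""
    return ceil(target * genome_bp / read_length)


if __name__ == "__main__":
    mean_len = mean_length_for_coverage(7, ECOLI_GENOME_BP)
    print(f"Seven reads need a mean length of about {mean_len / 1000:.0f} kb")
    # Hypothetical comparison with 150-bp short reads.
    print(f"150-bp reads needed for 1x: {reads_needed(150, ECOLI_GENOME_BP)}")
```

Under the assumed genome size, seven reads reach 1× only if they average several hundred kilobases each, whereas tens of thousands of 150-bp reads would be needed for the same nominal coverage.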

The portability of the MinION allows sequencing in the field for the first time, which is of great importance for the surveillance of outbreaks in developing countries [72,73].
However, there remains room for improvement. A weakness of nanopore sequencing is the high error rate.
In 2010, Stoddart et al. proposed the development of nanopores with multiple recognition points for DNA sequence determination [74].
This would provide a proofreading mechanism improving the overall quality of sequencing.
As an alternative solution to reduce error rates, a method resembling PacBio CCS has been proposed [26].

On the other hand, to keep up with nanopore technology it will be important for PacBio to increase overall read length and throughput.

Current loading methods depend on passive diffusion and are biased towards shorter fragments.
A novel, voltage-induced loading technique increases the efficiency of loading long DNA molecules [75].

However, it seems unlikely that SMRT sequencing will approach the ultralong reads currently obtained with nanopores, due to the limitation of polymerase processivity.

Thus, SMRT, nanopore, and SLR sequencing methods each have their particular strengths and weaknesses (Table 2), and depending on the specific application one technology or another may be preferred.
It is worth mentioning here that various other companies are also investing in novel methods for rapid, cost-effective, and portable sequencing and it will be interesting to see whether any of these technologies will see the light of day in the near future (see Outstanding Questions).

Last, an exciting possibility of nanopore technology is the sequencing of denatured peptide chains, and recent results confirm its feasibility [76].
It will be interesting to see whether further progress will make single-molecule protein sequencing a reality.
In any case, we are only at the beginning of the third revolution in sequencing technology and the coming years promise to bring exciting new developments and discoveries.

