Establishment of molecular markers in the whole genome of Ganoderma lingzhi elite varieties

LIU Yiting, JIANG Xiaohan, YANG Chunyan, CHEN Jianhui, WANG Chezhao, LÜ Xiaomeng, YANG Zhikang, DENG Youjin, WU Xiaoping

Mycosystema ›› 2025, Vol. 44 ›› Issue (3) : 240141.

DOI: 10.13346/j.mycosystema.240141
Research papers


Abstract

Ganoderma lingzhi 13 is an elite cultivated variety with superior agronomic traits. The monokaryotic strain G. lingzhi 13-5 was assembled into a complete genome using third-generation HiFi sequencing and Hi-C sequencing. Five replicate dikaryotic strains were sequenced on the Illumina platform, and the G. lingzhi 13-5 genome was used as the reference for SNP calling to establish whole-genome molecular markers for G. lingzhi 13. The genome comprises 13 chromosomes totaling 45.73 Mb. After the SNP calls of the five replicate strains were deduplicated and merged, 319 074 shared heterokaryotic allelic difference sites were obtained as a molecular marker library for identifying G. lingzhi 13. Seven strains that had previously fruited were randomly selected from the strain library. The proportion of allelic difference sites they shared with G. lingzhi 13 ranged from 22.20% to 45.18%, far below the threshold (88.39%), indicating that none of them is the same strain as G. lingzhi 13. The whole-genome molecular markers established here, based on the heterokaryotic allelic difference sites of G. lingzhi 13, can accurately distinguish this elite cultivated variety.

Key words

third-generation HiFi sequencing / SNP calling / variety identification / variety protection

Cite this article

LIU Yiting, JIANG Xiaohan, YANG Chunyan, CHEN Jianhui, WANG Chezhao, LÜ Xiaomeng, YANG Zhikang, DENG Youjin, WU Xiaoping. Establishment of molecular markers in the whole genome of Ganoderma lingzhi elite varieties[J]. Mycosystema, 2025, 44(3): 240141 https://doi.org/10.13346/j.mycosystema.240141
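Ganoderma lingzhi Sheng H. Wu, Y. Cao & Y.C. Dai, commonly known as Chizhi (red lingzhi), is a well-known economic fungus in China that is used as both medicine and food (Cai et al. 2016; Dai et al. 2021). More than 400 bioactive substances have been isolated from it (Tan et al. 2018). Polysaccharides, amino acids, peptides and triterpenes are the main active constituents and underlie its varied pharmacological effects, including strengthening the body's resistance and anti-inflammatory, anticancer, immunomodulatory, blood-glucose-regulating, antioxidant, lipid-lowering and anti-aging activities (Dan et al. 2016; Soccol et al. 2016; Wasser 2017; Chen et al. 2018; Qi et al. 2018).
However, wild G. lingzhi resources are limited and fall far short of growing market demand. The strains currently cultivated in China were mainly introduced from Korea (e.g., Hanzhi) and Japan (e.g., Rizhi). After many years of continuous cultivation these strains have degenerated to some extent, reducing overall quality and yield (Zuo et al. 2021). Breeding of G. lingzhi is therefore urgent, yet because new variety rights have not been well recognized and protected, independent strain development has been hindered.
Intellectual property rights are the most effective institutional safeguard for innovation, and with rapid economic development countries are attaching increasing importance to them (Sun 2019). As the edible fungus industry grows, more and more new strains are being bred, and the protection of intellectual property in edible fungi is receiving increasing attention. Japan was among the first countries to place edible fungi on its protection list (Li 2007). China began protecting edible fungus varieties relatively late (Zhang et al. 2020); only 15 genera or species are protected so far, accounting for just 5% of the total under agricultural protection (Liu et al. 2022). Problems such as insufficient protection of new varieties and inadequate breeders' rights remain. Nevertheless, ongoing legal revision and institutional improvement are gradually strengthening breeders' rights and encouraging innovation in the seed industry.
G. lingzhi 13 is a selected elite hybrid with good agronomic traits and high fruiting-body and spore-powder yields. To protect breeders' rights effectively and to provide technical support for new variety certification, we performed third-generation sequencing on a monokaryotic strain of the new strain G. lingzhi 13 and assembled a complete genome as the reference. After SNP calling on resequencing data from five replicate dikaryotic strains of G. lingzhi 13, we built a molecular marker library based on its whole genome and randomly selected seven G. lingzhi strains to validate the method. The work provides technical and theoretical support for the subsequent new variety certification and variety-right protection of G. lingzhi 13.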

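1 Materials and methods

1.1 Materials

1.1.1 Strains

The hybrid G. lingzhi 13 was obtained by a di-mon (dikaryon × monokaryon) mating between G. lingzhi 102 and 119(83).

1.1.2 Medium

Bran medium (1 000 mL): 200 g potato and 20 g bran were boiled in an appropriate volume of water, the solids were removed, and the broth was filtered and made up to 1 000 mL. After boiling again, 20 g agar was added and fully melted; the heat was turned off and 20 g glucose was added. Once dissolved, the medium was dispensed and sterilized at 121 ℃ for 30 min.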

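1.2 Protoplast monokaryotization and identification

Activated strains were transferred to liquid bran medium containing glass shards and cultured on a shaker at 30 ℃ and 150 r/min for 3-4 d. Fresh liquid mycelium was poured into a 50 mL centrifuge tube, sealed with sealing film and centrifuged at 4 500 r/min for 20 min; the supernatant was removed on a clean bench. The mycelium was washed with 30 mL of buffer (0.7 mol/L KCl, 10 mmol/L CaCl2·H2O) and centrifuged at 4 500 r/min for 10 min, and the supernatant was discarded. The mycelium was then placed in a 50 mL centrifuge tube containing 10 mL of filter-sterilized enzyme solution (0.2 g lywallzyme in the same 0.7 mol/L KCl, 10 mmol/L CaCl2·H2O buffer) and digested with shaking at 28 ℃ and 150 r/min for 3 h. A small aliquot was checked under the microscope; once protoplasts were visible, the digest was removed, a little buffer was added with gentle shaking, and the mixture was filtered through three layers of sterile paper. The filter was rinsed 1-2 times with buffer to remove debris, and the filtrate was collected in a 50 mL centrifuge tube and centrifuged at 4 ℃ and 2 500 r/min for 10 min; the supernatant was discarded. The pellet was washed with an appropriate volume of buffer, mixed, transferred to a 2 mL centrifuge tube and centrifuged at 4 ℃ and 2 500 r/min for 10 min, and the supernatant was discarded. The protoplasts were resuspended in 300 μL of 1×STC, counted and adjusted to 10^8 protoplasts/mL, then spread on 90 mm plates of regeneration medium and incubated at 30 ℃ for 2-3 d. Small, slow-growing single colonies were picked onto bran plates. Once mycelium had grown, it was examined under the microscope for clamp connections; the absence of clamp connections indicated that monokaryotization had succeeded.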

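1.3 Sample collection and genome sequencing

The activated dikaryotic strain G. lingzhi 13 and the monokaryotic strain G. lingzhi 13-5, obtained by protoplast monokaryotization, were transferred to 90 mm bran plates overlaid with cellophane and cultured in the dark at 30 ℃ until the mycelium had almost covered the plate. At least 0.3 g of mycelium was collected and stored at -80 ℃. Whole-genome third-generation HiFi sequencing and Hi-C sequencing of the monokaryotic strain were carried out by Annoroad (安诺优达公司); the HiFi sequencing depth was 100× with 10 G of data, and the Hi-C sequencing depth was 100× with 20 G of data. The dikaryotic strain was transferred to five plates at the same time, and the mycelium from each plate was collected separately to give five replicates, which were sent to Beijing Novogene Bioinformatics Technology Co., Ltd. for Illumina HiSeq paired-end sequencing at a depth of 100× with 4 G of data.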

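1.4 Genome assembly

FASTQ sequences were extracted from the raw BAM file of strain G. lingzhi 13-5 with the bam2fastq subcommand of bamtools (Barnett et al. 2011). The extracted high-accuracy HiFi reads were assembled with hifiasm (Cheng et al. 2021) in Hi-C mode, with all other parameters left at their defaults. In Hi-C mode, hifiasm performs an all-vs-all alignment of the reads to find their overlaps. When overlapping reads differ at a base, hifiasm uses the number of reads supporting the difference to decide whether to treat it as a SNP (single nucleotide polymorphism): a difference supported by three or more reads is regarded as a heterozygous variant and retained as a SNP, whereas one supported by fewer than three reads is regarded as a sequencing error and corrected. hifiasm then uses the SNP information for phasing.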
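The read-support rule described above can be written as a toy decision function. This is only a sketch of the rule as stated in the text, not hifiasm's actual implementation; the function name `classify_difference` and the parameter `min_support` are hypothetical.

```python
def classify_difference(supporting_reads: int, min_support: int = 3) -> str:
    """Toy version of the rule in the text.

    A base difference backed by at least `min_support` reads is kept as a
    heterozygous SNP; otherwise it is treated as a sequencing error.
    """
    if supporting_reads < 0:
        raise ValueError("supporting_reads must be non-negative")
    return "SNP" if supporting_reads >= min_support else "error"


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, classify_difference(n))
```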
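The assembled sequence file was opened in EditPlus, and the telomere sequences of G. lingzhi 13-5 were counted manually according to the characteristics of telomeric DNA. A contig with telomere sequence at both ends was taken to be a complete, gap-free chromosome. The total number of telomeres across all contigs, divided by two, was taken as a preliminary estimate of the chromosome number.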
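The manual telomere count could in principle be scripted. The sketch below is not the authors' procedure: it assumes a TTAGGG telomere unit, a 200 bp end window and a minimum of three tandem copies, none of which is stated in the source, and all function names are hypothetical.

```python
import re

# Assumption: telomeres are tandem TTAGGG repeats (CCCTAA on the other strand).
# The source does not state the motif or any cutoffs used.
TELOMERE_UNIT = "TTAGGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_telomere(end_seq: str, min_copies: int = 3) -> bool:
    """True if either strand holds >= min_copies tandem copies of the unit."""
    pattern = "(?:%s){%d,}" % (TELOMERE_UNIT, min_copies)
    seq = end_seq.upper()
    return bool(re.search(pattern, seq) or re.search(pattern, revcomp(seq)))


def count_telomere_ends(contigs: dict, window: int = 200) -> int:
    """Count contig ends (5' and 3') that carry a telomere."""
    ends = 0
    for seq in contigs.values():
        w = min(window, len(seq) // 2)  # keep the two end windows disjoint
        if w == 0:
            continue
        ends += has_telomere(seq[:w]) + has_telomere(seq[-w:])
    return ends


def estimate_chromosomes(contigs: dict) -> float:
    """Preliminary chromosome number: telomere-bearing ends divided by two."""
    return count_telomere_ends(contigs) / 2
```

With the counts reported in Table 1 (11 contigs with telomeres at both ends and three with a telomere at one end), this rule gives 25 / 2 = 12.5, which is why the value is only a preliminary estimate until the remaining contigs are joined.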
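The contigs assembled with hifiasm (Cheng et al. 2021) were aligned against themselves with Blastn to identify contigs of about 10 kb composed of tandem repeats, from which the tandem repeat unit was extracted. The unit was compared against the NCBI nt database, and the characteristics of rDNA sequences were used to judge whether it was an rDNA repeat unit. The identified rDNA repeat unit was then aligned against the contigs of the genome, and the two contigs carrying rDNA sequence at one end were joined directly to form a scaffold with the structure "sequence-rDNA region-sequence". The number of tandem copies of the repeat unit in the rDNA region was calculated from the HiFi read depth over the rDNA and the read depth over single-copy regions of the genome. HiC-Pro (Servant et al. 2015) was used to examine the linkage among the remaining long contigs (>100 kb), and contigs judged to be adjacent were joined into scaffolds. Finally, HiFi reads were used to fill the gaps in the scaffolds.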
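The depth-based estimate of the rDNA copy number amounts to a simple ratio. The sketch below illustrates that ratio only; the function name and the example depths are hypothetical and are not values from the study.

```python
def rdna_copy_number(rdna_depth: float, single_copy_depth: float) -> int:
    """Estimate tandem rDNA copies as (depth over one rDNA unit) divided by
    (depth over single-copy regions), rounded to the nearest integer."""
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    if rdna_depth < 0:
        raise ValueError("rdna_depth must be non-negative")
    return round(rdna_depth / single_copy_depth)


if __name__ == "__main__":
    # Made-up depths for illustration only.
    print(rdna_copy_number(rdna_depth=5000.0, single_copy_depth=100.0))
```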

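1.5 Verification of assembly completeness and accuracy

The completeness of the G. lingzhi 13-5 assembly was assessed with BUSCO v5.6.1 (Seppey et al. 2019) using the Basidiomycota dataset, version odb10 (basidiomycota_odb10.2020-09-10). Interactions within and between chromosomes follow regular patterns: within a chromosome, the closer two loci are, the more frequently they interact, and loci on different chromosomes interact less frequently than loci on the same chromosome. On this basis, a chromosome interaction heatmap was generated with HiC-Pro (Servant et al. 2015) to verify the accuracy of the chromosome-level assembly framework of G. lingzhi 13-5.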

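1.6 Centromere analysis

In most fungi the centromeres show strong mutual interaction signals, as in yeasts (Varoquaux et al. 2015) and the wheat stem rust fungus (Sperschneider et al. 2021). HiC-Pro was used to generate the chromosome interaction heatmap and to identify the Hi-C interaction signal between 40 kb bins across the genome; on each chromosome, the position with the strongest interaction signal was taken as the location of the centromere.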
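Picking the strongest-signal bin per chromosome can be sketched as follows. The input layout (one list of per-bin interaction values per chromosome) and the function names are assumptions made for illustration; HiC-Pro's own output formats are not reproduced here.

```python
BIN_SIZE = 40_000  # 40 kb bins, as in the text


def strongest_bin(signal, bin_size=BIN_SIZE):
    """Return (start, end) in bp of the bin with the highest signal.

    `signal` is a sequence of per-bin interaction values for one chromosome,
    ordered along the chromosome. Ties resolve to the first such bin.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    idx = max(range(len(signal)), key=signal.__getitem__)
    return idx * bin_size, (idx + 1) * bin_size


def locate_centromeres(signals):
    """Map each chromosome name to the coordinates of its strongest bin."""
    return {chrom: strongest_bin(values) for chrom, values in signals.items()}


if __name__ == "__main__":
    toy = {"Chr01": [0.1, 0.9, 0.3], "Chr02": [2.0, 1.0]}  # made-up values
    print(locate_centromeres(toy))
```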

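1.7 Gene prediction and analysis

The G. lingzhi 13-5 genome was annotated with Funannotate (https://funannotate.readthedocs.io/en/latest/), using the G. lingzhi proteome as protein evidence and Coprinopsis cinerea as the Augustus species model. The genome sequence was divided into 100 kb windows, the number of genes in each window was counted, and the gene density distributions of the chromosomes were compared.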
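The 100 kb window count can be sketched in a few lines. Assigning a gene to a window by its 0-based start coordinate is an assumption made here; the source does not say how genes spanning a window boundary were handled, and the function name is hypothetical.

```python
from collections import Counter

WINDOW = 100_000  # 100 kb windows, as in the text


def genes_per_window(gene_starts, chrom_length, window=WINDOW):
    """Count genes per window along one chromosome.

    Each gene is assigned to the window containing its 0-based start
    coordinate (an assumption). Starts at or beyond chrom_length are ignored.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(start // window for start in gene_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


if __name__ == "__main__":
    print(genes_per_window([10, 99_999, 100_000, 250_000], 300_000))
```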

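1.8 Repeat sequence analysis

Repeat sequences in the G. lingzhi 13-5 genome were identified and classified with RepeatMasker (Chen 2004), Tandem Repeats Finder (Benson 1999) and TE classify. The genome sequence was divided into 100 kb windows, the amount of repeat sequence in each window was counted, and the distribution of repeats on each chromosome was compared.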

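1.9 Resequencing data processing and construction of whole-genome molecular markers

The assembled G. lingzhi 13-5 genome was quality-checked and, after removal of repeat sequences, used as the reference genome. The reference was indexed with GATK, and the second-generation resequencing data of the five replicate strains were aligned to it to produce SAM files. These were sorted into BAM files, PCR duplicates were marked (MarkDup.bam), and the MarkDup.bam files were indexed. SNP calling then produced one gVCF file per replicate, and the gVCFs of the five replicates were merged into a single VCF file. On this basis, biallelic indels shorter than 10 bp were selected and filtered with GATK. The final SNP results were deduplicated and merged to build the molecular marker library.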
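The deduplicate-and-merge step can be pictured as set operations on SNP sites. The sketch below is an illustration under assumptions, not the GATK workflow itself: it reads only the first five standard VCF columns (CHROM, POS, ID, REF, ALT), keys each site by (chromosome, position, ref, alt), and uses hypothetical function names.

```python
def parse_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from VCF-style text lines.

    Header lines starting with '#' and blank lines are skipped.
    """
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def merge_replicates(replicates):
    """Return (union, shared) for a list of per-replicate site sets.

    `union` is the deduplicated merge of all sites; `shared` holds the
    sites present in every replicate.
    """
    union = set().union(*replicates)
    shared = set.intersection(*replicates) if replicates else set()
    return union, shared


if __name__ == "__main__":
    rep1 = parse_sites(["#CHROM\tPOS", "Chr01\t10\t.\tA\tG", "Chr01\t20\t.\tC\tT"])
    rep2 = parse_sites(["Chr01\t10\t.\tA\tG", "Chr02\t5\t.\tG\tA"])
    union, shared = merge_replicates([rep1, rep2])
    print(len(union), sorted(shared))
```

In the study, the merged set contained 379 887 SNP sites and the set shared by all five replicates contained 319 074.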
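The allelic difference sites present in all five replicate strains were taken as the baseline. The number of allelic difference sites by which each replicate differed from the baseline was calculated, and the ratio of the baseline to the number of allelic difference sites in the replicate that differed most from it was set as the decision threshold. Two strains with GS ≤ threshold were judged to be "different varieties", and two strains with GS > threshold were judged to be the "same variety", where GS = (number of allelic difference sites shared by the test strain and G. lingzhi 13) / (number of allelic difference sites in the whole-genome molecular marker library).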
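The definitions above can be stated compactly. The symbols are notation introduced here for illustration and do not appear in the source: N_base is the number of baseline sites, N_i the number of sites called in replicate i, N_shared the number of library sites also found in the test strain, and N_library the size of the marker library.

```latex
\[
GS = \frac{N_{\mathrm{shared}}}{N_{\mathrm{library}}} \times 100\%,
\qquad
T = \frac{N_{\mathrm{base}}}{N_{i^{*}}} \times 100\%,
\qquad
i^{*} = \arg\max_{i}\,\bigl(N_{i} - N_{\mathrm{base}}\bigr)
\]
\[
\text{decision} =
\begin{cases}
\text{same variety}, & GS > T,\\
\text{different varieties}, & GS \le T.
\end{cases}
\]
```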

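1.10 Validation of the whole-genome molecular markers

Seven G. lingzhi strains that had previously fruited were randomly drawn from the strain library and subjected to second-generation resequencing. SNPs were called for the seven strains against the reference genome (G. lingzhi 13-5). For each test strain, the number of SNP sites matching the heterokaryotic allelic difference sites of G. lingzhi 13 was taken as the numerator and the total number of allelic difference sites in the molecular marker library as the denominator, and the ratio was calculated to assess the feasibility of the whole-genome molecular marker library.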

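2 Results and analysis

2.1 Genome assembly

The monokaryotic strain G. lingzhi 13-5, obtained by protoplast monokaryotization, was sequenced on the PacBio platform. Sequencing yielded 629 100 HiFi reads with an N50 of 19 179 bp and a GC content of 53%, for a total of 11.54 G of data.
The HiFi reads of G. lingzhi 13-5 were assembled with hifiasm in Hi-C mode, with the Hi-C data of G. lingzhi 13-5 supplied. The assembly contained 789 contigs with a total size of 77.33 Mb (Table 1). Of these contigs, 11 (tig01, tig02, tig03, tig04, tig05, tig06, tig07, tig08, tig09, tig11 and tig12) carried telomere sequences at both ends and were therefore regarded as complete chromosomes; three (tig10, tig13 and tig14) carried a telomere sequence at one end only.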
Table 1 Genomic information of Ganoderma lingzhi 13-5

| Contig | Size (Mb) | Telomere at 5′ end / 3′ end | Complete chromosome | Notes |
| --- | --- | --- | --- | --- |
| tig01 | 4.76 | + / + | + | |
| tig02 | 3.34 | + / + | + | |
| tig03 | 4.14 | + / + | + | |
| tig04 | 3.14 | + / + | + | |
| tig05 | 4.55 | + / + | + | |
| tig06 | 3.41 | + / + | + | |
| tig07 | 2.60 | + / + | + | |
| tig08 | 3.19 | + / + | + | |
| tig09 | 3.39 | + / + | + | |
| tig10 | 2.53 | + at one end only | | Linked rDNA region on the other end |
| tig11 | 5.31 | + / + | + | |
| tig12 | 2.50 | + / + | + | |
| tig13 | 3.32 | + at one end only | | |
| tig14 | 0.30 | + at one end only | | |
| Others | 32.67 | | | 775 small contigs, of which 114 belong to the rDNA region and 500 to the mitochondrial genome |

Note: + in the telomere column indicates that a telomere is present; + in the complete-chromosome column marks a contig with telomeres at both ends.
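Each genome contains only one rDNA region. Blastn alignment showed that the 3′ end of tig10 carries sequence from the rDNA region; the 5′ end of tig10 was aligned against the HiFi read library of G. lingzhi 13 and extended by 7 168 bp, after which it carried a telomere. The rDNA repeat unit is 10 150 bp long and contains the complete 18S rRNA gene, ITS1, the 5.8S rRNA gene, ITS2 and the 25S rRNA gene. The remaining two long contigs were analyzed with HiC-Pro: the 5′ end of tig13 and the 3′ end of the reverse complement of tig14 share a 21 486 bp overlap, and this join is supported by long HiFi reads.
The Hi-C contact heatmap shows no obvious breakpoint in the contact signal within any chromosome (Fig. 1A), indicating that the assembly framework is correct. The completeness of the G. lingzhi 13-5 assembly was assessed with BUSCO. Of the 1 764 Basidiomycota orthologs, 1 675 (94.9%) were complete in hap1 (Fig. 1B), comprising 1 664 single-copy orthologs (94.3%) and 11 duplicated orthologs (0.6%), which supports the high accuracy and completeness of the hap1 assembly.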
Fig. 1 Evaluation of the completeness and accuracy of the Ganoderma lingzhi 13-5 genome. A: Heatmap of Hi-C contacts. Color deepens from light to dark as interaction strength increases; the abscissa and ordinate give the N × bin position on the genome. B: Results of the BUSCO assessment. C: Complete orthologs; S: Single-copy orthologs; D: Duplicated orthologs; F: Fragmented orthologs; M: Missing orthologs.

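2.2 Characteristics of the centromeres

Hi-C analysis (Fig. 1A) showed strong contact signals among the centromeres of the chromosomes in the G. lingzhi 13-5 genome. This feature was used to locate the centromere on each chromosome (Fig. 2): the centromeres of four chromosomes (Chr04, Chr11, Chr12 and Chr13) lie close to a telomere; the centromere of Chr05 lies in the middle of the chromosome; and those of the remaining eight chromosomes are displaced toward one end.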
Fig. 2 Centromere location on each chromosome of the genome. Abscissa: length of each chromosome; ordinate: Hi-C interaction value (a higher value indicates a stronger contact signal).

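Hi-C (high-throughput chromosome conformation capture) cannot determine where a centromere starts and ends. To analyze the features that centromeres share, we extracted the 40 kb bin with the strongest centromeric contact signal together with the 40 kb on either side. The proportion of repeat sequence in the centromeres of the 13 chromosomes varied widely, from 29.50% to 91.55%; except for Chr04 and Chr09, repeats made up more than 50% of every centromere (Fig. 3). The Chr06 centromere contained the fewest repeat elements (51) and the Chr01 centromere the most (108) (Table 2). A total of 272 copies of repeat family rnd-1_family-0 were predicted in the G. lingzhi 13-5 genome, of which 116 (42.65%) lay in centromeric regions. Families rnd-1_family-0, rnd-3_family-15 and rnd-1_family-8 were absent from the Chr04 centromere but occurred in clusters in every other centromere, with 1-17 copies each. In addition, rnd-1_family-2 was present in 11 centromeres; rnd-4_family-1322 in 10; rnd-1_family-20 and rnd-4_family-276 in 9 each; rnd-1_family-3, GA-rich, rnd-1_family-146 and rnd-1_family-1 in 8 each; and (CCTCAT)n in 7.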
Fig. 3 Proportion of repeat sequences in the centromere of each chromosome.

Table 2 Centromere characteristics of each chromosome in the genome

| Repeat family | C01 | C02 | C03 | C04 | C05 | C06 | C07 | C08 | C09 | C10 | C11 | C12 | C13 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rnd-1_family-0 | 13 | 11 | 14 | 0 | 13 | 8 | 8 | 14 | 6 | 8 | 3 | 7 | 11 | 116 |
| rnd-3_family-15 | 14 | 9 | 8 | 0 | 10 | 1 | 9 | 17 | 6 | 6 | 8 | 10 | 13 | 111 |
| rnd-1_family-8 | 17 | 3 | 7 | 0 | 9 | 2 | 2 | 8 | 10 | 2 | 4 | 14 | 7 | 85 |
| rnd-4_family-276 | 5 | 0 | 7 | 0 | 0 | 0 | 7 | 6 | 3 | 6 | 2 | 6 | 6 | 48 |
| rnd-1_family-2 | 16 | 4 | 2 | 0 | 1 | 3 | 2 | 3 | 0 | 3 | 3 | 2 | 7 | 46 |
| rnd-4_family-1322 | 3 | 0 | 2 | 0 | 1 | 0 | 4 | 4 | 1 | 2 | 2 | 3 | 3 | 25 |
| rnd-1_family-20 | 3 | 1 | 2 | 0 | 0 | 0 | 1 | 3 | 2 | 3 | 4 | 1 | 0 | 20 |
| rnd-1_family-3 | 3 | 1 | 3 | 0 | 1 | 0 | 0 | 3 | 0 | 1 | 2 | 0 | 3 | 17 |
| rnd-1_family-146 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 2 | 3 | 14 |
| rnd-1_family-1 | 0 | 2 | 1 | 0 | 0 | 1 | 2 | 2 | 3 | 0 | 0 | 1 | 1 | 13 |
| (CCTCAT)n | 2 | 3 | 0 | 0 | 1 | 2 | 0 | 2 | 0 | 1 | 0 | 2 | 0 | 13 |
| GA-rich | 1 | 1 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 1 | 11 |
| Total number | 108 | 60 | 57 | 65 | 60 | 51 | 58 | 81 | 62 | 71 | 81 | 70 | 74 | 898 |
| Percent (%) | 2.07 | 2.01 | 2.10 | 0.87 | 2.39 | 1.77 | 2.58 | 2.58 | 1.71 | 2.55 | 2.51 | 2.84 | 3.07 | |

Note: C01-C13 denote chromosomes 1-13. Statistics cover the 120 kb centromere region of each chromosome. Repeat family: families that occur in the centromere regions of six or more chromosomes. Total number: total number of repeat family copies in the 120 kb region. Percent: ratio of centromere repeat sequence length to the length of the corresponding chromosome.

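2.3 Gene prediction

A total of 13 383 genes were predicted in the genome, comprising 13 179 protein-coding genes and 204 tRNA genes. Protein-coding genes averaged 1 652.64 bp in length, with an average CDS length of 1 323.79 bp and an average intron length of 327.61 bp; each gene contained 4.81 exons on average, and the average exon length was 274.84 bp (Table 3).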
Table 3 Results of gene prediction

| Item | Value |
| --- | --- |
| Predicted genes | 13 383 |
| Protein-coding genes | 13 179 |
| tRNA genes | 204 |
| Average gene length (bp) | 1 652.64 |
| Average CDS length (bp) | 1 323.79 |
| Average exons per gene | 4.81 |
| Average exon length (bp) | 274.84 |
| Average intron length (bp) | 327.61 |
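The gene density of each chromosome is shown in Fig. 4. Chr04 had the highest density, averaging 41.47 genes per 100 kb. The other 12 chromosomes were similar, ranging from 30.90 to 37.07 genes per 100 kb, with Chr02 the lowest at 30.90.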
Fig. 4 Average gene density of each chromosome in the genome.

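2.4 Repeat sequences

A total of 14 213 repeat fragments were detected in the genome, totaling 5.63 Mb and accounting for 12.32% of the genome sequence (Table 4). Among the repeats there were 4 297 retroelements (4.45 Mb; 78.90% of the repeat content), 1 149 DNA transposons (0.56 Mb; 9.97%) and 7 881 tandem repeats (0.55 Mb; 9.74%). The retroelements Gypsy and LINE made up the largest shares, at 31.41% and 12.72%, respectively.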
Table 4 Repeat sequences in the genome

| Classification | Number | Length (Mb) | Percentage of genome (%) |
| --- | --- | --- | --- |
| Total repeat fraction | 14 213 | 5.63 | 12.32 |
| Class I: Retroelement, total | 4 297 | 4.45 | 9.72 |
| LTR retrotransposon, total | 2 106 | 3.40 | 7.44 |
| Ty1/Copia | 276 | 0.58 | 1.28 |
| Ty3/Gypsy | 670 | 1.77 | 3.87 |
| Others | 1 160 | 1.05 | 2.29 |
| Non-LTR retrotransposon, total | 1 283 | 0.75 | 1.64 |
| LINE | 1 039 | 0.72 | 1.57 |
| SINE | 244 | 0.03 | 0.07 |
| Unclassified retroelement | 908 | 0.30 | 0.65 |
| Class II: DNA transposon, total | 1 149 | 0.56 | 1.23 |
| TIR CMC | 57 | 0.06 | 0.12 |
| Tc1/Mariner | 63 | 0.05 | 0.12 |
| Others | 966 | 0.40 | 0.87 |
| Tandem repeats | 7 881 | 0.55 | 1.20 |
| Unknown | 169 | 0.14 | 0.31 |
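The repeat content of each chromosome of the G. lingzhi 13-5 genome is shown in Fig. 5. Chr04 had the highest proportion of repeats (23.89%), followed by Chr09 (17.62%); Chr03 had the lowest (7.60%). The remaining 10 chromosomes were similar, ranging from 8.50% to 12.47%.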
Fig. 5 Repeat content of each chromosome in the genome.

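2.5 Analysis of SNP calling results

Using the genome of the monokaryotic strain G. lingzhi 13-5 (the nucleus A genome of G. lingzhi 13) as the reference, SNP calling was performed on the second-generation resequencing data of the five replicate strains. After deduplication and merging, 379 887 SNP sites were obtained, of which 319 074 were present in all five strains (Fig. 6). Because these sites were called from resequencing reads of the heterokaryotic strain against the nucleus A genome, they are single-nucleotide polymorphisms at which the nucleus B genome differs from the nucleus A genome, and are referred to here as heterokaryotic allelic difference sites. The 319 074 heterokaryotic allelic difference sites of G. lingzhi 13 were used as a molecular marker library for judging whether another strain is the same strain as G. lingzhi 13.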
Fig. 6 Venn diagram of SNP sites among the five replicate strains. LZ1-5: replicate strains 1-5.

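Taking the 319 074 SNP sites present in all five strains as the baseline, LZ1 (replicate strain 1) had 361 003 SNP sites, 41 929 more than the baseline; LZ2 (replicate strain 2) had 347 865, a difference of 28 791; LZ3 (replicate strain 3) had 354 173, a difference of 35 099; LZ4 (replicate strain 4) had 355 723, a difference of 36 649; and LZ5 (replicate strain 5) had 332 000, the smallest difference at 12 926. The largest difference from the baseline was the 41 929 sites of LZ1 (41 929/361 003×100%≈11.61%), so the proportion of LZ1 sites shared with the baseline was set as the decision threshold (319 074/361 003×100%≈88.39%): two strains with GS ≤ 88.39% were judged to be "different varieties", and two strains with GS > 88.39% were judged to be the "same variety". The SNP calling results of the five strains were plotted as density curves. Taking the first chromosome as an example (Fig. 7), the five strains were almost identical, which suggests that comparing SNP density curves is another way to judge whether two strains are the same.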
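The threshold arithmetic can be reproduced directly from the counts reported above. The sketch below uses only those counts; the function and variable names are hypothetical.

```python
BASELINE = 319_074  # SNP sites present in all five replicates

# SNP sites called in each replicate strain, as reported in the text.
REPLICATE_SITES = {
    "LZ1": 361_003,
    "LZ2": 347_865,
    "LZ3": 354_173,
    "LZ4": 355_723,
    "LZ5": 332_000,
}


def threshold_percent(baseline, replicate_sites):
    """Return (most divergent replicate, per-replicate differences, threshold %).

    The threshold is the baseline divided by the site count of the replicate
    that differs most from the baseline, expressed as a percentage.
    """
    diffs = {name: n - baseline for name, n in replicate_sites.items()}
    worst = max(diffs, key=diffs.get)
    return worst, diffs, 100 * baseline / replicate_sites[worst]


if __name__ == "__main__":
    worst, diffs, threshold = threshold_percent(BASELINE, REPLICATE_SITES)
    print(worst, diffs[worst], round(threshold, 2))
```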
Fig. 7 SNP calling results for the first chromosome of the five replicate strains.

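2.6 Validation of the whole-genome molecular markers

Seven G. lingzhi strains were randomly drawn from the laboratory strain library and subjected to second-generation genome sequencing. SNPs were called against the genome of the monokaryotic strain G. lingzhi 13-5, the SNP sites obtained were compared one by one with the heterokaryotic allelic difference site library of G. lingzhi 13, and for each strain the number of shared allelic difference sites was expressed as a proportion of the library size (Table 5). G.l0064 shared the fewest sites with G. lingzhi 13 (70 834; 22.20%) and Rizhi shared the most (144 157; 45.18%). Both values are far below the threshold for the same variety (strain) (88.39%), indicating that none of the seven strains is G. lingzhi 13. The SNP density curves of the seven strains also differed markedly from that of G. lingzhi 13 (Fig. 8), which likewise indicates that G. lingzhi 13 is not the same strain as any of them.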
Table 5 Number of allelic difference sites shared between Ganoderma lingzhi 13 and the seven test strains

| Test strain | Number of shared allelic difference sites | Proportion (%) |
| --- | --- | --- |
| SL9 | 75 174 | 23.56 |
| G.l0069 | 75 429 | 23.64 |
| Rizhi (G. japonicum) | 144 157 | 45.18 |
| G.l0051 | 76 450 | 23.96 |
| G.l0064 | 70 834 | 22.20 |
| G.l0081 | 76 865 | 24.09 |
| Hybrid 12 | 89 500 | 28.05 |
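The proportions in Table 5 and the resulting calls follow from the GS definition and the 88.39% threshold. The sketch below recomputes them from the tabulated counts; the function names and the strain labels used as dictionary keys are chosen here for illustration.

```python
LIBRARY_SIZE = 319_074   # allelic difference sites in the marker library
THRESHOLD = 88.39        # same-variety threshold (%), from the text

# Shared allelic difference sites per test strain, from Table 5.
SHARED_SITES = {
    "SL9": 75_174,
    "G.l0069": 75_429,
    "Rizhi": 144_157,
    "G.l0051": 76_450,
    "G.l0064": 70_834,
    "G.l0081": 76_865,
    "Hybrid 12": 89_500,
}


def gs_percent(shared, library_size=LIBRARY_SIZE):
    """GS as a percentage: shared sites divided by library size."""
    return 100 * shared / library_size


def same_variety(shared, threshold=THRESHOLD):
    """True only if GS is strictly greater than the threshold."""
    return gs_percent(shared) > threshold


if __name__ == "__main__":
    for strain, shared in SHARED_SITES.items():
        call = "same" if same_variety(shared) else "different"
        print(f"{strain}: GS = {gs_percent(shared):.2f}% -> {call}")
```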
Fig. 8 SNP calling results for the first chromosome of Ganoderma lingzhi 13 and the seven test strains.

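3 Discussion

In this study, the elite monokaryotic strain G. lingzhi 13-5 was sequenced with third-generation HiFi technology and assembled with the aid of Hi-C data. The genome is 45.73 Mb and comprises 13 chromosomes of 2.49-5.31 Mb. Further analysis of the assembly showed that centromeres can lie near a telomere or in the middle of a chromosome, but most are displaced toward one end. A high content of repeat sequence is characteristic of centromeres: repeats accounted for 29.50%-91.55% of the centromeres of the 13 chromosomes. Across the whole genome, 14 213 repeat fragments were detected, totaling 5.63 Mb and accounting for 12.32% of the genome sequence. Among the repeats there were 4 297 retroelements (78.90%), 1 149 DNA transposons (9.97%) and 7 881 tandem repeats (9.74%). Gene prediction with Funannotate identified 13 383 genes, comprising 13 179 protein-coding genes and 204 tRNA genes.
Whole-genome molecular markers were built on the complete genome sequence of G. lingzhi 13-5. After the SNP calls of the five replicate strains were deduplicated and merged, 319 074 SNP sites at which nuclei A and B of G. lingzhi 13 differ (heterokaryotic allelic difference sites) were obtained. These 319 074 sites were used as a molecular marker library for judging whether another strain is the same strain as G. lingzhi 13. The number of allelic difference sites that a test strain shares with G. lingzhi 13 is taken as the numerator and the total number of allelic difference sites in the library as the denominator, and the ratio is calculated. Consequently, if even one nucleus of the test strain has a different origin from the nuclei of G. lingzhi 13, the two strains will share few allelic difference sites. The largest difference from the baseline was 41 929 SNP sites (41 929/361 003×100%≈11.61%), so the proportion of LZ1 sites shared with the baseline was set as the decision threshold (319 074/361 003×100%≈88.39%): two strains with GS ≤ 88.39% were judged to be "different varieties", and two strains with GS > 88.39% were judged to be the "same variety".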
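The whole genome of G. lingzhi 13 is 45.73 Mb, and the 319 074 sites identified by SNP calling (the part that differs) make up only 0.70% of it; the other 99.30% is similar. Viewed at the whole-genome level, the 11.61% difference applies to that 0.70% of the genome, so the differing portion is actually only 0.08% of the genome and the strains are highly similar. Because the calculation in this study is made after the similar portion has been excluded, 88.39% is a reliable threshold.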
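The two percentages in this argument can be checked with a short calculation. The sketch below treats each SNP site as one base pair, as the text implicitly does; the function names are hypothetical.

```python
GENOME_BP = 45_730_000        # 45.73 Mb genome
LIBRARY_SITES = 319_074       # heterokaryotic allelic difference sites
MAX_REPLICATE_DIFF_PCT = 11.61  # largest replicate-to-baseline difference (%)


def marker_fraction_pct(sites=LIBRARY_SITES, genome_bp=GENOME_BP):
    """Share of the genome covered by marker sites, in percent."""
    return 100 * sites / genome_bp


def genome_wide_diff_pct(diff_pct=MAX_REPLICATE_DIFF_PCT):
    """Replicate-to-baseline difference rescaled to the whole genome, in percent."""
    return marker_fraction_pct() * diff_pct / 100


if __name__ == "__main__":
    print(round(marker_fraction_pct(), 2), round(genome_wide_diff_pct(), 2))
```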
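At present breeders' rights in China are not well protected, which has seriously dampened breeders' enthusiasm. The whole-genome molecular marker method can provide theoretical support for certifying G. lingzhi 13 as a new variety. In any later infringement dispute, the suspected infringing strain can be resequenced, its SNPs called and the result compared with the molecular marker library to show whether the two are the same strain (Fig. 9).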
Fig. 9 Construction and application of the whole-genome molecular marker method.

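Author contributions

LIU Yiting: conceptualization, data processing and analysis, experiments, writing; JIANG Xiaohan: experiments, review and editing; YANG Chunyan: experiments, review and editing; CHEN Jianhui: experiments, data processing; WANG Chezhao: experiments, software analysis; LÜ Xiaomeng: experiments, review and editing; YANG Zhikang: experiments; DENG Youjin: data processing, software analysis, conceptualization; WU Xiaoping: provision of experimental materials and strains, conceptualization.

Conflict of interest

The authors declare that there are no commercial or financial relationships that could constitute a potential conflict of interest.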

References

[1]
Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT, 2011. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 27(12): 1691-1692
[2]
Benson G, 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2): 573-580
[3]
Cai XL, He W, An FQ, 2016. Research progress on Ganoderma lucidum germplasm resources. Modern Agricultural Science and Technology, 2016(6): 99-100 (in Chinese)
[4]
Chen NS, 2004. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics, 5(1): 4-10
[5]
Chen TQ, Wu JG, Kan YJ, Yang C, Wu YB, Wu JZ, 2018. Antioxidant and hepatoprotective activities of crude polysaccharide extracts from Lingzhi or Reishi medicinal mushroom, Ganoderma lucidum (Agaricomycetes), by ultrasonic-circulating extraction. International Journal of Medicinal Mushrooms, 20(6): 581-593
[6]
Cheng HY, Concepcion GT, Feng XW, Zhang HW, Li H, 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods, 18(2): 170-175
[7]
Dai YC, Yang ZL, Cui BK, Wu G, Yuan HS, Zhou LW, He SH, Ge ZW, Wu F, Wei YL, Yuan Y, Si J, 2021. Diversity and systematics of the important macrofungi in Chinese forests. Mycosystema, 40: 770-805
[8]
Dan XL, Liu WL, Wong J, Tzi BN, 2016. A ribonuclease isolated from wild Ganoderma lucidum suppressed autophagy and triggered apoptosis in colorectal cancer cells. Frontiers in Pharmacology, 7: 217
[9]
Li J, 2007. Analysis of intellectual property protection and current situation of edible fungi in China. Chinese Inventions and Patents, 2007(8): 48-49 (in Chinese)
[10]
Liu XL, Zhong C, Xie J, Hou FF, Dai JM, Zhang SH, Jin J, 2022. Development status and perspective of new varieties protection and DUS test guide for medicinal and edible fungi. Chinese Traditional and Herbal Drugs, 53(4): 1173-1180 (in Chinese)
[11]
Qi AH, Sun YX, Li W, Che XL, Lian F, 2018. The medicinal value of Ganoderma lucidum. Rural Economy and Science and Technology, 29(10): 141-142 (in Chinese)
[12]
Seppey M, Manni M, Zdobnov EM, 2019. BUSCO: assessing genome assembly and annotation completeness. Methods in Molecular Biology, 1962: 227-245
[13]
Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, Heard E, Dekker J, Barillot E, 2015. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biology, 16: 259
[14]
Soccol CR, Bissoqui LY, Rodrigues C, Rubel R, Sella SRBR, Leifa F, de Souza VLP, Soccol VT, 2016. Pharmacological properties of biocompounds from spores of the Lingzhi or Reishi medicinal mushroom Ganoderma lucidum (Agaricomycetes): a review. International Journal of Medicinal Mushrooms, 18(9): 757-767
[15]
Sperschneider J, Jones AW, Nasim J, Xu B, Jacques S, Zhong CC, Upadhyaya NM, Mago R, Hu YH, Figueroa M, Singh KB, Stone EA, Schwessinger B, Wang MB, Taylor JM, Dodds PN, 2021. The stem rust fungus Puccinia graminis f. sp. tritici induces centromeric small RNAs during late infection that are associated with genome-wide DNA methylation. BMC Biology, 19(1): 203
[16]
Sun YY, 2019. On China's independent intellectual property rights protection along the Belt and Road line. Guangxi Quality Supervision Guide, 2019(3): 207-208 (in Chinese)
[17]
Tan XY, Sun JS, Ning HJ, Qin ZF, Miao YX, Sun T, Zhang XQ, 2018. De novo transcriptome sequencing and comprehensive analysis of the heat stress response genes in the basidiomycetes fungus Ganoderma lucidum. Gene, 661: 139-151
[18]
Varoquaux N, Liachko I, Ay F, Burton JN, Shendure J, Dunham MJ, Vert JP, Noble WS, 2015. Accurate identification of centromere locations in yeast genomes using Hi-C. Nucleic Acids Research, 43(11): 5331-5339
[19]
Wasser SP, 2017. Medicinal mushrooms in human clinical studies. Part I. Anticancer, oncoimmunological, and immunomodulatory activities: a review. International Journal of Medicinal Mushrooms, 19(4): 279-317
[20]
Zhang QY, Li WX, Yang J, Sheng LZ, Zhu SR, Zhu XK, 2020. Current situation and basic countermeasures for the protection of new varieties of food and medicinal mushrooms in China—taking the regulations for the protection of new varieties of plants as the legal source. Edible and Medicinal Mushrooms, 28(2): 98-102 (in Chinese)
[21]
Zuo HB, Xia BY, Wang XL, Li HL, Su XL, Liu HF, 2021. Investigation on germplasm resources and hybrid breeding in Ganoderma lucidum region. Modern Agricultural Research, 27(3): 109-110 (in Chinese)

Funding

Fujian Provincial Major National Research and Development Project(2022NZ029015)