Results
1 Core-pan analysis
Core-pan analysis showed that the 32 ranaviruses share 44 strict core
genes (Figure 1). With the exception of ESV, SGIV and TFV, which possess
24, 47 and 8 unique genes, respectively, each isolate has no more than 5
unique genes (gene clusters that exist in only one species). Detailed
information on the strict core genes and unique genes of the 32
ranaviruses is listed in Table S2.
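The distinction between strict core genes and unique genes can be
illustrated with a minimal sketch. It assumes each genome has already
been reduced to a set of gene-cluster IDs; the function name, isolate
names and cluster IDs are hypothetical and are not taken from Table S2.

```python
from collections import Counter

def core_and_unique(clusters_by_genome):
    """Split gene clusters into the strict core and per-genome unique sets.

    clusters_by_genome: dict mapping genome name -> set of cluster IDs.
    """
    # Strict core: clusters present in every genome.
    core = set.intersection(*(set(c) for c in clusters_by_genome.values()))
    # Number of genomes in which each cluster occurs.
    occurrence = Counter(
        cid for c in clusters_by_genome.values() for cid in set(c)
    )
    # Unique: clusters that occur in exactly one genome.
    unique = {
        genome: {cid for cid in c if occurrence[cid] == 1}
        for genome, c in clusters_by_genome.items()
    }
    return core, unique

# Toy input (hypothetical isolates and cluster IDs).
toy = {
    "isolate_A": {"c1", "c2", "c3"},
    "isolate_B": {"c1", "c2", "c4", "c5"},
    "isolate_C": {"c1", "c2"},
}
core, unique = core_and_unique(toy)
```

In this toy input, two clusters are shared by all three genomes, and
only isolate_A and isolate_B carry unique clusters.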
2 Phylogenetic analysis
The genome sequences of the 32 ranaviruses and of other members of the
family Iridoviridae (Chloriridovirus, Iridovirus, Lymphocystivirus and
Megalocytivirus) were obtained from GenBank and uploaded to the CV-Tree
online analysis platform to construct whole-genome phylogenetic trees.
The ranaviruses and the other members of the family Iridoviridae
clustered into their respective groups (Figure 2). Within the genus
Ranavirus, the 32 isolates can be divided into four subspecies:
GIV-like, EHNV-like, FV3-like and CMTV-like (Figure 2).
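For readers unfamiliar with alignment-free trees, the following is a
simplified sketch of a composition-vector distance of the kind that
underlies the CV-Tree approach. It is not the platform's implementation:
the function names, the choice of k = 3 and the toy DNA strings are
assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-strings in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def composition_vector(seq, k=3):
    """Relative deviation of each k-string from a Markov prediction."""
    if k < 3:
        raise ValueError("k must be at least 3")
    p_k, p_k1, p_k2 = (kmer_freqs(seq, n) for n in (k, k - 1, k - 2))
    vec = {}
    for u in p_k1:                      # observed (k-1)-string prefix
        for a in sorted(set(seq)):
            v = u[1:] + a               # matching (k-1)-string suffix
            if v in p_k1:
                # Expected frequency from the two overlapping substrings.
                p0 = p_k1[u] * p_k1[v] / p_k2[u[1:]]
                vec[u + a] = (p_k.get(u + a, 0.0) - p0) / p0
    return vec

def cv_distance(seq_a, seq_b, k=3):
    """Distance (1 - cosine correlation) / 2 between composition vectors."""
    va, vb = composition_vector(seq_a, k), composition_vector(seq_b, k)
    dot = sum(x * vb.get(s, 0.0) for s, x in va.items())
    norm_a = sqrt(sum(x * x for x in va.values()))
    norm_b = sqrt(sum(x * x for x in vb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("composition vector is all zeros")
    return (1 - dot / (norm_a * norm_b)) / 2

# Toy sequences (hypothetical, far shorter than a real genome).
s1 = "ATGCGATACGCTTGAGGCTAACGT"
s2 = "TTGACCGATAGGCTAGCTAACCGT"
d_same = cv_distance(s1, s1)
d_diff = cv_distance(s1, s2)
```

A matrix of such pairwise distances is the kind of input from which a
tree can be built without a multiple sequence alignment.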
Based on the core-pan analysis, 44 ranavirus core genes were obtained
from the complete genomes of the 32 ranaviruses. Among these 44
ranavirus core genes, 24 iridovirus core genes were identified (see
Table S3). Neighbor-joining phylogenetic analyses were performed on the
concatenated ranavirus core genes (60,721 nt characters including gaps)
and on the concatenated iridovirus core genes (37,803 nt characters
including gaps) (Figure 3). The neighbor-joining phylogenetic tree
(NJ-Tree) based on the ranavirus core genes was similar to the one based
on the iridovirus core genes (Figure 3). Both NJ-Trees revealed that the
ranaviruses can be classified into four distinct lineages, and the
taxonomic positions of the subspecies were consistent with those in the
CV-Tree. The subspecies classification of the 32 ranaviruses based on
the NJ-Tree and the CV-Tree is summarized in Table S4. However, the
phylogenetic position of ToRV could not be clearly determined by either
the NJ-Tree or the CV-Tree (Figure 2 and Figure 3).
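The concatenation step, and the kind of pairwise distance from which a
neighbor-joining tree is built, can be sketched as follows. The sketch
assumes each core gene has already been aligned across isolates, and it
uses a simple p-distance; the distance model actually used is not stated
here, so that choice, like the function names and toy sequences, is an
assumption.

```python
def concatenate_alignments(gene_alignments, isolates):
    """Join per-gene alignments into one sequence per isolate.

    gene_alignments: list of dicts mapping isolate -> aligned sequence.
    """
    for aln in gene_alignments:
        if len({len(aln[i]) for i in isolates}) != 1:
            raise ValueError("sequences in one gene alignment differ in length")
    return {i: "".join(aln[i] for aln in gene_alignments) for i in isolates}

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

# Toy alignments of two hypothetical core genes in three isolates.
gene1 = {"A": "ATG-CA", "B": "ATGGCA", "C": "TTGGCA"}
gene2 = {"A": "GGC", "B": "GAC", "C": "GAC"}
concat = concatenate_alignments([gene1, gene2], ["A", "B", "C"])
d_ab = p_distance(concat["A"], concat["B"])
d_ac = p_distance(concat["A"], concat["C"])
```

The length of each concatenated sequence counts gap characters as well,
which corresponds to the "nt characters including gaps" reported above.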
3 Dot plot analysis
Dot plot analysis using Java Dot Plot Alignments (Brodie et al., 2004)
is a visual method for identifying linear relationships between two
sequences and comparing genomic structural changes such as deletions,
insertions and inversions. Dot plots clearly indicate whether gene order
is conserved and may help to determine classification and evolutionary
relationships (Jancovich, Steckler, et al., 2015). In our study, the
genomic sequence of FV3 (AY548484) was compared with the other 31
complete ranavirus sequences using the JDotter software. Ranaviruses of
the same subspecies showed broadly similar linearity patterns (Figure
4). In particular, isolates of the EHNV-like subspecies showed analogous
changes in genome architecture.
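The idea behind a dot plot can be shown with a minimal word-match
sketch. This is not how JDotter computes its plots; the word length,
function names and toy sequence are assumptions. Forward matches trace
collinear regions along the main diagonal, whereas reverse-complement
matches trace inversions along an anti-diagonal.

```python
from collections import defaultdict

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def dot_plot(ref, query, word=4):
    """Return (forward, reverse) lists of (ref_pos, query_pos) word matches."""
    # Index every word of the reference by its start position.
    index = defaultdict(list)
    for i in range(len(ref) - word + 1):
        index[ref[i:i + word]].append(i)
    forward, reverse = [], []
    for j in range(len(query) - word + 1):
        w = query[j:j + word]
        forward.extend((i, j) for i in index.get(w, ()))
        reverse.extend((i, j) for i in index.get(revcomp(w), ()))
    return forward, reverse

# Toy reference (hypothetical); the query is its reverse complement,
# which mimics a sequence that is inverted along its whole length.
ref = "ATGGCGTACCTGA"
fwd_self, _ = dot_plot(ref, ref)
_, rev_inverted = dot_plot(ref, revcomp(ref))
```

Plotting the forward and reverse points in two colors gives the familiar
picture: an unbroken diagonal for collinear genomes, and shifted or
flipped segments where rearrangements have occurred.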
In the FV3-like subspecies, most isolates except RCV shared complete
collinearity with FV3 (AY548484). Although RCV does not share complete
collinearity with FV3, its linearity pattern is most similar to that of
the FV3-like subspecies. According to the phylogenetic analyses above,
the GIV-like subspecies is evolutionarily the most distant from FV3.
Likewise, GIV and SGIV display only short segments of genomic
collinearity with FV3 (Figure 4). Therefore, the rule that members of
the same subspecies show similar linearity patterns can be used as a
supplementary way to determine ranavirus taxonomy. In general, the
taxonomic positions of the subspecies based on dot plot analysis were
consistent with the phylogenetic analysis above (Table S4).
As noted above, the position of ToRV could not be determined clearly by
phylogenetic analysis. Dot plot analysis demonstrated that the linearity
pattern of ToRV is more similar to that of the CMTV-like subspecies than
to that of the FV3-like subspecies (Figure 4). To determine how similar
the ToRV genome is to the CMTV-like and FV3-like groups, we carried out
dot plot comparisons between ToRV and ranaviruses of each group. ToRV
shares high collinearity with the CMTV-like group (Figure S1), which
indicates that ToRV should be assigned to the CMTV-like group. In
summary, the taxonomic positions of the 32 ranaviruses were confirmed by
phylogenetic and dot plot analyses.
4 Filtering of ranavirus core genes
Phylogenetic and dot plot analyses based on whole viral genomes can
precisely determine the taxonomic positions of ranavirus subspecies.
However, these methods require high-throughput sequencing, which is
costly and time-consuming. In general, taxonomic analysis based on a
single gene or a few concatenated sequences is cost-effective and
convenient. To determine which genes are suitable for phylogenetic
analysis, recombination analysis and substitution saturation analysis
were performed to screen for qualified genes.
4.1 Recombination analysis of ranavirus core genes
If the sequences used in a phylogenetic analysis contain recombinant
fragments, the recombination events can seriously decrease the accuracy
of the phylogenetic trees (Posada et al., 2002). To avoid the problems
caused by recombination events, recombination analysis using RDP4 was
performed to remove recombinant sequences (Darren P. Martin et al.,
2015). The RDP4 analysis detected 38 recombination events within the
ranavirus core genes (Figure S2 and Table S5), involving 21 ranavirus
core genes. Core genes that contain recombinant fragments are not
suitable for constructing phylogenetic trees. For example, in
recombination event No. 1, a fragment (nt 18929-23747) of RCV-Z-MF187209
had recombined with FV3-AY548484 (Table S5). When the sequences of this
interval were used for phylogenetic analysis, RCV-Z was closest to FV3
(Figure 5A). However, RCV-Z and FV3 belong to different subspecies
according to our previous analyses (Figure 3 and Figure 4) and UPGMA
analysis (Figure 5B). Therefore, ranavirus core genes involved in
recombination events should be eliminated to ensure the accuracy of the
phylogenetic trees.
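How detected events map onto core genes can be sketched as an interval
overlap check. The event interval below is the one quoted for event No.
1; the gene names and coordinates are hypothetical placeholders, and the
sketch assumes that genes and events share one coordinate system. It
does not reproduce any RDP4 output format.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def genes_hit_by_events(gene_coords, events):
    """Return the genes that overlap any recombination event.

    gene_coords: dict mapping gene name -> (start, end)
    events: list of (start, end) recombinant intervals
    """
    return {
        name
        for name, (g_start, g_end) in gene_coords.items()
        if any(overlaps(g_start, g_end, e_start, e_end)
               for e_start, e_end in events)
    }

# Event No. 1 interval from the text; gene coordinates are hypothetical.
events = [(18929, 23747)]
gene_coords = {
    "gene_a": (17000, 18500),
    "gene_b": (18800, 19500),
    "gene_c": (21000, 22000),
    "gene_d": (23748, 25000),
}
hit = genes_hit_by_events(gene_coords, events)
```

Genes returned by such a check would be the ones excluded before tree
building.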
4.2 Substitution saturation analysis of ranavirus core genes
The accuracy of a phylogenetic tree depends on sequence divergence:
qualified sequences are neither too conserved (containing few
substitutions) nor too diverged (having experienced substantial
substitution saturation). In this study, we used the DAMBE7 software to
evaluate the substitution saturation of the ranavirus core genes (Xuhua
Xia, 2018). The results showed that cluster 5, cluster 8, cluster 13,
cluster 15, cluster 20, cluster 22 and cluster 38 are not suitable for
phylogenetic analysis (Table S5), because their values of Iss (index of
substitution saturation) are not significantly smaller than the
respective values of Iss.c (critical Iss). If Iss is not significantly
smaller than Iss.c, we can conclude that the sequences have experienced
severe substitution saturation and should not be used for phylogenetic
reconstruction. To confirm that these sequences are not suitable for
phylogenetic analysis, NJ phylogenetic trees were constructed from these
genes (Figure S3). The classifications based on sequences with
substantial substitution saturation are inconsistent with the genomic
phylogenetic trees and the dot plot analysis (Figure S3). In addition,
most of the bootstrap values were very low.
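The decision rule described above can be written as a short sketch. It
assumes that the saturation test reports, for each gene, an Iss value,
an Iss.c value and a p-value for their difference. The function name,
the 0.05 threshold and the example values are hypothetical and are not
taken from Table S5.

```python
def is_suitable(iss, iss_c, p_value, alpha=0.05):
    """True only if Iss is significantly smaller than Iss.c."""
    return iss < iss_c and p_value < alpha

# Hypothetical (Iss, Iss.c, p-value) triples for three genes.
examples = {
    "gene_x": (0.21, 0.78, 0.0001),  # Iss significantly smaller: keep
    "gene_y": (0.70, 0.75, 0.4100),  # smaller, but not significantly
    "gene_z": (0.85, 0.76, 0.0200),  # Iss larger than Iss.c
}
flagged = sorted(
    name for name, values in examples.items() if not is_suitable(*values)
)
```

Under this rule, a gene is retained only when both conditions hold; a
non-significant difference is treated the same way as Iss exceeding
Iss.c.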
5 Phylogenetic analysis of single core genes
Sixteen ranavirus core genes passed the recombination and substitution
saturation filters (Table S6). These genes do not contain any
recombination sites and have experienced little substitution saturation;
hence, they are qualified for use in phylogenetic analysis. To find out
which single-gene taxonomic analyses are consistent with the
whole-genome analysis, single-gene phylogenetic analysis was performed.
The 16 neighbor-joining phylogenetic trees revealed that only the
taxonomic analyses based on clusters 2, 9, 12 and 21 were consistent
with the whole-genome analysis (Figure S4). A phylogenetic tree was then
constructed from the 4 concatenated nucleotide sequences (Figure 6). The
structure of this tree is very similar to that of the tree based on the
44 ranavirus core genes (Figure 3 and Figure 6), and the taxonomic
positions of the subspecies were also consistent with the phylogenetic
analysis based on core genes (Table S4).
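The count of 16 retained genes can be checked against the numbers
reported in Section 4 with a small inclusion-exclusion sketch. The
function name is hypothetical, and the default of zero genes flagged by
both filters is an assumption; it is the value with which the reported
counts (44 core genes, 21 recombinant, 7 saturated) add up.

```python
def retained_count(n_core, n_recombinant, n_saturated, n_both=0):
    """Genes left after removing recombinant and saturated genes.

    n_both: genes flagged by both filters (assumed to be 0 by default).
    """
    return n_core - (n_recombinant + n_saturated - n_both)

# Saturated clusters as listed in Section 4.2.
saturated = [5, 8, 13, 15, 20, 22, 38]
n_retained = retained_count(44, 21, len(saturated))
```

If any gene had been flagged by both filters, the retained count would
be correspondingly larger than 16.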