Results

1 Core-pan analysis

Core-pan analysis showed that the 32 ranaviruses share 44 strictly core genes (Figure 1). With the exception of ESV, SGIV and TFV, every virus isolate carries no more than 5 unique genes (gene clusters that exist in only one species). ESV, SGIV and TFV possess 24, 47 and 8 unique genes, respectively. Detailed information on the strictly core genes and unique genes of the 32 ranaviruses is listed in Table S2.
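The distinction between strictly core genes and unique genes can be illustrated with a minimal Python sketch. It assumes that the gene clusters are available as a mapping from cluster ID to the set of genomes carrying that cluster. The function name, the cluster IDs and the toy membership data are hypothetical and are not taken from Table S2.

```python
from collections import Counter


def classify_clusters(clusters, isolates):
    """Split gene clusters into strictly core and unique ones.

    clusters: dict mapping cluster ID -> set of isolates carrying the cluster
    isolates: collection of all isolate names included in the analysis
    """
    all_isolates = set(isolates)
    # Strictly core: the cluster is present in every isolate.
    core = sorted(c for c, members in clusters.items()
                  if set(members) == all_isolates)
    # Unique: the cluster is present in exactly one isolate.
    unique = Counter(next(iter(members))
                     for members in clusters.values() if len(members) == 1)
    unique_counts = {iso: unique.get(iso, 0) for iso in sorted(all_isolates)}
    return core, unique_counts


# Hypothetical toy data (three isolates, six clusters).
toy_isolates = ["FV3", "SGIV", "ESV"]
toy_clusters = {
    "c1": {"FV3", "SGIV", "ESV"},
    "c2": {"FV3", "SGIV", "ESV"},
    "c3": {"SGIV"},
    "c4": {"SGIV"},
    "c5": {"ESV"},
    "c6": {"FV3", "ESV"},
}
core, unique_counts = classify_clusters(toy_clusters, toy_isolates)
```

In this toy case `core` holds the two clusters shared by all three isolates, and `unique_counts` gives the per-isolate number of clusters found nowhere else.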

2 Phylogenetic analysis

The genome sequences of the 32 ranaviruses and of other members of the family Iridoviridae (Chloriridovirus, Iridovirus, Lymphocystivirus and Megalocytivirus) were obtained from GenBank and uploaded to the CV-Tree online analysis platform to construct whole-genome phylogenetic trees. The ranaviruses and the other members of the family Iridoviridae clustered into their respective groups (Figure 2). Within the genus Ranavirus, the 32 isolates can be divided into 4 subspecies: GIV-like, EHNV-like, FV3-like and CMTV-like (Figure 2).
Based on core-pan analysis, 44 ranavirus core genes were obtained from the complete genomes of the 32 ranaviruses. Among these 44 ranavirus core genes, 24 iridovirus core genes were identified (see Table S3). Neighbor-joining phylogenetic analyses were performed on the concatenated ranavirus core genes (60,721 nt characters including gaps) and on the concatenated iridovirid core genes (37,803 nt characters including gaps) (Figure 3). The neighbor-joining phylogenetic tree (NJ-Tree) based on ranavirus core genes was similar to the one constructed from iridovirus core genes (Figure 3). Both NJ-Trees revealed that ranaviruses can be classified into four distinct lineages, and the taxonomic positions of the subspecies were consistent with the CV-Tree.
The subspecies classification of the 32 ranaviruses based on the NJ-Tree and CV-Tree is summarized in Table S4. However, the phylogenetic position of ToRV could not be clearly determined by either the NJ-Tree or the CV-Tree (Figure 2 and Figure 3).
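The concatenation step behind the core-gene trees, and the pairwise distances that a neighbor-joining method takes as input, can be sketched as follows. The sketch assumes that each core gene has already been aligned across all taxa. It uses a plain p-distance (proportion of differing sites, skipping columns with a gap in either sequence); the distance model and gap treatment actually used for Figure 3 are not stated in the text, so both choices are assumptions, as are the function names and the toy sequences.

```python
from itertools import combinations


def concatenate(alignments, taxa):
    """Join per-gene alignments (dict taxon -> aligned sequence) end to end."""
    for aln in alignments:
        lengths = {len(aln[t]) for t in taxa}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(supermatrix):
    """Pairwise p-distances keyed by alphabetically ordered taxon pairs."""
    return {(s, t): p_distance(supermatrix[s], supermatrix[t])
            for s, t in combinations(sorted(supermatrix), 2)}


# Hypothetical toy alignments for two genes and three taxa.
taxa = ["A", "B", "C"]
gene1 = {"A": "ATGC", "B": "ATGA", "C": "A-GA"}
gene2 = {"A": "GGTT", "B": "GGTT", "C": "CGTA"}
supermatrix = concatenate([gene1, gene2], taxa)
distances = distance_matrix(supermatrix)
```

The length of each concatenated sequence counts gap characters, in the same sense as the "nt characters including gaps" figures quoted above.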

3 Dot plot analysis

Dot plot analysis using Java Dot Plot Alignments (Brodie et al., 2004) is a visual method for identifying linear relationships between two sequences and for comparing genomic structural changes such as deletions, insertions and inversions. Dot plot studies clearly indicate whether gene order is conserved and may help determine classification and evolutionary relationships (Jancovich, Steckler, et al., 2015). In our study, the genomic sequence of FV3 (AY548484) was compared with the other 31 complete ranavirus sequences using JDotter software. The results indicated that ranaviruses of the same subspecies showed broadly similar linearity patterns (Figure 4). This was especially clear in the EHNV-like subspecies group, whose isolates showed analogous changes in genome architecture. In the FV3-like subspecies group, most isolates except RCV shared complete collinearity with FV3 (AY548484). Although RCV does not share complete collinearity with FV3, its linearity pattern is most similar to that of the FV3-like subspecies group. According to the preceding phylogenetic analyses, the GIV-like subspecies were evolutionarily the most distant from FV3. Likewise, GIV and SGIV display only short segments of genomic collinearity with FV3 (Figure 4). Therefore, the rule that isolates of the same subspecies show similar linearity patterns can be used as a supplementary means of determining ranavirus taxonomy. In general, the taxonomic positions of the subspecies based on dot plot analysis were consistent with the preceding phylogenetic analysis (Table S4).
As noted above, the position of ToRV could not be clearly determined by phylogenetic analysis. Dot plot analysis showed that the linearity pattern of ToRV is more similar to that of the CMTV-like subspecies group than to that of the FV3-like group (Figure 4). To assess how similar the ToRV genome is to the CMTV-like and FV3-like groups, we carried out dot plot comparisons between ToRV and the ranaviruses of each group. ToRV shares high collinearity with the CMTV-like group (Figure S1), which indicates that ToRV should belong to the CMTV-like group. In summary, the taxonomic positions of the 32 ranaviruses were confirmed by phylogenetic analyses and dot plot analyses.
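The principle behind a dot plot can be illustrated with a short Python sketch. It is a simplified exact word-match version and is not the algorithm implemented in JDotter. The word length `k`, the function names and the toy sequences are assumptions made for illustration. Collinear regions appear as dots along a diagonal, whereas an inverted segment appears as dots along an anti-diagonal on the opposite strand.

```python
from collections import defaultdict


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot(seq_x, seq_y, k=4):
    """Return (i, j, strand) for every exact k-mer shared by two sequences.

    i and j are 0-based start positions in seq_x and seq_y. Strand is '+'
    for a direct match and '-' when the word in seq_y matches the reverse
    complement of the word in seq_x.
    """
    index = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)
    dots = []
    for j in range(len(seq_y) - k + 1):
        word = seq_y[j:j + k]
        dots.extend((i, j, "+") for i in index.get(word, []))
        dots.extend((i, j, "-") for i in index.get(revcomp(word), []))
    return dots


# Hypothetical toy sequences.
reference = "GATTACAGGC"
collinear = dot_plot(reference, reference)           # dots on the main diagonal
inverted = dot_plot(reference, revcomp(reference))   # dots on the anti-diagonal
```

For whole genomes a plotting step would follow; here the coordinate lists alone are enough to distinguish complete collinearity from an inversion.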

4 Filtering of ranavirus core genes

The phylogenetic and dot plot analyses based on whole viral genomes can precisely determine the taxonomic positions of ranavirus subspecies. However, these methods require high-throughput sequencing, which is costly and time-consuming. In general, taxonomic analysis based on a single gene or on several concatenated sequences is more cost-effective and convenient. To determine which genes are suitable for phylogenetic analysis, recombination analysis and substitution saturation analysis were performed to screen for qualified genes.

4.1 Recombination analysis of ranavirus core genes

If the sequences used in phylogenetic analysis contain recombinant fragments, recombination events can seriously decrease the accuracy of phylogenetic trees (Posada et al., 2002). To avoid the problems caused by recombination events, recombination analysis was performed with RDP4 to identify and remove recombinant sequences (Darren P. Martin et al., 2015). RDP4 detected 38 recombination events within the ranavirus core genes (Figure S2 and Table S5), involving 21 ranavirus core genes. Core genes that contain recombinant fragments are not suitable for building phylogenetic trees. For example, in recombination event No. 1, a fragment (nt 18929-23747) of RCV-Z-MF187209 had recombined with FV3-AY548484 (Table S5). When the sequences of this interval were used for phylogenetic analysis, RCV-Z was closest to FV3 (Figure 5A). However, RCV-Z and FV3 belong to different subspecies according to our preceding analyses (Figure 3 and Figure 4) and UPGMA analysis (Figure 5B). Therefore, ranavirus core genes involved in recombination events should be excluded to ensure the accuracy of phylogenetic trees.

4.2 Substitution saturation analysis of ranavirus core genes

The accuracy of a phylogenetic tree depends on sequence divergence: qualified sequences are neither too conserved (containing few substitutions) nor too diverged (having experienced substantial substitution saturation). In this study, we used DAMBE7 software to evaluate the substitution saturation of the ranavirus core genes (Xuhua Xia, 2018). The results showed that cluster 5, cluster 8, cluster 13, cluster 15, cluster 20, cluster 22 and cluster 38 are not suitable for phylogenetic analysis (Table S5), because their values of Iss (index of substitution saturation) are not significantly smaller than the respective values of Iss.c (critical Iss). If Iss is not significantly smaller than Iss.c, we can conclude that the sequences have experienced severe substitution saturation and should not be used for phylogenetic reconstruction. To confirm that these sequences are unsuitable for phylogenetic analysis, NJ phylogenetic trees were constructed from these genes (Figure S3). The classifications based on sequences with substantial substitution saturation are inconsistent with the genomic phylogenetic trees and the dot plot analysis (Figure S3). In addition, most of the bootstrap values were very low.
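The decision rule described above can be written as a small Python sketch. It only applies the comparison of Iss with Iss.c to values that a program such as DAMBE7 would report; it does not compute Iss itself. The record type, the field names, the significance threshold `alpha` and the numeric values are all hypothetical and are not taken from Table S5.

```python
from dataclasses import dataclass


@dataclass
class SaturationResult:
    cluster: str
    iss: float      # observed index of substitution saturation
    iss_c: float    # critical Iss for the same alignment
    p_value: float  # P value of the test comparing Iss with Iss.c


def is_suitable(result, alpha=0.05):
    """A gene passes only if Iss is significantly smaller than Iss.c."""
    return result.iss < result.iss_c and result.p_value < alpha


# Hypothetical values for illustration only.
toy_results = [
    SaturationResult("cluster A", iss=0.12, iss_c=0.75, p_value=0.0001),
    SaturationResult("cluster B", iss=0.70, iss_c=0.72, p_value=0.4100),
    SaturationResult("cluster C", iss=0.81, iss_c=0.74, p_value=0.0300),
]
suitable = [r.cluster for r in toy_results if is_suitable(r)]
```

Cluster B illustrates why the test matters: its Iss is numerically below Iss.c, but the difference is not significant, so the gene is still rejected.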

5 Phylogenetic analysis of single core genes

Sixteen ranavirus core genes were retained after recombination analysis and substitution saturation analysis (Table S6). These genes do not contain any recombination sites and have experienced little substitution saturation; hence they are qualified for use in phylogenetic analysis. To identify single genes whose taxonomic analysis is consistent with whole-genome analysis, single-gene phylogenetic analysis was performed. The 16 neighbor-joining phylogenetic trees revealed that only clusters 2, 9, 12 and 21 gave taxonomic results consistent with whole-genome analysis (Figure S4). A phylogenetic tree was then constructed from the concatenated nucleotide sequences of these 4 genes (Figure 6). The structure of this tree is very similar to that of the tree based on the 44 ranavirus core genes (Figure 3 and Figure 6), and the taxonomic positions of the subspecies were also consistent with the phylogenetic analysis based on core genes (Table S4).
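The two-step filter can be sketched in Python as a pair of set exclusions. The reported counts are consistent with exclusions that do not overlap (44 - 21 - 7 = 16). The saturated cluster numbers below are those named in Section 4.2. The recombinant cluster numbers are NOT taken from Table S5: they are placeholder IDs chosen only so that the counts match the text, and the function name is likewise hypothetical.

```python
def filter_core_genes(core, recombinant, saturated):
    """Keep core genes that are neither recombinant nor saturated."""
    return sorted(set(core) - set(recombinant) - set(saturated))


core_clusters = set(range(1, 45))            # 44 ranavirus core genes
saturated = {5, 8, 13, 15, 20, 22, 38}       # clusters named in Section 4.2

# PLACEHOLDER: 21 arbitrary IDs standing in for the recombinant core genes.
# The real list is in Table S5 and is not reproduced in the text.
placeholder_recombinant = set(range(23, 45)) - saturated

retained = filter_core_genes(core_clusters, placeholder_recombinant, saturated)
```

With the real recombinant list substituted for the placeholder, `retained` would correspond to the genes listed in Table S6.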