IV. DISCUSSION

The adoption of multiple complementary scaffolding approaches resulted in an assembly similar in quality to the best available salmonid genomes. Multiple lines of evidence suggest that the genome presented here represents a nearly complete and accurate model of the Lake Trout genome. First, the total size of the finished genome was slightly greater than the genome size estimate obtained from GenomeScope. Pflug et al. (2020) found that k-mer-based methods for genome size estimation tend to underestimate genome size by 4.5% on average, so this result is not entirely unexpected. Additionally, BUSCO scores were similar to those obtained for the highest quality salmonid genomes available at the time of analysis (e.g., Coho Salmon, Brown Trout, Rainbow Trout). Scores were highly similar between the Brown Trout and Lake Trout genomes; however, the proportion of missing BUSCOs was 1.9% higher for Lake Trout and the proportion of complete duplicated BUSCOs was 2% lower, suggesting that some duplicated regions might be missing from the Lake Trout genome. Nonetheless, these two assemblies had the highest percentage of complete BUSCOs and the highest percentage of complete duplicated BUSCOs among the genome assemblies examined. Furthermore, the order of loci on the Lake Trout linkage map was highly concordant with the order of loci on Lake Trout chromosomes, suggesting that contigs are accurately ordered and properly oriented. The genome presented here is also highly contiguous, with a contig N50 higher than that of any published salmonid genome (but see the recently released assembly for Arlee Strain Rainbow Trout; GCF_013265735.2). Interestingly, the PacBio data used for assembly were of similar coverage to the data used for assembling the European Whitefish genome (De-Kayne et al. 2020); however, the Lake Trout contig N50 is more than three times higher (although the scaffold N50 is lower). There are two reasonable explanations for this. First, the European Whitefish genome was assembled using DNA from a wild-caught, outbred individual rather than a double haploid. Second, the European Whitefish genome was not gap filled after scaffolding. Gap filling the Lake Trout genome with PBJelly increased contig N50 by 561,496 bp.
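Because contig N50 carries much of the contiguity argument above, a minimal sketch of how the statistic is computed may be useful. The function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration; they are not values from the Lake Trout or European Whitefish assemblies, and the example does not reproduce what PBJelly does internally.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (bp), NOT values from any real assembly.
before_gap_filling = [50, 40, 30, 20, 10, 10]
# Assume gap filling joins the 30 bp and 20 bp contigs across a closed 5 bp gap.
after_gap_filling = [55, 50, 40, 10, 10]

print("contig N50 before:", n50(before_gap_filling))
print("contig N50 after: ", n50(after_gap_filling))
```

In this toy case, closing a single gap merges two contigs into one longer contig and raises the N50, which is the same qualitative effect described for gap filling above.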
The Lake Trout genome will likely be sufficient for the majority of downstream uses; however, improvements could be made using supplementary scaffolding resources such as a higher density linkage map or an optical map (Pan et al. 2020). The annotation could also be improved by generating additional RNA-seq data. The number of annotated genes and pseudogenes (n=49,668) is similar to what has been obtained for other salmonids (e.g., Chum Salmon Oncorhynchus keta, Sockeye Salmon Oncorhynchus nerka, and Dolly Varden) using the same annotation pipeline. However, it is important to note that annotation completeness is markedly reduced relative to other assemblies with similar BUSCO scores, such as Atlantic Salmon (57,783; GCF_000233375.1; Annotation Release 100), Coho Salmon (63,465; GCF_002021735.2; Annotation Release 101), Brown Trout (61,583; GCF_901001165.1; Annotation Release 100), Rainbow Trout (55,630; GCF_002163495.1; Annotation Release 100), and Chinook Salmon (53,685; GCF_002872995.1; Annotation Release 100). These annotations were produced using RNA-seq evidence from a greater diversity of tissue types, which likely explains this discrepancy. The Lake Trout annotation, as well as annotations for other salmonids, could also be further improved by directly sequencing full-length transcripts using long-read sequencing technologies (Workman et al. 2018). We predict that the completeness of the Lake Trout genome annotation will improve as more gene expression data from a greater diversity of tissue types become available for the species (Salzberg 2019). Nonetheless, the current genome annotation will undoubtedly aid in the interpretation of future findings by allowing researchers to link signals of selection and loci associated with phenotypes with putatively causal genes and biological processes. Publicly available gene expression and functional annotation resources, like those being developed by the Functional Annotation of All Salmonid Genomes (FAASG) initiative, will also aid in this effort (Macqueen et al. 2017).
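To make the size of the annotation gap concrete, the counts quoted above can be compared directly. The sketch below assumes that the parenthetical counts for the other species are totals comparable to the Lake Trout gene and pseudogene count; the function and variable names are hypothetical.

```python
# Counts as quoted in the text; assumed to be comparable totals.
LAKE_TROUT_COUNT = 49_668
COMPARISON_COUNTS = {
    "Atlantic Salmon": 57_783,
    "Coho Salmon": 63_465,
    "Brown Trout": 61_583,
    "Rainbow Trout": 55_630,
    "Chinook Salmon": 53_685,
}


def percent_of(count, reference):
    """Express count as a percentage of reference."""
    return 100.0 * count / reference


def lake_trout_relative_counts():
    """Map each species to the Lake Trout count as a percentage of its count."""
    return {
        species: percent_of(LAKE_TROUT_COUNT, count)
        for species, count in COMPARISON_COUNTS.items()
    }


for species, pct in lake_trout_relative_counts().items():
    print(f"Lake Trout count is {pct:.1f}% of the {species} count")
```

Under these assumptions, the Lake Trout count is roughly 78% of the Coho Salmon count at the low end and roughly 93% of the Chinook Salmon count at the high end.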
The availability of a second high-quality assembly for a Salvelinus species will likely benefit comparative genomic research aimed at understanding the evolutionary consequences of genome duplication. Salmonids have long been appreciated as a model system for understanding evolution following whole genome duplication events (Ohno 1970), and the wealth of genomic resources for salmonids will hopefully continue to shed light on the evolutionary processes at play following autotetraploid genome duplication events. Additionally, multiple recent studies have highlighted the importance of structural genetic variation for promoting adaptive diversification within salmonid species (Pearse et al. 2019; Bertolotti et al. 2020), and chromosome-anchored genome assemblies are typically needed for detecting and genotyping structural variants (Merot et al. 2020).
Genomic methods have dramatically increased the precision of population genetic analyses and have enabled researchers to address qualitatively new questions that require some knowledge of genome structure and function (Waples et al. 2020). Lake Trout have undergone repeated parallel adaptive radiations, and ecotypic diversity appears to be heritable (Goetz et al. 2010); however, the genetic or epigenetic basis for ecotypic diversity is still unclear (Perreault‐Payette et al. 2017). A genome assembly will greatly simplify the process of mapping loci associated with ecophenotypic differentiation and could enable identification of loci associated with reproductive isolation among ecotypes in populations where multiple ecotypes exist. Anecdotal evidence suggests that Lake Superior once harbored as many as ten ecotypes (Goodier 1981). Three ecotypes are currently recognized (lean, siscowet, and humper), and a fourth ecotype was recently identified (redfin; Muir et al. 2014). Interestingly, Muir et al. (2014) found that ecotypes collected near Isle Royale were only moderately distinct, which is at odds with historical records suggesting that they were easy to identify visually (Rakestraw 1967). An improved understanding of the genetic basis for ecotypic differentiation could help determine whether this is due to phenotypic plasticity, increased levels of hybridization between ecotypes, or other processes (Baillie et al. 2016). The ability to genotype historical collections and quantify levels of adaptive differentiation at different time points (Guinand et al. 2003) provides a particularly exciting avenue for future research on Lake Trout.
The Lake Trout genome assembly could also have important implications for ongoing Lake Trout restoration activities throughout the Great Lakes. The resources presented here will allow for the identification of loci associated with variation in fitness between Lake Trout hatchery strains in contemporary Great Lakes environments (Scribner et al. 2018) and the identification of loci that are adaptively diverged between hatchery strains. This information could help fisheries managers to maximize adaptive genetic diversity in re-emerging wild populations and prioritize hatchery populations for continued propagation.