Introduction
During the past two decades, DNA-based approaches have increased the quality and reproducibility of species delimitation and identification (Ahrens, 2023). Standardized and automated species recognition using DNA has made it easy to link taxonomic information with diverse biological questions and applied research aspects (e.g., rapid assessment of biodiversity; Yu et al., 2012). Species delimitation and identification of animals are often based on information from a single mitochondrial gene, cytochrome oxidase I (COI) (Hebert et al., 2003; Fontaneto et al., 2015). Such single-marker reliance can lead to errors due to extrachromosomal inheritance, incomplete lineage sorting, sex-biased dispersal, asymmetrical introgression, and Wolbachia-mediated genetic sweeps of the marker (Funk & Omland, 2003; Ballard & Whitlock, 2004). At the same time, species delimitation approaches using nuclear-encoded markers have considerably improved in accuracy, allowing them to complement the currently established single-gene barcoding approach (Dowton et al., 2014; Eberle et al., 2020; Gueuning et al., 2020; Prebus, 2021; Erikson et al., 2021; Dietz et al., 2023).
Besides mitochondrial genes, a variety of conserved nuclear markers have been used for species delimitation in different phylogenetic groups of Metazoa, such as nuclear ribosomal RNA genes (Lebonah et al., 2014; Chen et al., 2017; Krehenwinkel et al., 2019) and various housekeeping genes (Joshi et al., 2022). Furthermore, restriction site-associated DNA sequences (RADseq) (Baird et al., 2008; Pante et al., 2015; Herrera & Shank, 2016) and ultra-conserved elements (UCE) linked to more rapidly evolving flanking regions (Faircloth et al., 2012; Bejerano et al., 2004; Ješovnik et al., 2017; Zarza et al., 2018; Gueuning et al., 2020; Prebus, 2021) have been used. However, these nuclear marker systems can hardly be applied universally across animals, either because they insufficiently capture intraspecific variation or because they do not provide orthologous loci across distantly related taxa (Pierce, 2019; Eberle et al., 2020).
Recently, Metazoa-level Universal Single Copy Orthologs (USCOs) have been proposed as a universal marker set for species-level DNA taxonomy of animals as an extension and improvement of conventional DNA barcoding (Eberle et al., 2020). USCOs are defined as protein-coding genes that are present and single-copy in at least 90% of the species within the available genomes of a given taxonomic group. They were originally developed to benchmark the quality of genome assemblies (“BUSCO”, Simão et al., 2015). However, they also proved to be highly informative for addressing phylogenomic questions (Waterhouse et al., 2018; Fernández et al., 2018; Zhang et al., 2019; Stolle et al., 2022). This insight has led to the development of a recently published automated software pipeline that extracts USCOs from genome assemblies and generates phylogenies from the extracted sequence data (Sahbou et al., 2022). Finally, Metazoa-level USCOs (mzl-USCOs) have been shown to distinguish highly similar morphospecies (even when COI was unable to do so) and to allow reliable estimation of their phylogenetic relationships in several clades of arthropods and vertebrates (Dietz et al., 2023).
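To make the 90% criterion in the USCO definition concrete, the following minimal Python sketch filters a table of per-species gene copy numbers. It is an illustration only, not the BUSCO procedure itself: the function name `select_uscos`, the input layout (ortholog group mapped to per-species copy counts), and the treatment of absent genes as copy number 0 are assumptions made here.

```python
def select_uscos(copy_counts, species, min_fraction=0.9):
    """Return ortholog groups that are single-copy in >= min_fraction of species.

    copy_counts: dict mapping ortholog-group ID -> {species name: copy number}
                 (hypothetical layout; a missing species counts as 0 copies).
    species:     list of all species (genomes) in the taxonomic group.
    """
    n_species = len(species)
    if n_species == 0:
        return []
    selected = []
    for group_id, counts in copy_counts.items():
        # Count species in which the gene is present exactly once.
        n_single = sum(1 for sp in species if counts.get(sp, 0) == 1)
        if n_single / n_species >= min_fraction:
            selected.append(group_id)
    return sorted(selected)


# Toy example with ten hypothetical genomes.
species = [f"sp{i}" for i in range(10)]
table = {
    "OG1": {sp: 1 for sp in species[:9]},                 # absent in one species
    "OG2": {**{sp: 1 for sp in species[:8]},
            species[8]: 2, species[9]: 2},                # duplicated in two species
    "OG3": {sp: 1 for sp in species},                     # single-copy everywhere
}
uscos = select_uscos(table, species)
```

In this toy table, a gene that is missing from one of ten genomes still meets the threshold, whereas a gene duplicated in two of ten genomes does not.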
What has remained unclear is whether mzl-USCOs can be considered a genetically unlinked representative sample of a species’ genome, which is a prerequisite for USCOs being reliable and useful in coalescent-based phylogenetic analyses and applications. Knowledge of the spatial distribution and physical linkage of mzl-USCOs is hence fundamental to assess whether these markers are indeed as suitable for delimiting species with coalescent-based approaches as currently assumed. Here, we study the two parameters “spatial distribution” and “physical linkage” by extracting USCOs from published chromosome-level whole-genome assemblies (WG) of various species of Metazoa and analyzing the physical distances between USCOs and their distribution across chromosomes. Furthermore, using unassembled reads from whole genome sequencing (WGS) datasets of four metazoan lineages (i.e., Anopheles mosquitoes, Drosophila fruit flies, Heliconius butterflies, and Darwin’s finches), we assess to what extent phylogenetic analysis of the extracted mzl-USCOs provides results consistent with those of previous studies that used more extensive sets of markers from the same genomes.
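As a rough illustration of what measuring physical distances between USCOs and their distribution across chromosomes can involve, the sketch below groups marker coordinates by chromosome and computes the gaps between neighboring markers. This is a simplified sketch under stated assumptions, not the analysis used in this study: the function name `neighbor_gaps`, the `(chromosome, start, end)` input tuples, and the definition of distance as the gap between the end of one marker and the start of the next are all hypothetical choices.

```python
from collections import defaultdict


def neighbor_gaps(markers):
    """Compute gaps (in bp) between adjacent markers on each chromosome.

    markers: iterable of (chromosome, start, end) tuples (hypothetical layout).
    Returns a dict mapping chromosome -> list of gaps, in positional order.
    Overlapping markers are assigned a gap of 0.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in markers:
        by_chrom[chrom].append((start, end))

    gaps = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()  # order markers by start coordinate
        gaps[chrom] = [
            max(0, next_start - prev_end)
            for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])
        ]
    return gaps


# Toy coordinates for four hypothetical markers on two chromosomes.
example = [
    ("chr1", 100, 200),
    ("chr1", 1000, 1100),
    ("chr2", 50, 80),
    ("chr1", 500, 600),
]
gaps = neighbor_gaps(example)
markers_per_chrom = {chrom: len(g) + 1 for chrom, g in gaps.items()}
```

The per-chromosome marker counts summarize the distribution across chromosomes, while the gap lists give a first impression of how closely neighboring markers sit and hence how likely they are to be physically linked.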