Materials and Methods

Stomach samples collection

Stomach content samples (\(n=24\)) were collected from fiveCampylomormyrus species with markedly different EOD and snout morphologies (C. alces ,\(n=2\); C.compressirostris ,\(n=10\); C.curvirostris , \(n=2\);C. tshokwe , \(n=9\); C. numenius \(n=1\); see Lamanna et al. 2016 for phylogenetic relationships, EODs, and snout characteristics). Additionally, stomach content samples ofGnathonemus petersii(G. petersii, \(n=3\)), the sister genus of Campylomormyrus , were used for comparison. Unlike Campylomormyrus , G. petersii has no snout, but a trunk-like protrusion on the head. The fish specimens from which these stomach content samples are extracted were collected during an expedition to the Republic of the Congo in fall 2012. All the analyzed stomach samples were extracted from sympatric fish living in the Congo River rapids south of Brazzaville [S4°18.788′ E15°13.790′]. Fish were purchased from local anglers along the northern shore line of the River Congo at Rapides de Kintambo downstream of the Malebo Pool (Runge 2007) north of the island Île de Singes, where the river bed consists of rocks up to several meters in diameter and the river is very turbid. Before stomach extracting, all fish individuals were weighed and measured for total and standard length and were assigned unique and museum identifiers (see Table S1.1 in Supplementary File S1). After dissection, the stomach samples were stored in Queen’s tissue buffer solution (20% dimethyl sulfoxyde, 0.25 M EDTA, saturated with NaCl, pH 8.0; Seutin et al.1991) at 4°C. Before DNA extraction, the stomach contents were visually checked and found to be heavily digested, with very few visible tissue parts remaining.
We assembled 26 stomach contents samples, excluding the C. numenius sample due to its low read numbers (Sec. ‎3.1), to three categories, based on species, EOD duration, and snout length. For species-specific analysis, only the two species with larger sample size, i.e., C. compressirostris (\(n=10\)) and C. tshokwe(\(n=9\)), were considered. Regarding EOD duration, we formed two groups: Long EOD (>2ms) and short EOD (<0.3ms; note the almost 10fold difference in EOD length among the two groups). The first group (\(\text{long\ EOD};\ n=11\)) included the samples ofC. alces and C. tshokwe . The second group (\(\text{short\ EOD};\ n=15\)) included the samples of C. compressirostris , C. curvirostris , and G. petersii . EOD categorization was based on Lamanna et al. (2016). Regarding snout length, we formed three groups: Long snout (C. curvirostris andC. tshokwe , \(n=11\)), medium snout (C. compressirostris , \(n=10\)), and short snout (C. alces andG. petersii , \(n=5\)). For this classification, we took advantage of a previous geometric morphometrics analysis, which stratified the respective species according to snout length along principal component 1 (Figure 3 in Lamanna et al. 2016).
Handling of animals was conducted at the University of Brazzaville in accordance with the relevant national guidelines and regulations guidelines for the care and use of animals for scientific purposes.

Lysis and DNA extraction

DNA extraction was performed in 2019, appr. 7 years after sampling. Each stomach content sample was homogenized using a Tissue Tip Homogenizer before incubating with lysis reagents. The Qiagen DNeasy Mericon Food Kit for extraction of high-quality DNA from digested foods was used to carry out the DNA extraction from the homogenized samples. Since the stomach content samples were almost digested, the DNA was suspected to be highly fragmented. Therefore, the small fragment protocol of this kit (200 mg) was used according to the manufacturer’s protocol. DNA was eluted in the kit’s EB buffer (10 mM Tris-Cl, pH 8.5) and stored at -20°C.

Libraries preparation, indexing and amplification

After DNA extraction, all samples underwent single-stranded DNA library preparation using T4 DNA ligase following (Gansauge et al. 2017). In this method, biotinylated adapter molecules are attached to the ends of the DNA fragments via hybridization to a stretch of six random nucleotides belonging to a splinter oligonucleotide complementary to the adapter and nick closure with T4 DNA ligase. The procedure is optimized to highly degraded DNA fragments, as expected in digested food from the stomach. An additional step of deoxyuracils removal using Uracil DNA glycolase was included. Deamination of cytosin to uracil is a typical damage pattern in degraded DNA and could cause erroneous mispairing to adenin, resulting in genotyping errors.
After library preparation, the optimal number of Polymerase Chain Reaction (PCR) cycles required for library indexing and amplification was estimated by Quantitative Polymerase Chain Reaction (qPCR). The rationale is to adjust the number of PCR cycles during the library amplification to different DNA concentrations (Basler et al. 2017). The reaction was performed by mixing 1ml of a 1:20 dilution of each library with 3µl of nuclease free water, 5µl of 2 x SYBRTM green qPCR master mix (ThermoFisher Scientic), 0.5µl of qPCR primer IS7, and 0.5µl of primer IS8 (both 10µM; from Gansauge & Meyer 2013). For each library three reaction replicates were analyzed. The PCR reactions were performed with 7min at 95°C, followed by 40 cycles of 10s at 95°C denaturation, 30s annealing at 60°C, and 1min extension at 72°C. Considering the differences in DNA concentration between the qPCR and the indexing PCR, the optimal number of cycles for the indexing PCR was calculated according to Basler et al. (2017).
The indexing primers P5 and P7 were chosen according to the library protocol of Gansauge et al. (2017). The reaction mix for each sample contained 44.8µl of nuclease free water, 8µl of 10 x AccuPrimeTM Pfx reaction mix (ThermoFisher Scientic), 0.8µl of 2.5U AccuPrimeTM Pfx polymerase, 3.2µl of P5 indexing primer, 3.2µl of P7 indexing primer (both 10 µM) and 20µl of each library. Thermocycling started with 2min at 95°C, followed by the number of cycles calculated according to Basler et al. (2017) consisting of 15s at 95°C denaturation, 30s annealing at 60°C and 1 min extension at 68°C. After indexing and amplification, all libraries were purified using the MinElute PCR Purification Kit (QIAGEN) following the manufacturer’s instructions. The libraries were quantified with a Qubit 3.0 fluorometer after purification and the distribution of the DNA fragments lengths was measured using an Agilent 2100 bioanalyzer system.

Probe design and DNA hybridization capture

For probe design, we selected target species based on literature research on the potential taxa eaten by Campylomormyrus and the expected diversity of prey items in the Congo River region. We targeted 504 species across 6 animal taxa: Arthropoda, Annelida, Mollusca, Nemertea, Nematoda, and Rotatoria. Additionally, we targeted 29 species from 3 plant taxa: Streptophyta, Rhodophyta, and Chlorophyta. For these species, mitochondrial COI gene (animals) and chloroplast rbcL gene (plants) sequences were retrieved from the GenBank sequence database of the National Center for Biotechnology Information (NCBI). When a species of interest did not have sequence information available, sequence information from another species of the same genus was included. If this was not available, the taxon was excluded. The average sequence length of all collected sequences was 700 bp. A complete list of taxonomic names and NCBI accession numbers can be found in the Supplementary File S2. The sequence capture probes were custom-made at Arbor Biosciences (formerly MY croarray) and supplied in form of a MYBaits InSolution Custom Target Capture Kit (designs with 1-20k probes). The probes were designed to achieve a high overall probe count and an increased coverage in high GCcontent regions. The final probe set contained 15284 probes, each with a length of 80bp. The synthesized probes comprise short overlapping fragments representing the whole probe template sequences of COI and rbcL.
The hybridization capture enrichment reactions were performed according to the manufacturer’s instructions (Biosciences 2016). Using this method, single-stranded DNA (ssDNA) fragments sharing sequence similarity with predesigned ssDNA- or RNA-baits can be enriched. Following these instructions, the recommended input of the library DNA was 100ng to 500ng. This was equivalent to 7µl of the library material with a DNA concentration of 14 to 72ng/µl. All materials (i.e., pooled across replicates of the same specimen) from the library were used for two rounds of hybridization capture reactions, thereby increasing specificity (Krueger et al. 2021). The captured and amplified libraries of the first round of capture reaction were used as input libraries for the second round. For both rounds of capture, the libraries were not pooled since pooling might produce low quality results (Biosciences 2016). The second captured and amplified libraries were pooled and underwent sequencing on an Illumina MiSeq instrument using a protocol for 2x150bp paired end of double indexed libraries (Biosciences 2016).

Bioinformatics, data analysis, and statistics

Before further analyzing the high throughput sequencing reads, which were generated in FASTQ format, we performed a quality processing to remove the adapter sequences and the low-quality reads, using Cutadapt version 2.10 (Martin 2011). Quality cutoff, minimal sequence length, and minimal overlap between sequence and adapter were set to 20, 30, and 3, respectively. Then, we performed quality control checks to the filtered sequences using FastQC (Andrews 2010) to ensure their suitability for further analysis.
Taxonomic classification for the filtered sequences was done using Kraken version 2.0.9, a k -mer-based approach which provides a fast taxonomic classification from the sequence data (Kraken2; Wood et al. 2019). Kraken compares the reads of the metagenome to short sequences of length \(k\), the so-called \(k\)-mers, from a database that is associated with the sequence information underlying the tree of life phylogeny. The algorithm then places the read on the tree of life, as well as the taxon’s ancestors, based on its similarities to these\(k\)-mers, in accordance with its annotation at the lowest taxonomic level. During the taxonomic classification process using Kraken2, specific weights are allocated to each node based on the k-mer paths. This weight assignment improves the sensitivity of taxon classification, ensuring more accurate and precise results.Before applying Kraken2, a custom marker database of available eukaryotic COI and RBCl sequences from NCBI was built to map the reads. The results of Kraken2 were visualized and inspected using the online tool Pavian metagenomics data explorer (Breitwieser & Salzberg, 2019).
The read counts were used as a semi-quantitative estimate of the relative abundance of food taxa in the diet of each sample. For any sample, these count data were used to record both the occurrence (presence/absence) of a taxon and the percentage of reads assigned to that taxon. The frequency of occurrence (%FOO) is used to quantify occurrence across samples. The relative read abundance (RRA) is used as a proxy for relative biomass consumed. Details on the calculations of these measures are given in the Supplementary File S3.
The dietary niche width was assessed by calculating the Shannon diversity index (alpha diversity) for each sample using the spaaR package (Zhang 2016). Further, to describe the diet overlap among our categories (species, EOD duration, snout length), we used two complementary metrics: Schöner index (Schoener 1970) and Pianka index (Pianka 1974). Pairwise calculations of these indices were performed using the spaa R package (Zhang 2016). The equations of the indices are given in the Supplementary File S3. We used theEcoSimR R package (Gotelli et al. 2015) to compare the results with reference to 1000 permutations of a null model that holds the same dietary niche width, while randomizing the values for the diet items. We further calculated Bray–Curtis dissimilarity indices (beta diversity) between each pair of samples using the vegdist -function in thevegan R package (Oksanen et al. 2019).
Additionally, we performed nonmetric multidimensional scaling (NMDS) using the function metaMDS in the vegan R package to visualize the patterns of dietary dissimilarity among samples. Discriminant Analysis of Principal Components (DAPC), using the dapc function in theadegenet R package (Jombart, 2008), was performed on the three categories, i.e. species, EOD duration, and snout length, to identify dietary items that are significant contributors to dietary differences among species/phenotypic groups. We tested for differences in diet composition among the groups by performing a permutational multivariate analysis of variance (perMANOVA) with 999 permutations, using theadonis -function in the vegan R package.