Materials and Methods
Stomach sample collection
Stomach content samples (\(n=24\)) were collected from five Campylomormyrus species with markedly different EOD and snout morphologies (C. alces, \(n=2\); C. compressirostris, \(n=10\); C. curvirostris, \(n=2\); C. tshokwe, \(n=9\); C. numenius, \(n=1\); see Lamanna et al. 2016 for phylogenetic relationships, EODs, and snout characteristics). Additionally, stomach content samples of Gnathonemus petersii (G. petersii; \(n=3\)), a member of the sister genus of Campylomormyrus, were used for comparison. Unlike Campylomormyrus, G. petersii has no snout, but a
trunk-like protrusion on the head. The fish specimens from which these
stomach content samples were extracted were collected during an
expedition to the Republic of the Congo in fall 2012. All the analyzed
stomach samples were extracted from sympatric fish living in the Congo
River rapids south of Brazzaville [S4°18.788′ E15°13.790′]. Fish
were purchased from local anglers along the northern shore line of the
River Congo at Rapides de Kintambo downstream of the Malebo Pool (Runge
2007) north of the island Île de Singes, where the river bed consists of
rocks up to several meters in diameter and the river is very turbid.
Before stomach extraction, all fish individuals were weighed, measured for total and standard length, and assigned unique identifiers and museum identifiers (see Table S1.1 in Supplementary File S1). After
dissection, the stomach samples were stored in Queen’s tissue buffer
solution (20% dimethyl sulfoxide, 0.25 M EDTA, saturated with NaCl, pH 8.0; Seutin et al. 1991) at 4°C. Before DNA extraction, the stomach
contents were visually checked and found to be heavily digested, with
very few visible tissue parts remaining.
We grouped the 26 stomach content samples (excluding the C. numenius sample due to its low read numbers; Sec. 3.1) in three ways: by species, by EOD duration, and by snout length. For the species-specific analysis, only the two species with larger sample sizes, i.e., C. compressirostris (\(n=10\)) and C. tshokwe (\(n=9\)), were considered. Regarding EOD duration, we formed two groups: long EOD (>2 ms) and short EOD (<0.3 ms); note the almost 10-fold difference in EOD duration between the two groups. The first group (long EOD; \(n=11\)) included the samples of C. alces and C. tshokwe. The second group (short EOD; \(n=15\)) included the samples of C. compressirostris, C. curvirostris, and G. petersii. EOD categorization was based on Lamanna et al. (2016). Regarding snout length, we formed three groups: long snout (C. curvirostris and C. tshokwe, \(n=11\)), medium snout (C. compressirostris, \(n=10\)), and short snout (C. alces and G. petersii, \(n=5\)). For this classification, we took
advantage of a previous geometric morphometrics analysis, which
stratified the respective species according to snout length along
principal component 1 (Figure 3 in Lamanna et al. 2016).
Handling of animals was conducted at the University of Brazzaville in
accordance with the relevant national guidelines and regulations for the care and use of animals for scientific purposes.
Lysis and DNA extraction
DNA extraction was performed in 2019, approximately 7 years after sampling. Each
stomach content sample was homogenized using a Tissue Tip Homogenizer
before incubating with lysis reagents. The Qiagen DNeasy Mericon Food
Kit for extraction of high-quality DNA from digested foods was used to
carry out the DNA extraction from the homogenized samples. Since the
stomach content samples were almost completely digested, the DNA was expected to be highly fragmented. Therefore, the small-fragment protocol of this kit (200 mg) was followed according to the manufacturer's protocol. DNA was
eluted in the kit’s EB buffer (10 mM Tris-Cl, pH 8.5) and stored at
-20°C.
Library preparation, indexing, and amplification
After DNA extraction, all samples underwent single-stranded DNA library
preparation using T4 DNA ligase, following Gansauge et al. (2017). In
this method, biotinylated adapter molecules are attached to the ends of
the DNA fragments via hybridization to a stretch of six random
nucleotides belonging to a splinter oligonucleotide complementary to the
adapter and nick closure with T4 DNA ligase. The procedure is optimized
for highly degraded DNA fragments, as expected in digested food from the stomach. An additional step of deoxyuracil removal using uracil DNA glycosylase was included. Deamination of cytosine to uracil is a typical damage pattern in degraded DNA and could cause erroneous mispairing with adenine, resulting in genotyping errors.
After library preparation, the optimal number of Polymerase Chain
Reaction (PCR) cycles required for library indexing and amplification
was estimated by quantitative PCR (qPCR). The
rationale is to adjust the number of PCR cycles during the library
amplification to different DNA concentrations (Basler et al. 2017). The
reaction was performed by mixing 1 µl of a 1:20 dilution of each library with 3 µl of nuclease-free water, 5 µl of 2× SYBR™ Green qPCR master mix (Thermo Fisher Scientific), 0.5 µl of qPCR primer IS7, and 0.5 µl of primer IS8 (both 10 µM; from Gansauge & Meyer 2013). For each library, three
reaction replicates were analyzed. The PCR reactions were performed with
7 min at 95°C, followed by 40 cycles of 10 s denaturation at 95°C, 30 s annealing at 60°C, and 1 min extension at 72°C. Considering the
differences in DNA concentration between the qPCR and the indexing PCR,
the optimal number of cycles for the indexing PCR was calculated
according to Basler et al. (2017).
The indexing primers P5 and P7 were chosen according to the library
protocol of Gansauge et al. (2017). The reaction mix for each sample
contained 44.8 µl of nuclease-free water, 8 µl of 10× AccuPrime™ Pfx reaction mix (Thermo Fisher Scientific), 0.8 µl of AccuPrime™ Pfx polymerase (2.5 U/µl), 3.2 µl of P5 indexing primer, 3.2 µl of P7 indexing primer (both 10 µM), and 20 µl of each library. Thermocycling started with 2 min at 95°C, followed by the number of cycles calculated according to Basler et al. (2017), each consisting of 15 s denaturation at 95°C, 30 s annealing at 60°C, and 1 min extension at 68°C. After indexing and amplification, all
libraries were purified using the MinElute PCR Purification Kit (QIAGEN)
following the manufacturer’s instructions. The libraries were quantified
with a Qubit 3.0 fluorometer after purification, and the distribution of DNA fragment lengths was measured using an Agilent 2100 Bioanalyzer system.
Probe design and DNA hybridization
capture
For probe design, we selected target species based on a literature survey of the potential taxa eaten by Campylomormyrus and the expected diversity of prey items in the Congo River region. We targeted
504 species across 6 animal taxa: Arthropoda, Annelida, Mollusca,
Nemertea, Nematoda, and Rotatoria. Additionally, we targeted 29 species
from 3 plant taxa: Streptophyta, Rhodophyta, and Chlorophyta. For these
species, mitochondrial COI gene (animals) and chloroplast rbcL gene
(plants) sequences were retrieved from the GenBank sequence database of
the National Center for Biotechnology Information (NCBI). When a species
of interest did not have sequence information available, sequence
information from another species of the same genus was included. If this
was not available, the taxon was excluded. The average sequence length
of all collected sequences was 700 bp. A complete
list of taxonomic names and NCBI
accession numbers can be found in the Supplementary File S2. The
sequence capture probes were custom-made at Arbor Biosciences (formerly MYcroarray) and supplied in the form of a myBaits In-Solution Custom Target Capture Kit (designs with 1-20k probes). The probes were designed to achieve a high overall probe count and increased coverage of high-GC-content regions. The final probe set contained 15,284 probes, each with a length of 80 bp. The synthesized probes comprise short overlapping fragments representing the whole probe template sequences of COI and rbcL.
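For illustration only, reference sequences for a candidate prey taxon could be retrieved programmatically from GenBank, e.g., with the rentrez R package; the search term and the example genus in the sketch below are hypothetical placeholders, as the retrieval procedure actually used is not specified here.

```r
# Illustrative sketch only: retrieving candidate COI reference sequences from
# NCBI GenBank with rentrez. The example taxon and the search term are
# hypothetical placeholders, not part of the probe design described above.
library(rentrez)

fetch_coi <- function(taxon, max_records = 5) {
  # Search the NCBI nucleotide database for COI records of the given taxon
  hits <- entrez_search(db = "nuccore",
                        term = paste0(taxon, "[Organism] AND COI[Gene]"),
                        retmax = max_records)
  if (length(hits$ids) == 0) return(NULL)  # no record: taxon would be skipped
  # Download the matching records in FASTA format
  entrez_fetch(db = "nuccore", id = hits$ids, rettype = "fasta")
}

fasta_txt <- fetch_coi("Chironomus")  # hypothetical example genus
```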
The hybridization capture enrichment reactions were performed according
to the manufacturer’s instructions (Biosciences 2016). Using this
method, single-stranded DNA (ssDNA) fragments sharing sequence
similarity with pre-designed ssDNA or RNA baits can be enriched.
Following these instructions, the recommended input of library DNA was 100 ng to 500 ng, equivalent to 7 µl of library material with a DNA concentration of 14 to 72 ng/µl. All library material (i.e., pooled across replicates of the same specimen) was used for two rounds of hybridization capture reactions, thereby increasing specificity (Krueger et al. 2021). The captured and amplified libraries
of the first round of capture reaction were used as input libraries for
the second round. For both rounds of capture, the libraries were not
pooled since pooling might produce low quality results (Biosciences
2016). The libraries captured and amplified in the second round were pooled and sequenced on an Illumina MiSeq instrument using a 2 × 150 bp paired-end protocol for double-indexed libraries (Biosciences 2016).
Bioinformatics, data analysis, and
statistics
Before further analyzing the high throughput sequencing reads, which
were generated in FASTQ format, we performed quality processing to
remove the adapter sequences and the low-quality reads, using Cutadapt
version 2.10 (Martin 2011). Quality cutoff, minimal sequence length, and
minimal overlap between sequence and adapter were set to 20, 30, and 3,
respectively. Then, we performed quality control checks on the filtered sequences using FastQC (Andrews 2010) to ensure their suitability for
further analysis.
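A minimal sketch of this trimming step is given below, with Cutadapt called from R via system2(); the adapter sequences and file names are placeholders, since the exact command line is not reported here.

```r
# Sketch of the Cutadapt trimming step described above (quality cutoff 20,
# minimum length 30, minimum adapter overlap 3). Adapter sequences and file
# names are placeholders; the actual command line is not given in the text.
args <- c(
  "-q", "20",                    # quality cutoff
  "--minimum-length", "30",      # discard reads shorter than 30 bp after trimming
  "-O", "3",                     # minimum overlap between read and adapter
  "-a", "ADAPTER_FWD",           # adapter on read 1 (placeholder)
  "-A", "ADAPTER_REV",           # adapter on read 2 (placeholder)
  "-o", "sample_R1.trimmed.fastq.gz",
  "-p", "sample_R2.trimmed.fastq.gz",
  "sample_R1.fastq.gz", "sample_R2.fastq.gz"
)
system2("cutadapt", args)        # requires Cutadapt 2.10 on the PATH
```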
Taxonomic classification of the filtered sequences was done using Kraken version 2.0.9, a \(k\)-mer-based approach that provides fast taxonomic classification from sequence data (Kraken2; Wood et al. 2019). Kraken2 compares the reads of the metagenome to short sequences of length \(k\), the so-called \(k\)-mers, from a reference database in which sequences are annotated with their position in the tree-of-life phylogeny. The algorithm then places each read on this taxonomic tree, taking into account the matched taxa and their ancestors, and assigns it to the lowest taxonomic level consistent with its \(k\)-mer matches. During the classification process, Kraken2 allocates weights to the nodes of the taxonomic tree based on the \(k\)-mer hits along the root-to-leaf paths; this weighting improves the sensitivity of the taxon classification and yields more accurate and precise results. Before applying Kraken2, a custom marker database of available eukaryotic COI and rbcL sequences from NCBI was built for classifying the reads. The results of Kraken2 were visualized and inspected using the online tool Pavian metagenomics data explorer (Breitwieser & Salzberg 2019).
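As a rough sketch (the exact commands are not reported here), such a custom Kraken2 marker database could be built and queried as follows, again wrapped in R via system2(); the database name and file paths are placeholders.

```r
# Sketch: building a custom Kraken2 database from COI/rbcL reference FASTAs
# and classifying the trimmed reads against it. Paths are placeholders.
db <- "coi_rbcl_db"

system2("kraken2-build", c("--download-taxonomy", "--db", db))
system2("kraken2-build", c("--add-to-library", "coi_rbcl_references.fasta", "--db", db))
system2("kraken2-build", c("--build", "--db", db))

# Paired-end classification of one sample; the report file can be inspected in Pavian
system2("kraken2", c("--db", db, "--paired",
                     "--report", "sample.kreport",
                     "--output", "sample.kraken",
                     "sample_R1.trimmed.fastq.gz",
                     "sample_R2.trimmed.fastq.gz"))
```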
The read counts were used as a semi-quantitative estimate of the
relative abundance of food taxa in the diet of each sample. For each
sample, these count data were used to record both the occurrence
(presence/absence) of a taxon and the percentage of reads assigned to
that taxon. The frequency of occurrence (%FOO) is used to quantify
occurrence across samples. The relative read abundance (RRA) is used as
a proxy for relative biomass consumed. Details on the calculations of
these measures are given in the Supplementary File S3.
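As a minimal illustration of these two measures (the exact definitions used in this study are those given in Supplementary File S3), %FOO and RRA can be computed from a samples-by-taxa read-count table as follows; the toy table and its orientation are assumptions.

```r
# Minimal sketch: %FOO and RRA from a read-count table with samples in rows
# and food taxa in columns (orientation assumed; see Supplementary File S3
# for the exact definitions used in the study).
counts <- matrix(c(120,  0, 30,
                     0, 10, 90,
                    50, 40,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3),
                                 c("taxonA", "taxonB", "taxonC")))

# Frequency of occurrence: percentage of samples in which a taxon is detected
foo <- colMeans(counts > 0) * 100

# Relative read abundance: per-sample read proportions, averaged across samples
rra <- colMeans(counts / rowSums(counts)) * 100

round(foo, 1)
round(rra, 1)
```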
The dietary niche width was assessed by calculating the Shannon
diversity index (alpha diversity) for each sample using the spaa R package (Zhang 2016). Further, to describe the diet overlap among our
categories (species, EOD duration, snout length), we used two
complementary metrics: the Schoener index (Schoener 1970) and the Pianka index
(Pianka 1974). Pairwise calculations of these indices were performed
using the spaa R package (Zhang 2016). The equations of the
indices are given in the Supplementary File S3. We used the EcoSimR R package (Gotelli et al. 2015) to compare the results with reference to 1000 permutations of a null model that holds the dietary niche width constant while randomizing the values of the diet items. We further calculated Bray–Curtis dissimilarity indices (beta diversity) between each pair of samples using the vegdist function in the vegan R package (Oksanen et al. 2019).
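A condensed sketch of these calculations on a toy read-count table is given below; the data, the grouping factor, the matrix orientation expected by spaa, and the EcoSimR randomization algorithm ("ra3") are assumptions to be checked against the package documentation, not settings reported here.

```r
# Sketch of the diversity, overlap, null-model, and dissimilarity steps on a
# toy samples-by-taxa read-count table; the data and several argument choices
# are assumptions, not values reported in this study.
library(spaa)
library(EcoSimR)
library(vegan)

counts <- matrix(c(120,  0, 30,
                     0, 10, 90,
                    50, 40,  0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3),
                                 c("taxonA", "taxonB", "taxonC")))
groups <- factor(c("longEOD", "shortEOD", "shortEOD"))

# Shannon diversity (dietary niche width) per sample; spaa is assumed to expect
# taxa in rows and samples in columns, hence the transpose
shannon <- niche.width(t(counts), method = "shannon")

# Pairwise diet overlap between categories, after summing reads per category
group_counts <- rowsum(counts, groups)
pianka   <- niche.overlap(t(group_counts), method = "pianka")
schoener <- niche.overlap(t(group_counts), method = "schoener")

# Null model: niche width held constant, diet-item values randomized
# ("ra3" is assumed to correspond to that randomization scheme)
null_mod <- niche_null_model(as.data.frame(group_counts), algo = "ra3",
                             metric = "pianka", nReps = 1000,
                             rowNames = FALSE)  # data frame holds counts only
summary(null_mod)

# Bray-Curtis dissimilarities (beta diversity) between all pairs of samples
bray <- vegdist(counts, method = "bray")
```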
Additionally, we performed nonmetric multidimensional scaling (NMDS)
using the function metaMDS in the vegan R package to
visualize the patterns of dietary dissimilarity among samples.
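The corresponding NMDS call might look roughly like this; two dimensions and the Bray–Curtis distance are assumed settings, and the toy table is for illustration only.

```r
# Sketch of the NMDS ordination of dietary dissimilarities; the toy count
# table and k = 2 dimensions are assumptions, not values reported here.
library(vegan)

set.seed(1)
counts <- matrix(rpois(6 * 5, lambda = 20), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:5)))

nmds <- metaMDS(counts, distance = "bray", k = 2, trymax = 100)
nmds$stress             # stress value indicates goodness of fit
plot(nmds, type = "t")  # ordination plot with sample and taxon labels
```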
Discriminant Analysis of Principal Components (DAPC), using the dapc function in the adegenet R package (Jombart 2008), was performed on the three categories, i.e., species, EOD duration, and snout length, to identify dietary items that are significant contributors to dietary differences among species/phenotypic groups. We tested for differences in diet composition among the groups by performing a permutational multivariate analysis of variance (perMANOVA) with 999 permutations, using the adonis function in the vegan R package.
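These two tests might be set up roughly as follows; the toy data, the numbers of retained principal components and discriminant functions in dapc, and the use of Bray–Curtis distances in the perMANOVA are assumptions rather than settings reported here. The sketch uses adonis2, which replaces the adonis function cited above in newer vegan releases.

```r
# Sketch of the DAPC and perMANOVA steps on a toy samples-by-taxa table with a
# grouping factor; parameter choices (n.pca, n.da, distance) are assumptions.
library(adegenet)
library(vegan)

set.seed(1)
counts <- matrix(rpois(6 * 5, lambda = 20), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), paste0("taxon", 1:5)))
groups <- factor(rep(c("longEOD", "shortEOD"), each = 3))

# Discriminant Analysis of Principal Components: which dietary items
# contribute most to the separation between the groups?
dapc_res <- dapc(as.data.frame(counts), grp = groups, n.pca = 3, n.da = 1)
head(dapc_res$var.contr)   # contributions of the taxa to the discriminant axis

# perMANOVA with 999 permutations (Bray-Curtis distance assumed); the study
# cites the adonis function, which newer vegan versions replace with adonis2
perm <- adonis2(counts ~ groups, method = "bray", permutations = 999)
perm
```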