High but underestimated diversity of OTUs
Because the low percentage of fish species in the region with a 12S reference sequence is the main limitation on detecting taxonomic diversity (Fig. 2c), we used an alternative approach based on unique clusters of genetic sequences called Operational Taxonomic Units (OTUs).
From the 331,839,591 initial reads, 183,546 OTUs were generated using the SWARM clustering algorithm. After a series of post-clustering curation steps, 972 fish OTUs were retained, of which 819 were assigned to a family (Suppl. Table S3). The number of detected OTUs ranged from 1 to 54 among fish families (Fig. 3a), the richest families (>50 OTUs) being the Gobiidae, Labridae and Pomacentridae. Overall, the number of OTUs was higher than the number of assigned taxa (genus and species) in 64.7% of the families found in the samples (mean Δ = 4 ± 6.7 SD, Fig. 3a). This richness difference was null in 31.4% of the families and negative in 3.9% of them (Fig. 3a). The difference was notably high in some rich families such as the Gobiidae and Pomacentridae, where the number of OTUs was more than 2 times and 1.5 times higher than the number of assigned taxa, respectively. By contrast, only 7 OTUs were produced for the Scombridae compared to 11 assigned taxa (Δ = -4 units, or -66.7% of this family's richness).
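The per-family difference Δ described above can be illustrated with a minimal sketch, assuming Δ is simply the number of OTUs minus the number of assigned taxa in each family. Only the Scombridae counts (7 OTUs, 11 taxa) come from the text; the other family names and counts, and the function names `richness_delta` and `sign_shares`, are hypothetical.

```python
def richness_delta(n_otus, n_taxa):
    """Return OTU count minus assigned-taxa count for each family."""
    return {family: n_otus[family] - n_taxa[family] for family in n_otus}


def sign_shares(deltas):
    """Return the percentage of families with positive, null and negative delta."""
    n = len(deltas)
    positive = sum(1 for d in deltas.values() if d > 0)
    null = sum(1 for d in deltas.values() if d == 0)
    negative = sum(1 for d in deltas.values() if d < 0)
    return {
        "positive": 100.0 * positive / n,
        "null": 100.0 * null / n,
        "negative": 100.0 * negative / n,
    }


# Scombridae counts are taken from the text; FamilyA and FamilyB are
# hypothetical placeholders used only to exercise the functions.
n_otus = {"Scombridae": 7, "FamilyA": 12, "FamilyB": 3}
n_taxa = {"Scombridae": 11, "FamilyA": 5, "FamilyB": 3}

deltas = richness_delta(n_otus, n_taxa)
shares = sign_shares(deltas)
print(deltas)
print(shares)
```

Applied to the full family table, the three shares would correspond to the percentages of families with more, as many, or fewer OTUs than assigned taxa.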
The discrepancy between the two approaches (taxa and OTUs) was not significantly explained by the species richness of the family in the checklist (R² < 0.01, p = 0.08, Fig. 3b) and was only marginally, and not significantly, explained by the percentage of sequenced fish species within each family in the checklist (R² = 0.09, p = 0.05, Fig. 3c).
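As an illustration of how an R² of this kind is obtained, the sketch below assumes an ordinary least-squares regression with a single predictor; the text does not state the model actually used. The function names and the input values are hypothetical. The t statistic shown is the standard one for the slope of a simple linear regression, and a p-value would follow from comparing it with a Student t distribution with n - 2 degrees of freedom.

```python
import math


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # With one predictor, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


def slope_t_statistic(r2, n):
    """Absolute t statistic for the slope, with n - 2 degrees of freedom."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))


# Hypothetical values: a per-family predictor (x) and the richness
# difference between OTUs and assigned taxa (y).
predictor = [1, 2, 3, 4]
discrepancy = [2, 4, 5, 9]

r2 = r_squared(predictor, discrepancy)
t_stat = slope_t_statistic(r2, len(predictor))
print(r2, t_stat)
```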
On average, the number of OTUs underestimated the total number of coastal fish species in the Bird’s Head Peninsula checklist, with a mean net difference of 40.2% per family (± 38.8% SD, Fig. 3d). For most families this difference was high, reaching a maximum of 95% for the Pseudochromidae. However, the difference could also be negative, with more OTUs detected than species present in the checklist, as for the Dasyatidae, Leiognathidae and Orectolobidae, for which it reached -50%. Overall, the difference was marginally but significantly explained by the species richness of the family in the regional checklist (R² = 0.09, p = 0.04, Fig. 3d), suggesting that the bias varies with family richness, with species-rich families being more strongly underestimated by OTUs than species-poor families.
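The net difference reported above can be sketched as follows, assuming it is computed per family as the checklist richness minus the OTU count, expressed as a percentage of the checklist richness (so positive values mean underestimation by OTUs). This definition is inferred from the sign convention in the text, and all counts below are hypothetical, chosen only to reproduce differences of 95% and -50%.

```python
import statistics


def percent_underestimate(n_checklist, n_otus):
    """Percent of checklist species not matched by an OTU (negative if OTUs exceed species)."""
    return 100.0 * (n_checklist - n_otus) / n_checklist


# Hypothetical (checklist species, OTUs) pairs per family.
counts = {
    "FamilyA": (20, 1),   # strongly underestimated
    "FamilyB": (2, 3),    # more OTUs than checklist species
    "FamilyC": (10, 6),
}

differences = {
    family: percent_underestimate(n_checklist, n_otus)
    for family, (n_checklist, n_otus) in counts.items()
}
mean_difference = statistics.mean(differences.values())
sd_difference = statistics.stdev(differences.values())  # sample SD
print(differences, mean_difference, sd_difference)
```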