Extraction of functional CNVs from transcriptomes
Transcriptome data for 99 tree species (listed in Table S1) were obtained from the research program of Han et al. (2017). mRNA was extracted from seedling leaves collected in the GTS FDP using the following sampling strategy. Fully expanded leaves from three seedling individuals per species were sampled in each of the five main habitats (low valley, low ridge, mid-slope, high slope and high ridge) (Chen et al. 2010). For some rare species, however, only three individuals were sampled, or three leaves were taken from the only seedling individual available. The samples were immediately frozen in liquid nitrogen in the field and then stored in a -80°C freezer until sequencing. The transcriptomes were sequenced on an Illumina HiSeq 2500 platform with 2 × 125 bp reads, yielding at least 6 G of clean data per sample, assembled de novo with Trinity v2.2 without a reference genome (Grabherr et al. 2011), and annotated with Gene Ontology (GO) terms using Blast2GO and the UniProt database (The UniProt Consortium 2016).
In this study, we focused on four GO terms, “defense response to fungus” (GO:0050832), “defense response to bacterium” (GO:0042742), “defense response to insect” (GO:0002213), and “defense response to virus” (GO:0051607), which are involved in the defense response to four lineages of natural enemies. Based on the GO annotation, we selected the transcripts annotated with these four GO terms for the 99 well-sequenced tree species and translated them into protein sequences using TransDecoder and the Pfam database (Haas et al. 2013). For each GO term, we ran an all-by-all BLAST search on the set of protein sequences. Before clustering, the BLAST results were filtered with a hit-fraction cutoff of 0.4. Homologous gene clusters were then obtained using MCL with an E-value cutoff of 10⁻⁵ and an inflation value of 2.0. The steps from BLAST searching to clustering followed the pipeline of Yang & Smith (2014). Finally, we counted the number of genes in each cluster for each species.
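The final counting step can be illustrated with a minimal Python sketch. It is not the script used in this study. It assumes that the MCL output lists one cluster per line with whitespace-separated sequence identifiers, and that each identifier has the hypothetical form `species@gene`; the function name `count_matrix` and the toy data are introduced here for illustration only.

```python
from collections import Counter


def count_matrix(mcl_lines, species):
    """Count genes per species in each cluster.

    Assumption: one cluster per line, members separated by whitespace,
    each member named 'species@gene' (hypothetical naming scheme).
    Returns {species: [count in cluster 1, count in cluster 2, ...]}.
    """
    cluster_counts = []
    for line in mcl_lines:
        members = line.split()
        if not members:  # skip blank lines
            continue
        # The species label is the part of the identifier before '@'.
        cluster_counts.append(Counter(m.split("@", 1)[0] for m in members))
    # A Counter returns 0 for species that are absent from a cluster.
    return {sp: [c[sp] for c in cluster_counts] for sp in species}


# Toy input: two clusters, three species (illustrative data only).
example_mcl = [
    "spA@g1\tspA@g2\tspB@g1",
    "spB@g2\tspC@g1",
]
example_species = ["spA", "spB", "spC"]
cnv_matrix = count_matrix(example_mcl, example_species)
```

In this toy case each row of `cnv_matrix` corresponds to one species and each column to one gene cluster, matching the layout of the functional CNV matrices.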
This resulted in four matrices with gene clusters in the columns and the 99 tree species in the rows (hereafter referred to as functional CNV matrices). To show the dissimilarity in functional CNV among species, a heat map with two-way clustering was drawn with the heatmap package in R 4.0.2 (R Core Team 2020), using the six gene clusters containing the most genes for each GO term.
Before calculating functional CNV at the species and community levels (the latter by seedling station), the values in each functional CNV matrix were standardized by dividing them by the maximum value of the corresponding cluster, so that all values ranged between 0 and 1. For each defense response GO term, the gene copy number of each species was defined as the sum of its standardized values across all clusters, and the gene copy number of each seedling station was defined as the mean gene copy number of all individuals in that station.
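The standardization and the two aggregation levels can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used in this study: the function names and the toy matrix are hypothetical, each individual in a station is assumed to take the gene copy number of its species, and a cluster whose maximum is zero is assumed to contribute zero.

```python
def standardize(matrix):
    """Divide each cluster (column) by its maximum across species."""
    n_clusters = len(next(iter(matrix.values())))
    col_max = [max(row[j] for row in matrix.values()) for j in range(n_clusters)]
    return {
        # Assumption: an all-zero cluster is mapped to 0 rather than dividing by 0.
        sp: [v / m if m > 0 else 0.0 for v, m in zip(row, col_max)]
        for sp, row in matrix.items()
    }


def species_copy_number(matrix):
    """Species-level value: sum of standardized values over all clusters."""
    return {sp: sum(row) for sp, row in standardize(matrix).items()}


def station_copy_number(individual_species, species_cn):
    """Station-level value: mean over individuals, each taking its species value."""
    values = [species_cn[sp] for sp in individual_species]
    return sum(values) / len(values)


# Toy functional CNV matrix: three species (rows), two clusters (columns).
toy_matrix = {"spA": [4, 0], "spB": [1, 1], "spC": [0, 2]}
toy_species_cn = species_copy_number(toy_matrix)

# Toy seedling station holding four individuals of three species.
toy_station_cn = station_copy_number(["spA", "spB", "spB", "spC"], toy_species_cn)
```

Because the station value is a mean over individuals rather than over species, abundant species weigh more heavily in the community-level gene copy number.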