RNA extraction
Total RNA was extracted from mesocarp using plant RNAOUT kit (TIANDZ, China), then the quality control of RNA was performed using 2100 Bioanalyzer system and all samples were qualified as a result. RNA samples were subjected to DNaseⅠdigestion (Takara, Japanese) according to instructions from the manufacturer to remove the contaminating genomic DNA. And these were purified using RNeasy MinElute Cleanup Kit (QIAGEN, Germany), and eluted with RNase-free water. Ribo-Zero™ Magnetic Kit (Epicentre, America) was used for depleting rRNA according to the manufacturer’s protocol. Eventually, the rRNA-depleted RNA sample was used in the subsequent experiments.
RNA-Seq library construction and sequencing
Equal amounts of total RNA for each sample was used to construct 24 RNA libraries. RNA sequencing was performed on an Illumina HiSeq 2500 platform at Chinese National Human Genome Center, Shanghai, and 100 bp and 150 bp paired-end reads were generated. In more detail, 24 libraries were generated using NEB Next® UltraTM Directional RNA Library Prep Kit for Illumina (NEB, American), and the quality, quantity and size distribution of RNA were determined using three methods: Qubit fluorometer, 2% agarose gel electrophoresis and Agilent High Sensitivity DNA Kit. The libraries were then used in cluster generation by TruSeq PE Cluster Kit (Illumina, American), and submitted for Illumina HiSeq 2500 sequencing according to the standard operation protocols. All the raw data have been uploaded to the GEO database of NCBI (accession number: GSE122868).
Transcript assembly and novel transcripts annotation
To obtain high quality data, the adapters in raw data were trimmed and the reads with low quality (Q-value < 20) and shorter than 36 bp were filtered using FASTX-Toolkit (version 0.7, http://hannonlab.cshl.edu/fastx_toolkit/). The filtered FASTQ files of pair reads were justified using a custom Python script developed in our laboratory. Then, Q20, Q30 and GC content of the filtered data were calculated using FastQC (version 0.11.5, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to confirm the quality of filtered data. All the subsequent analyses were based on the clean data with high quality. Paired-end clean reads were mapped against Prunus persica L. Batsch reference genome (prunus_persica/genome_v2.0.a1) from the genome database (GDR) (Jung et al., 2008) using Tophat (version 2.1.1) (Trapnell et al., 2009). To generate transcripts for each sample, the above results were assembled using Cufflinks suite (version 2.2.1) (Trapnell et al., 2010; Roberts et al., 2011; Trapnell et al., 2012). Novel transcripts were annotated using BLASTX (Altschul et al., 1990; Altschul et al., 1997) against the NCBI nr database and using Blast2GO (version 5) (Conesa et al., 2005) to perform InterProScan analyses. Then, the novel transcripts were annotated and categorized into 3 principal Gene Ontology (GO) categories: biological process, molecular function and cellular component (The Gene Ontology Consortium et al., 2000; The Gene Ontology Consortium, 2017). Besides, the pathway annotations were generated using SBH method in the KEGG Automatic Annotation Server (KAAS, version 2.1) (Moriya et al., 2007).
Quantification of gene expression levels and differential expression analysis
We used Cuffdiff (Trapnell et al., 2013) to calculate gene expression level as FPKM (fragments per kilobase of exon per million reads mapped) value, and generated the adjusted P-values of false discovery rate (FDR) between different conditions using the Benjamin-Hochberg method (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001). FPKM value of each gene was calculated. Genes with a corrected FDR of 5% and |Log2ratio| ≥ 1 were assigned as differentially expressed genes (DEGs) using R software (version 3.2.5). Wilcoxon signed rank test was used for the comparison of gene expression levels during 7 days between CT and HT conditions.
Gene co-expression network analysis
All transcripts in samples were used to perform co-expression analysis to describe the correlation patterns among genes and identify regulatory modules. Weighted gene co-expression network analysis (WGCNA) were conducted using R software package WGCNA based on expression profiles (Langfelder and Horvath, 2008). Firstly, the pairwise correlation coefficients between all genes were calculated and further transformed into adjacency matrix with a soft threshold power of 12. To counter the effects of spurious or missing connections, topological overlap similarity was measured from the adjacency matrix (Yip and Horvath, 2007). Then, modules were detected as branches of the dendrogram using hierarchical clustering with the DynamicTreeCut algorithm and assigned to different colors(Langfelder et al., 2008). Intramodular connectivity was calculated based on the expression profiles of the module members to identify the highly connected intramodular hub genes. Besides, the correlations between gene expression profile and the phenotype measurements of interest were also calculated to explore the association of modules to external traits. Finally, Cytoscape (version 3.5.1) was used to visualize the network connections and calculate the link weights to indicate the hub-genes in selected modules (Shannon et al., 2003).