RNA Sequencing and Transcriptomic Analysis
Nucleic acids were isolated from frozen cell pellets using a modified CTAB protocol (Possmayer et al. 2011). RNA concentration was determined using a NanoDrop 2000 (Thermo Fisher Scientific), and integrity was assessed with a 2100 Bioanalyzer (Agilent Technologies, USA). RNA library preparation and sequencing were performed by Genome Quebec (Montreal, QC, Canada). Libraries were generated from 250 ng of total RNA. Poly-A mRNA was isolated with the NEBNext Poly(A) mRNA Magnetic Isolation kit (NEB, USA). Reverse transcription was performed with the NEBNext RNA First Strand Synthesis kit (NEB), and second strand synthesis with the NEBNext Ultra Directional RNA Second Strand Synthesis kit (NEB). Libraries were prepared using the NEBNext Ultra II Library Prep Kit for Illumina (NEB) and were sequenced as 100-base paired-end reads on an Illumina HiSeq4000 platform (Illumina, San Diego, USA).
For gene expression analysis, the RNA-Seq reads were mapped to the assembled UWO241 genome (Zhang et al. 2021a; accession number GCA_016618255.1) using HISAT2 (Kim, Langmead & Salzberg 2015) and counted against the predicted gene models using HTSeq-count v0.11.3 (Anders, Pyl & Huber 2015). StringTie v2.1.5 was used to generate expression estimates from the SAM/BAM files created by HISAT2 (Pertea, Kim, Pertea, Leek & Salzberg 2016). Samtools v1.11 was used to read and write the Illumina RNA-Seq alignments in SAM and BAM format. The number of reads aligned to each gene was normalized by gene length and sequencing depth and expressed as Fragments Per Kilobase of transcript per Million mapped reads (FPKM) as a measure of the expression level of each gene. Differentially expressed genes (DEGs) were determined with Ballgown v2.22.0 (Pertea et al. 2016) and edgeR v2.22.0 (Robinson, McCarthy & Smyth 2010). Genes were sorted according to their log2-transformed read counts. The Generally Applicable Gene-set Enrichment (GAGE v2.40.1) package in R (Luo, Friedman, Shedden, Hankenson & Woolf 2009) was used to perform pathway analysis based on genes that were assigned Chlamydomonas Entrez IDs. The GAGE parameter “same.dir” was set to TRUE, and significantly regulated pathways were defined as enriched gene sets with a p-value < 0.05. To generate the heatmap expression profiles of HSPs, hierarchical clustering based on Euclidean distance was performed within each subfamily using the ComplexHeatmap R package. Venn diagrams were constructed using an online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/).
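As an illustration of the FPKM normalization described above, the following minimal Python sketch applies the standard definition, FPKM = fragments × 10^9 / (gene length in bp × total mapped fragments). It is not the code used in this study, where expression estimates were produced by StringTie; the function name fpkm, the gene names, and the counts and lengths are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def fpkm(fragment_counts, gene_lengths_bp):
    """Return FPKM per gene from raw fragment counts.

    fragment_counts: dict mapping gene name -> mapped fragments (hypothetical input)
    gene_lengths_bp: dict mapping gene name -> gene/transcript length in bp
    """
    # Sequencing depth: total mapped fragments across all genes in the sample.
    total_fragments = sum(fragment_counts.values())
    if total_fragments == 0:
        raise ValueError("no mapped fragments; FPKM is undefined")

    # 1e9 = 1e3 (per kilobase of transcript) * 1e6 (per million mapped fragments).
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_fragments)
        for gene, count in fragment_counts.items()
    }


# Hypothetical example: geneB has three times the fragments of geneA,
# but is also three times as long, so both have the same FPKM.
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1000, "geneB": 3000}
values = fpkm(counts, lengths)
print(values)
```

In this toy example both genes receive the same FPKM even though their raw counts differ threefold, which shows why length normalization is applied before expression levels are compared between genes.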