Identification of HSP genes in green algal genomes
The UWO241 genome (Zhang et al. 2021a) was screened for HSP genes using C. reinhardtii HSP protein sequences (Phytozome v12.1) and conserved domains typical of HSPs from the Pfam database (Mistry et al. 2021) as queries. Putative HSP genes in UWO241 were identified through a tBLASTn search (e-value < 1e-10, bit-score > 100). The results were manually inspected to remove redundant sequences and to verify gene structure annotation. The presence of conserved HSP domains was confirmed using Pfam and the NCBI Conserved Domain Database (Lu et al. 2020). Gene names are based on the closest C. reinhardtii homologs; where multiple homologous sequences were found, they were numbered in order of discovery (e.g., HSP70A-1). Only genes supported by transcriptomic data are reported. The genomes of other green algae were obtained from GenBank (Chlamydomonas sp. ICE-L (Zhang et al. 2020); Chlamydomonas eustigma (Hirooka et al. 2017); Gonium pectorale (Hanschen et al. 2016); Chlorella sorokiniana (Arriola et al. 2018)) or Phytozome v12.0 (Dunaliella salina (Polle et al. 2017); Volvox carteri (Prochnik et al. 2010); Coccomyxa subellipsoidea (Blanc et al. 2012)) and screened in the same way. Only full-length genes were used in downstream analyses. Multiple sequence alignments were performed using ClustalW (Sievers et al. 2011) as implemented in Geneious Prime (Biomatters Ltd, Auckland, New Zealand). Cladograms and phylogenetic trees were inferred from protein alignments using FastTree v.2.1 with the Neighbor-Joining method and the Jukes-Cantor genetic distance model (Price, Dehal & Arkin 2010), and annotated in iTOL v6 (https://itol.embl.de/). The bootstrap value for each branch reflects the percentage of 1,000 replicate trees supporting that branch. Subcellular localization was predicted using four independent programs: TargetP 2.0 (Armenteros et al. 2019), Predotar (Small, Peeters, Legeai & Lurin 2004), WoLF PSORT (Horton et al. 2007) and LOCALIZER (Sperschneider et al. 2017).
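The hit-filtering step of the tBLASTn search can be illustrated with a minimal sketch. It assumes the search results were exported in the standard 12-column BLAST tabular format (`-outfmt 6`), in which the e-value and bit score are the 11th and 12th fields. The export format, the function name `filter_hits` and the `Hit` record are assumptions made for illustration and are not taken from the original analysis; only the two thresholds come from the text above.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One retained BLAST hit (hypothetical record type for this sketch)."""
    query: str
    subject: str
    evalue: float
    bitscore: float


def filter_hits(lines: Iterable[str],
                max_evalue: float = 1e-10,
                min_bitscore: float = 100.0) -> List[Hit]:
    """Keep hits with e-value < max_evalue and bit score > min_bitscore.

    Assumes tab-separated BLAST tabular output (-outfmt 6) with the
    standard 12 columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column record
        evalue = float(fields[10])
        bitscore = float(fields[11])
        # Both thresholds are strict, matching "e-value < 1e-10, bit-score > 100".
        if evalue < max_evalue and bitscore > min_bitscore:
            hits.append(Hit(fields[0], fields[1], evalue, bitscore))
    return hits
```

In practice the function could be applied to an open results file, for example `filter_hits(open("hits.tsv"))`, where the file name is hypothetical. Hits passing the filter would still require the manual inspection and domain confirmation described above.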