Identification of HSP genes in green algal genomes
The UWO241 genome (Zhang et al. 2021a) was screened for the
presence of HSP genes using C. reinhardtii HSP protein sequences
(Phytozome v12.1) and conserved domains typical for HSPs from the Pfam
database (Mistry et al. 2021) as queries. Putative HSP genes in
UWO241 were identified through a tBLASTn search
(e-value < 1e-10, bit score > 100).
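The two thresholds above can be applied to BLAST output in a few lines. The following is a minimal sketch, assuming the tBLASTn results were exported in the standard 12-column tabular format (`-outfmt 6`), where the e-value and bit score are the 11th and 12th columns. The export format, the function name `filter_hits`, and the example query and contig names are illustrative assumptions, not details taken from the study.

```python
import csv
import io

# Cutoffs stated in the text: e-value < 1e-10 and bit score > 100.
EVALUE_MAX = 1e-10
BITSCORE_MIN = 100.0


def filter_hits(handle, evalue_max=EVALUE_MAX, bitscore_min=BITSCORE_MIN):
    """Yield (query, subject, evalue, bitscore) for hits passing both cutoffs.

    Assumes BLAST tabular output (-outfmt 6) with the default column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue < evalue_max and bitscore > bitscore_min:
            yield row[0], row[1], evalue, bitscore


# Hypothetical two-line example: one strong hit and one weak hit.
example = io.StringIO(
    "HSP70A\tcontig_1\t85.0\t600\t90\t0\t1\t600\t1000\t2800\t1e-150\t950\n"
    "HSP70A\tcontig_7\t35.0\t80\t52\t0\t1\t80\t10\t250\t1e-05\t48.1\n"
)
hits = list(filter_hits(example))
for query, subject, evalue, bitscore in hits:
    print(query, subject, evalue, bitscore)
```

Only the first example hit passes both cutoffs. Hits retained this way would still need the manual inspection for redundancy and gene structure described in the text.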
The results were manually inspected for redundant sequences and to
ensure correct gene structure annotation. The presence of conserved HSP
domains was confirmed using Pfam and the NCBI Conserved Domain Database (Lu et al. 2020). The gene names are based on the closest C.
reinhardtii homologs and multiple homologous sequences were numbered in
order of discovery (e.g., HSP70A-1). Only genes supported by
transcriptomic data are reported. The genomes of other green algae were
obtained from GenBank (Chlamydomonas sp. ICE-L (Zhang et
al. 2020); Chlamydomonas eustigma (Hirooka et al. 2017); Gonium pectorale (Hanschen et al. 2016); Chlorella sorokiniana (Arriola et al. 2018)) or Phytozome
v12.0 (Dunaliella salina (Polle et al. 2017); Volvox
carteri (Prochnik et al. 2010); Coccomyxa subellipsoidea (Blanc et al. 2012)) and similarly screened. Only full-length
genes were used in downstream analyses. Multiple sequence alignments
were performed using ClustalW (Sievers et al. 2011) implemented
through Geneious Prime (Biomatters Ltd, Auckland, New Zealand).
Cladograms and phylogenetic trees were inferred based on protein
alignments using FastTree v.2.1 with the Neighbor-Joining method and the
Jukes-Cantor genetic distance model (Price, Dehal & Arkin 2010), and
annotated in iTOL v6 (https://itol.embl.de/). The bootstrap values for
each branch reflect the percentage of 1,000 replicate trees in which that branch was recovered. Subcellular
localization was predicted by four independent programs: TargetP 2.0
(Armenteros et al. 2019), Predotar (Small, Peeters, Legeai &
Lurin 2004), WoLF PSORT (Horton et al. 2007) and LOCALIZER
(Sperschneider et al. 2017).
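
The text does not state how the outputs of the four localization predictors were reconciled. Purely as an illustration, the sketch below tallies the four calls for one protein and reports the most frequent compartment together with its vote count, flagging ties as ambiguous. The majority-vote rule, the function name `consensus_localization`, and the example labels are assumptions for illustration and may differ from what was actually done.

```python
from collections import Counter


def consensus_localization(predictions):
    """Return (label, votes) for the most frequent predicted compartment.

    `predictions` maps a predictor name to its predicted compartment.
    A tie for first place is reported as "ambiguous" (an assumed rule).
    """
    counts = Counter(predictions.values())
    ranked = counts.most_common()
    best_label, best_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return "ambiguous", best_votes
    return best_label, best_votes


# Hypothetical predictions for a single protein.
calls = {
    "TargetP": "chloroplast",
    "Predotar": "chloroplast",
    "WoLF PSORT": "mitochondrion",
    "LOCALIZER": "chloroplast",
}
result = consensus_localization(calls)
print(result)
```

Reporting the vote count alongside the label keeps weakly supported calls (for example, 2 of 4) distinguishable from unanimous ones.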