STRING and KEGG analyses
Protein-protein interaction networks were analyzed using STRING version 11.0 (Szklarczyk et al., 2019) (https://string-db.org/). A list of protein accession numbers was entered into the STRING search function. The list comprised all significantly regulated proteins and mRNAs, together with the proteins in the up- and down-regulated clusters from the Genesis output. The search returned lists of significantly enriched (FDR < 0.05) UniProt keywords and of protein domains classified by the Pfam, InterPro, and SMART databases. Protein networks with more than one edge and containing at least one significantly regulated protein were then selected and searched as a subset of the total group. Sub-networks were identified from this subset, in addition to the enriched keywords and domains. Markov Cluster (MCL) clustering with an inflation factor of 1.3 was chosen to create seven distinct networks. Together, these networks encompassed every significantly regulated protein connected to at least one other protein. The protein list of each network was then submitted as a separate STRING search to determine its enriched keywords and protein domains and to classify the network by cellular function.
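The inflation factor controls how finely MCL splits a network: higher values give more, smaller clusters. The following is a minimal, illustrative sketch of the standard MCL loop (expansion, inflation, column normalization) on a small unweighted graph, assuming self-loops are added, an expansion power of 2, and a hypothetical pruning threshold. It is not STRING's internal implementation, which may, for example, weight edges by combined interaction score.

```python
"""Minimal Markov Cluster (MCL) sketch on an unweighted interaction network.

Assumptions (not stated in the methods): edges are unweighted, self-loops
are added, expansion power is 2, and values below `prune` are zeroed.
"""
from typing import List, Set, Tuple


def _normalize_columns(m: List[List[float]]) -> List[List[float]]:
    # Make each column sum to 1 so it is a random-walk transition vector.
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s > 0 else 0.0
    return out


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(nodes: List[str], edges: List[Tuple[str, str]], inflation: float = 1.3,
        max_iter: int = 100, tol: float = 1e-6, prune: float = 1e-5) -> List[Set[str]]:
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops (assumed) stabilize the random walk
    for a, b in edges:
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = 1.0
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random walk
        # inflation: raise entries to the power r, pruning tiny values
        inflated = [[v ** inflation if v > prune else 0.0 for v in row]
                    for row in expanded]
        new = _normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each non-empty attractor row lists the members of one cluster.
    clusters: List[Set[str]] = []
    for i in range(n):
        members = {nodes[j] for j in range(n) if m[i][j] > prune}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy network: two disconnected triangles.
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
                 ("D", "E"), ("E", "F"), ("D", "F")]
    print(mcl(["A", "B", "C", "D", "E", "F"], toy_edges))
```

On this toy input each triangle is returned as its own cluster. On real STRING networks, a low inflation value such as 1.3 tends to keep clusters relatively large.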
Over-representation of proteins in known molecular pathways was analyzed using KEGG Mapper search (https://www.genome.jp/kegg/tool/map_pathway1.html). First, the GhostKOALA automatic annotation server (Kanehisa, Sato, & Morishima, 2016) was used to assign matching KEGG Orthology (KO) identifiers to all proteins in the dataset. The complete list of matched KO identifiers was searched in KEGG Mapper to establish the baseline representation of each pathway in the database. The lists of significantly up- and down-regulated proteins were then searched. For each list, the proportion represented in a pathway (the number of listed proteins in that pathway divided by the total number of proteins in the list) was compared against the corresponding proportion for the complete DIA assay library protein list. This comparison identified pathways over-represented among the significantly regulated proteins.
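The proportion comparison described above can be illustrated with a short sketch. It assumes that pathway membership is available as a mapping from a pathway name to the set of KO identifiers mapped to it, for example as parsed from KEGG Mapper output. The pathway names, identifiers, and the function names `pathway_proportions` and `over_represented` are hypothetical. As in the text, the sketch compares proportions only and applies no statistical test.

```python
"""Sketch of a proportion-based pathway over-representation comparison.

Assumption: `pathway_members` maps a pathway name to the set of KO IDs in it
(hypothetical input; e.g. parsed from KEGG Mapper results).
"""
from typing import Dict, List, Set, Tuple


def pathway_proportions(proteins: Set[str],
                        pathway_members: Dict[str, Set[str]]) -> Dict[str, float]:
    # Fraction of the list that falls in each pathway.
    total = len(proteins)
    return {pw: (len(proteins & members) / total if total else 0.0)
            for pw, members in pathway_members.items()}


def over_represented(regulated: Set[str], background: Set[str],
                     pathway_members: Dict[str, Set[str]]
                     ) -> List[Tuple[str, float, float, float]]:
    """Return (pathway, list proportion, library proportion, ratio) for
    pathways whose proportion in `regulated` exceeds the background."""
    bg = pathway_proportions(background, pathway_members)
    reg = pathway_proportions(regulated, pathway_members)
    hits = []
    for pw in pathway_members:
        if reg[pw] > bg[pw]:
            ratio = reg[pw] / bg[pw] if bg[pw] > 0 else float("inf")
            hits.append((pw, reg[pw], bg[pw], ratio))
    return sorted(hits, key=lambda t: t[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical example: a 10-protein library and 4 regulated proteins.
    library = {f"K{i}" for i in range(1, 11)}
    pathways = {"glycolysis": {"K1", "K2", "K3"},
                "ribosome": {"K4", "K5", "K6", "K7", "K8"}}
    up = {"K1", "K2", "K4", "K9"}
    for row in over_represented(up, library, pathways):
        print(row)
```

In the toy example, "glycolysis" makes up 2/4 of the regulated list but only 3/10 of the library, so it is reported as over-represented. "ribosome" (1/4 vs 5/10) is not. In practice, the up- and down-regulated lists would each be compared separately against the same library baseline.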