Materials and Methods
Sample selection
In this study, 100 Iranian subjects were selected from among patients referred to the genetics laboratory for investigation of the etiology of various non-infectious genetic diseases. The participants were genotyped and followed up for COVID-19 genetic risk factors for 2 years (2021-2023). Individuals under 18 years of age were excluded. Information about SARS-CoV-2 infection was collected from the 100 selected individuals via a questionnaire [34].
Genotype analysis
Blood samples taken from patients were treated with lysis buffer to remove red blood cells (RBCs). Genomic DNA was then extracted from the white blood cells (WBCs) by the salting-out method, and the extracted DNA samples were stored at -20°C until analysis. To assess the purity of the extracted DNA, the optical density (OD) of the samples was measured with a NanoDrop spectrophotometer. The whole exome of all 100 participants was sequenced on the Illumina HiSeq 2500 platform with an average coverage of 50X [35].
Written informed consent was obtained from all participants. The study was approved by the ethics committee of the Kerman University of Medical Sciences.
Briefly, read quality was first checked using Phred-scaled quality control (QC) scores, and the raw reads were then aligned to the human reference genome assembly (GRCh38) using BWA. Multi-sample VCF files were generated with GATK. All ACE2, TMPRSS2, TYK2, SLC6A20, and IFNAR2 variants were extracted for further analysis. Variants were annotated using the dbSNP, ClinVar, Varsome, and Franklin databases, which include population-oriented data on nucleotide and amino acid sequence changes. The highest population minor allele frequency (MAF) of every variant was confirmed to be below 1%.
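The gene and frequency filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the record layout (`gene`, `pop_mafs`), the function names, and the example records are hypothetical, and treating a variant with no reported population frequency as having a MAF of zero is an assumption.

```python
from typing import Dict, List

# Genes and the 1% threshold are taken from the text; everything else is illustrative.
TARGET_GENES = {"ACE2", "TMPRSS2", "TYK2", "SLC6A20", "IFNAR2"}
MAF_THRESHOLD = 0.01


def highest_population_maf(pop_mafs: Dict[str, float]) -> float:
    """Return the largest MAF across populations (0.0 if none is reported; assumption)."""
    return max(pop_mafs.values(), default=0.0)


def filter_rare_variants(variants: List[dict]) -> List[dict]:
    """Keep variants in the target genes whose highest population MAF is below 1%."""
    return [
        v for v in variants
        if v["gene"] in TARGET_GENES
        and highest_population_maf(v["pop_mafs"]) < MAF_THRESHOLD
    ]


# Hypothetical records for illustration only (not study data).
example = [
    {"id": "var1", "gene": "ACE2", "pop_mafs": {"popA": 0.002, "popB": 0.004}},
    {"id": "var2", "gene": "TYK2", "pop_mafs": {"popA": 0.03, "popB": 0.001}},
    {"id": "var3", "gene": "BRCA1", "pop_mafs": {"popA": 0.0001}},
    {"id": "var4", "gene": "IFNAR2", "pop_mafs": {}},
]
kept = [v["id"] for v in filter_rare_variants(example)]
print(kept)
```

In this sketch a variant is dropped as soon as any single population exceeds the threshold, which matches the "highest population MAF" wording; `var2` is excluded for that reason even though it is rare in one population.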
In silico analysis
Various bioinformatics software packages for molecular dynamics simulation were used to determine the effect of each genetic variant on the amino acid sequence, including its effect on the primary and alternative transcripts of the gene, as well as its potential effect on the function and tertiary structure of the protein. PolyPhen-2, SIFT, MutationTaster, FATHMM-MKL, and CADD scores are also reported.
Protein modeling
Most modeling methods, such as I-TASSER, generate models interactively in response to user requests. Here, homology modeling with the I-TASSER server was used to build the 3D structure of the trimeric protein under study, so that the effect of genetic variants on protein structure and stability could be assessed. All files were obtained from the I-TASSER server in PDB (Protein Data Bank) format. UCSF ChimeraX was used for graphical visualization of the molecules. The Matchmaker tool was chosen to superimpose related structures regardless of differences in residue numbering or missing residues. This tool superimposes proteins by first creating a sequence alignment and then fitting the aligned residue pairs in 3D. All figures of 3D structures and alignments were also prepared with UCSF ChimeraX.
Statistical analysis
Statistical analysis was performed using SPSS version 26.0, and graphs were drawn with GraphPad Prism 9.4. The Chi-square test was used to evaluate the association between variants of the human genes ACE2, TMPRSS2, TYK2, SLC6A20, and IFNAR2 and the severity and incidence rate of COVID-19. In all analyses, the group without variants was used as the reference group for calculating the odds ratio (OR). A P-value < 0.05 was considered statistically significant.
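For a single variant and a binary outcome, the test and the odds ratio described above reduce to a 2x2 table, which can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure itself: it uses the Pearson chi-square without continuity correction, the function names are hypothetical, and the counts are invented for the example rather than taken from the study.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    Layout (rows = variant status, columns = outcome):
        variant carriers:   a = with outcome, b = without outcome
        reference group:    c = with outcome, d = without outcome
    All four row and column totals must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail of chi-square is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of the outcome in carriers relative to the reference group (no variant).

    Undefined when b or c is zero; such tables need separate handling.
    """
    return (a * d) / (b * c)


# Hypothetical counts for illustration only (not study data).
a, b, c, d = 12, 8, 20, 60
chi2, p = chi_square_2x2(a, b, c, d)
ratio = odds_ratio(a, b, c, d)
print(f"chi2={chi2:.3f}, p={p:.4f}, OR={ratio:.2f}, significant={p < 0.05}")
```

Placing the group without variants in the second row makes an OR above 1 mean higher odds of the outcome among carriers, consistent with the reference group defined in the text.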