Results
Demographic characteristics of the study participants
A total of 100 individuals underwent sequencing: 48 were male (48.0% of the study subjects) and 52 were female (52.0% of the study subjects), with a mean age of 25.89 years. Of these 100 subjects, 40 (40.0%) had a history of SARS‐CoV‐2 infection and the remaining 60 (60.0%) had not been infected. Gender did not differ significantly between subjects with and without a history of COVID-19 (P-Value = 0.438). A history of COVID-19 was significantly more frequent among vaccinated subjects, which may be because these people were more willing to get vaccinated after contracting COVID-19 (P-Value < 0.001). Direct or indirect contact with COVID-19 patients was also significantly associated with infection; in other words, a history of COVID-19 was significantly more frequent among subjects who had been in contact with COVID-19 patients (P-Value < 0.0001) (Table 1). The age of the participants was likewise significantly higher in those with a history of COVID-19, which is consistent with previous studies (P-Value = 0.001) (Table 2).
Table 1: Demographic characteristics of study participants
Table 2: Average age of study participants
Analysis of the patients' VCF files
Among the 100 analyzed patients, 40 had at least one variant in the five investigated human genes (ACE2, TMPRSS2, TYK2, SLC6A20, and IFNAR2), and the remaining 60 had no variants in these genes. The analysis identified a total of 140 variants: 35 in the ACE2 gene, 21 in the TMPRSS2 gene, 29 in the TYK2 gene, 30 in the SLC6A20 gene, and 25 in the IFNAR2 gene. The frequency graph shows that ACE2 variants were the most common among the study subjects (Diagram 1. a). Excluding duplicate variants, ACE2 has 7 unique variants, TMPRSS2 has 3, TYK2 has 4, SLC6A20 has 4, and IFNAR2 has 3. ACE2 is therefore the most polymorphic of the genes in this study (Diagram 1. b). Notably, 4 of the variants found in this study were identified for the first time. All variants except one are classified as variants of uncertain significance (VUS) (Table 3).
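The per-gene tallies above can be cross-checked with a few lines of Python. The sketch below only re-enters the counts stated in this section (the variable names are illustrative and not part of the study pipeline) and confirms the totals and the gene with the most variants.

```python
# Counts copied from the text above; variable names are illustrative only.
variant_observations = {
    "ACE2": 35, "TMPRSS2": 21, "TYK2": 29, "SLC6A20": 30, "IFNAR2": 25,
}
unique_variants = {
    "ACE2": 7, "TMPRSS2": 3, "TYK2": 4, "SLC6A20": 4, "IFNAR2": 3,
}

# Total variant observations and total distinct variants across the five genes.
total_observations = sum(variant_observations.values())
total_unique = sum(unique_variants.values())

# Gene with the most observed variants and gene with the most unique variants.
most_observed_gene = max(variant_observations, key=variant_observations.get)
most_polymorphic_gene = max(unique_variants, key=unique_variants.get)

print(total_observations, total_unique, most_observed_gene, most_polymorphic_gene)
```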
Diagram 1
Table 3: Variants found in patients
The frequency of variants
For every variant found in patients, the highest population minor allele frequency (MAF) is less than 0.01. The highest population MAF was taken from the 1000 Genomes Project Phase 3, ExAC, and gnomAD databases. A "rare" variant is defined as one with a minor allele frequency of less than 1% in all projects reporting frequencies. Table 4 shows the frequency of each variant in patients with and without SARS‐CoV‐2 infection.
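The rarity criterion can be written as a short check: take the highest MAF across the databases that report the variant and compare it with the 1% threshold. The following is a minimal sketch; the function names, the dictionary keys, and the example frequencies are hypothetical and are not taken from the study data.

```python
RARE_MAF_THRESHOLD = 0.01  # 1%, the cutoff stated in the text


def highest_population_maf(frequencies):
    """Return the highest MAF among databases that report a value.

    `frequencies` maps a database name to a MAF, or to None when the
    database does not report the variant. Returns 0.0 if none report it.
    """
    reported = [maf for maf in frequencies.values() if maf is not None]
    return max(reported, default=0.0)


def is_rare(frequencies, threshold=RARE_MAF_THRESHOLD):
    """A variant is rare if its highest reported MAF is below the threshold."""
    return highest_population_maf(frequencies) < threshold


# Hypothetical frequencies, for illustration only.
example_variant = {"1000G_phase3": 0.0002, "ExAC": 0.0005, "gnomAD": None}
common_variant = {"1000G_phase3": 0.02, "ExAC": 0.005, "gnomAD": 0.004}

print(is_rare(example_variant), is_rare(common_variant))
```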
Table 4: ACE2, TMPRSS2, TYK2, SLC6A20, and IFNAR2 variants identified in a study of 100 individuals
The ACE2 gene analysis
ACE2 polymorphisms were prevalent in the cohort. In the group of individuals with SARS‐CoV‐2 infection, 28 (51.85%) presented no variant. Seven exonic ACE2 variants were detected in the cohort. The Chi-square test was used to assess the association between individual ACE2 variants and contracting COVID-19. The test showed a significant difference in the frequency of the rs759499720 and rs776459296 variants of the ACE2 gene between people with and without a history of COVID-19 (OR = 7.857, 95% CI: 0.947-94.31; p = 0.034 and OR = 4.714, 95% CI: 1.318-16.971; p = 0.019, respectively). In other words, the rs759499720 and rs776459296 variants of the ACE2 gene are associated with a greater risk of contracting COVID-19.
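For readers who wish to see how the odds ratios, confidence intervals, and Chi-square P-values reported for each variant in this section relate to a 2x2 table, the following is a minimal sketch. It assumes a Wald (Woolf) interval on the log odds ratio and a Pearson Chi-square test without continuity correction; the exact interval method used in the study is not stated, and the counts below are hypothetical rather than taken from Table 4.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: carriers with COVID-19      b: carriers without COVID-19
    c: non-carriers with COVID-19  d: non-carriers without COVID-19
    All four cells must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square statistic (1 degree of freedom) and its P-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 40 infected and 60 uninfected subjects in total.
a, b, c, d = 10, 4, 30, 56
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(a, b, c, d)
chi2, p_value = chi_square_2x2(a, b, c, d)
print(f"OR = {odds_ratio:.3f}, 95% CI: {ci_lower:.3f}-{ci_upper:.3f}, p = {p_value:.3f}")
```

With small cell counts, as is common for rare variants, the Wald interval becomes wide and may not agree with the Chi-square P-value, so the two should be read together.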
The TMPRSS2 gene analysis
This gene showed a lower level of polymorphism than ACE2, with 7 patients without SARS‐CoV‐2 infection (11.87%) and 14 with the infection (32.55%) presenting variants. No difference was observed in the distribution of variants between men and women. Three variants were detected. The Chi-square test was used to assess the association between individual TMPRSS2 variants and contracting COVID-19. The test showed a significant difference in the frequency of the rs386818798 variant of the TMPRSS2 gene between people with and without a history of COVID-19 (OR = 3.587, 95% CI: 1.135-10.11; p = 0.025). In other words, the rs386818798 variant of the TMPRSS2 gene is associated with a higher risk of contracting COVID-19. No significant differences were found between individuals with and without SARS‐CoV‐2 infection for the remaining TMPRSS2 variants.
The TYK2 gene analysis
The Chi-square test was used to assess the association between individual TYK2 variants and contracting COVID-19. The test showed a significant difference in the frequency of the rs771922681 and rs753470142 variants of the TYK2 gene between people with and without a history of COVID-19 (OR = 5.500, 95% CI: 1.083-26.95; p = 0.024 and OR = 7.857, 95% CI: 0.975-93.97; p = 0.032, respectively). In other words, the rs771922681 and rs753470142 variants of the TYK2 gene are associated with a higher risk of contracting COVID-19. In addition, the frequency of the newly identified TYK2 variant (chr19:10365853:C:A) differed significantly between people with and without a history of COVID-19 (OR = 6.286, 95% CI: 1.364-30.26; p = 0.012). This means that the new variant (chr19:10365853:C:A) of the TYK2 gene is also associated with a higher risk of contracting COVID-19.
The SLC6A20 gene analysis
The Chi-square test was used to assess the association between individual SLC6A20 variants and contracting COVID-19. The test showed that the frequency of the rs147760034 and rs139008024 variants of the SLC6A20 gene differed significantly between people with and without a history of COVID-19 (OR = 6.875, 95% CI: 1.597-32.63; p = 0.007 and OR = 5.347, 95% CI: 1.055-26.19; p = 0.027, respectively). In other words, the rs147760034 and rs139008024 variants of the SLC6A20 gene are associated with a greater risk of contracting COVID-19.
The IFNAR2 gene analysis
The variant with the lowest P-Value of all the variants was identified in the IFNAR2 gene. The Chi-square test was used to assess the association between individual IFNAR2 variants and contracting COVID-19. The test showed that the frequency of the rs759744926 variant of the IFNAR2 gene differed significantly between people with and without a history of COVID-19 (OR = 6.171, 95% CI: 1.713-21.31; p = 0.003). In other words, the rs759744926 variant of the IFNAR2 gene is associated with a higher risk of contracting COVID-19.
In-silico analysis
The PolyPhen-2 and SIFT bioinformatics tools predicted that the missense variants rs147760034, rs753470142, TYK2: c.675G>T, and rs759744926 damage the three-dimensional structure of the corresponding proteins and disrupt protein function. The remaining bioinformatics tools gave the same predictions (Table 5).
Table 5: Results of In-silico analysis of variants
The final prediction of the functional effect of human variants using CADD
Because multiple variant interpretation and scoring tools are available, a widely applicable criterion is needed that measures and integrates diverse information accurately and without bias. A C-score greater than or equal to 10 indicates that a variant is predicted to be among the 10% most deleterious substitutions possible in the human genome, a score greater than or equal to 20 indicates the 1% most deleterious substitutions, and a score greater than or equal to 30 indicates the 0.1% most deleterious substitutions. Table 6 shows that the c.884C>T variant of the human IFNAR2 gene ranks among the 10% most deleterious substitutions and is the most damaging variant for the 3D structure and function of the corresponding protein.
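The thresholds above follow from the PHRED-like scaling of the C-score, in which a score s corresponds to the top 10^(-s/10) fraction of possible substitutions. The sketch below illustrates that relationship; the function names are hypothetical and the example score is not a value from Table 6.

```python
def cadd_top_fraction(c_score):
    """Fraction of possible substitutions ranked at or above this scaled C-score."""
    return 10 ** (-c_score / 10)


def cadd_band(c_score):
    """Map a scaled C-score to the deleteriousness band described in the text."""
    if c_score >= 30:
        return "top 0.1%"
    if c_score >= 20:
        return "top 1%"
    if c_score >= 10:
        return "top 10%"
    return "below top 10%"


# Illustrative score only; not a value reported in this study.
example_score = 25.0
print(cadd_band(example_score), cadd_top_fraction(example_score))
```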
Table 6: The final effect of the variants on the structure and function of the corresponding proteins by CADD
Homology modeling and finding variants of TYK2 residues
The first missense variant of the TYK2 gene is Q225H. This variant changes glutamine, an uncharged polar amino acid, at position 225 (of 1188 amino acids in total) to histidine, a positively charged amino acid. Because histidine contains an imidazole ring in its side chain, it occupies more three-dimensional space and turns the alpha helix into a beta loop (Figure 1. a). The second missense variant of the TYK2 gene is R465Q. This variant changes arginine, a positively charged amino acid, at position 465 (of 1188 amino acids in total) to glutamine, an uncharged polar amino acid. Because glutamine has a shorter side chain than arginine, it occupies less three-dimensional space and converts the beta loop into an alpha helix. This extends the alpha helix and changes the three-dimensional structure of the protein (Figure 1. b). The third missense variant of the TYK2 gene is R1159S. This variant changes arginine, a positively charged amino acid, at position 1159 (of 1188 amino acids in total) to serine, an uncharged polar amino acid. Because serine has a short hydroxymethyl side chain, it occupies less 3D space, resulting in the formation of an alpha helix in a 3D position close to the remaining variant. Serine is common in many proteins and, owing to its hydrophilic nature, is present in significant concentrations in the outer regions of soluble proteins, as seen in the figure (Figure 1. c).
Figure 1
Homology modeling and finding variants of SLC6A20 residues
The first missense variant of the SLC6A20 gene is V104I. This variant changes valine at position 104 (of 593 amino acids in total) to isoleucine; both are branched hydrophobic amino acids. This substitution shortens both the N-terminal and C-terminal sides of the alpha helix. Owing to its hydrophobicity, isoleucine prefers to be buried in the hydrophobic cores of proteins. Perhaps the most obvious consequence is that isoleucine is rarely placed in an alpha helix, whereas it is readily accommodated, and even preferred, in beta sheets. For this reason, the amino end of the alpha helix becomes a β turn and the carboxyl end of the alpha helix becomes a β loop (Figure 1. d). The second missense variant of the SLC6A20 gene is F249S. This variant changes phenylalanine, a hydrophobic amino acid containing an aromatic ring, at position 249 (of 593 amino acids in total) to serine, an uncharged polar amino acid. This substitution lengthens the C-terminal end of the alpha helix. Because phenylalanine is hydrophobic, it prefers to be placed in the hydrophobic cores of proteins. Its aromatic side chain also means that phenylalanine can interact with other aromatic side chains. Replacing phenylalanine 249 with serine therefore disturbs the interaction of this residue with phenylalanine 250, while the relatively reactive hydroxyl group of serine enables it to form hydrogen bonds with various polar substrates. As a result, the beta loop is transformed into an alpha helix (Figure 1. e).
Homology modeling and finding variants of IFNAR2 residues
The missense variant of the IFNAR2 gene is P295L. This variant replaces proline at position 295 (of 516 amino acids in total) with leucine, a branched hydrophobic amino acid. Proline is unique in that it is the only amino acid whose side chain is attached twice to the protein backbone, forming a five-membered nitrogen-containing ring. More precisely, this property makes proline an imino acid. This difference is very important, because proline cannot participate in many of the main-chain interactions that other amino acids form easily. For this reason, proline is often found in very tight β-turns and β-loops in protein structures (i.e., where the polypeptide chain must change direction). Functionally, proline plays an important role in molecular recognition, especially in intracellular signaling, as is observed in the IFNAR2 signaling pathway. Domains such as SH3 bind to specific proline-containing peptides that are key parts of many signaling cascades. Leucine has a very unreactive side chain, so it is rarely directly involved in protein function. For this reason, replacing proline, which has an active role in signaling cascades, with leucine disrupts the role of IFNAR2 as part of the signaling pathway of the immune system (Figure 1. f).