Abstract
Advances in single-cell level analytical techniques, especially cytometric approaches, have led to profound innovation in biomedical research, particularly in the field of clinical immunology. This has resulted in an expansion of high-dimensional data, posing great challenges for comprehensive and unbiased analysis. Conventional manual analysis is thus becoming untenable to handle these challenges. Furthermore, most newly developed computational methods lack flexibility and interoperability, hampering their accessibility and usability. Here, we adapted Seurat, an R package originally developed for single-cell RNA sequencing (scRNA-seq) analysis, for high-dimensional flow cytometric data analysis. Based on a 20-marker antibody panel and analyses of T cell profiles in both adult blood and cord blood, we showcased the robust capacity of Seurat in flow cytometric data analysis, which was further validated by Spectre, another high-dimensional cytometric data analysis package, and conventional manual analysis. Importantly, we identified a unique CD8+ T cell population defined as CD8+CD45RA+CD27+CD161+T cell, that was predominantly present in cord blood. We characterized its IFN-γ-producing and potential cytotoxic properties using flow cytometry experiments and scRNA-seq analysis from a published dataset. Collectively, we identified a unique human cord blood CD8+CD45RA+CD27+CD161+T cell subset and demonstrated that Seurat, a widely used package for scRNA-seq analysis, possesses great potential to be repurposed for cytometric data analysis. This facilitates an unbiased and thorough interpretation of complicated high-dimensional data using a single analytical pipeline and opens a novel avenue for data-driven investigation in clinical immunology.
Introduction
The rapid development of analytical technologies at a single-cell level over recent decades has revolutionized biological and medical research, particularly in the field of immunology. Immune cell populations are well-known for their heterogeneity and tools such as flow cytometry (or fluorescence-activated cell sorting, FACS), cytometry by time-of-flight mass spectrometry (CyTOF) and single-cell RNA-sequencing (scRNA-seq), facilitate an in-depth identification and characterization of various immune cell types (1). Conventionally, analysis of cytometric data (including flow, spectral and mass cytometry) has relied on manual analysis based on empirical gating strategies under expert supervision. This is extremely labor-intensive and tedious, as the complex cytometric data is limited to permutational visualization of two-dimensional (2D) plots (FACS plots). These plots feature different pairs of marker combinations, which requires arduous sequential inspection (2). The possible combinations of markers from a given panel increase exponentially with the addition of extra parameters. As more and more state-of-the-art cytometric panels exceed 20 markers (3,4), thorough manual gating analysis is becoming increasingly challenging and impractical (5). Furthermore, such analytical workflows are inevitably subject to bias, considering their dependence on empirical knowledge and subjective selection and inspection of markers. These limitations hamper analyses and potentially conceal novel findings.
Various computational approaches have been developed as potential solutions, including methods for dimension reduction (such as t-distributed stochastic neighbor embedding, tSNE (6), and uniform manifold approximation and projection, UMAP (7)), clustering (such as PhenoGraph, and FlowSOM (8,9)), and automated cell gating and classification (5,10). These tools all accelerate high-dimensional data analysis. Moreover, they have revolutionized cytometry-based research, transitioning from the conventional hypothesis-driven strategy that focuses on specific cell types or markers, to more unbiased and comprehensive methods, that simultaneously take all data into account. Despite their notable success, these tools still suffer significant limitations. For example, many of these computational modalities are separate, and some even require specific data formatting and processing procedures. This is not user-friendly and hampers their usability and accessibility in the broader research community. While some integrative toolkits combine these modalities and offer end-to-end analysis of cross-platform cytometric data including normalization, integration and clustering, such as Spectre (10), and ImmPort Galaxy (11), these are still few in number. There are also some other commercial toolkits of this kind like OMIQ (12,13) and Cytobank (14), but they usually require paid subscriptions, limiting their availability. Furthermore, owing to their non-open-source nature, commercial toolkits can lag in flexible customizing services, as well as community-driven support, maintenance and improvements, potentially preventing optimal usability and adaptability. Hence, there is great interest in more accessible and adaptable tools.
Computational analyses of complex cytometric data have considerably benefited clinical immunological research. On one hand, clinical immunological data is notorious for its heterogeneity, highlighting the need for computational tools for data cleaning, batch alignment, and unbiased analysis. On the other hand, given the limited availability of clinical samples (such as samples with rare disease backgrounds or longitudinal samples), expanding the markers and dimensionality of cytometry panels may help to achieve more comprehensive and efficient investigation, and represents an unprecedented opportunity for high-dimensional data analysis. Leveraging the rapidly developing computational analytical tools for clinical immunological studies is thus emerging as a promising avenue to provide more detailed insights into clinical contexts whilst maximizing the values of limited clinical samples.
An area of increasing interest is the characterization of immune profiles in umbilical cord blood (CB) compared to adult blood (AB). The striking immunological differences between CB and AB not only offer critical insights for disease pathogenesis but also provide an ideal scenario for showcasing the analytical power of computational toolkits in clinical applications.
The main components of the CB immune compartment are cord blood mononuclear cells (CBMCs), known to exhibit unique characteristics relative to peripheral blood mononuclear cells (PBMCs) from AB, due to the semi-allogeneic environment of pregnancy. Mirroring the fetal immune system (15), CBMCs feature a more naïve phenotype (16–18), and are implicated in the physiology and pathology during both pregnancy and later in life (19–22). Hence, understanding CB immune profiles and their differences from AB provides precious insights into immunological development at different stages, as well as sheds light on the complex immunobiology of pregnancy.
Here, we developed a 20-marker antibody panel for thoroughly immunophenotyping T cells in both CB and AB. We adapted Seurat, a widely used end-to-end package originally for scRNA-seq analysis, for the resulting high-dimensional flow cytometric data analysis after primary processing in FlowJo. This workflow identified several previously underappreciated T cell subsets in AB, validated by Spectre, another computational cytometric analytical package, and conventional manual gating, showcasing the capacity of Seurat. Importantly, using Seurat for a comparative study of CB and AB profiles, we revealed a unique CB T cell population, characterized as CD8+CD45RA+CD27+CD161+T cells. Analysis of previously published scRNA-seq data confirmed this identified population and hinted at its possible cytotoxic and pro-inflammatory properties. Together, this represents the first application example of using Seurat as a complete flow cytometric analysis workflow and demonstrates its robust analytical performance. It emerges as a simple and easy-to-use toolkit for cytometric data analysis, particularly for its pre-existing wide scRNA-seq user community. Seurat also features as a single platform but with various supplementary tools and plugins facilitating single-cell analysis of both protein and RNA data, as well as their comparisons and cross-validation. This represents a novel unbiased discovery tool for complex single-cell data analysis in clinical immunology.
Materials and Methods
Sample preparation and flow cytometry experiments
PBMCs from adult blood samples and CBMCs from cord blood samples were prepared as described in Supplementary Information 1.1. For flow cytometry experiments, cells were processed, barcoded, and stained as Supplementary 1.2 – 1.3. and analyzed using the 5-laser Aurora Spectral cytometer (Cytek Biosciences, USA) on the same day.
Cytometric data was unmixed with SpectroFlo and analyzed using FlowJo (BD Life Science) based on the gating strategy in Supplementary Figure 1A. The CD3+CD4+ and CD3+CD8+ T cell populations were manually gated and exported as CSV files with their scaled values for further computational analysis in RStudio using Seurat and Spectre.
High-dimensional flow cytometry data analysis
Data preparation
For analysis using Seurat 4.3.0 and Spectre 1.0.0, the exported CSV files of the gated populations of interest were loaded in RStudio (v4.1.2). Next, data tables for each sample were merged together using Spectre’s do.merge.files() function. The data then underwent an hyperbolic arcsine (arcsinh) transformation (co-factor = 2000) using Spectre (do.asinh ). To avoid bias and skewing due to different cellularity, the processed cytometric data was first randomly downsampled to the same number among samples before analysis.
Analysis using Seurat
After reading in the downsampled data matrix generated in 2.2.1., along the conventional Seurat analysis workflow (23–26), procedures such as QC and normalization originally for transcriptomic data were skipped. Furthermore, to preserve the transformed flow cytometry data structure for analysis, the data scaling process was bypassed by selecting “do.scale = FALSE, do.center = FALSE” in the ScaleData()function.. All features were selected as variable features for further analysis.
After that, principal component analysis (PCA) was performed. Based on the PCA scores, the top PCs contributing to 99% of variance were selected for the subsequent cluster analysis using theFindNeighbors() and FindClusters() functions and dimensional reduction with the RunUMAP() function. We utilized theDotPlot() and VlnPlot() functions in Seurat to visualize the expression of various markers in different clusters.
A detailed workflow for analyzing flow cytometry data using Seurat is available in the Supplementary Information 1.4.2.
Analysis using Spectre
For analysis with Spectre, input data were first down sampled and then analyzed following (10), except a 5×5 self-organizing map (SOM) was used. The final clustering numbers were also adjusted to the same as Seurat’s results for comparison.
Single-cell RNA sequencing (scRNA-seq) analysis
Preprocessed single-cell RNA sequencing (scRNA-seq) data was downloaded from Gene Expression Omnibus from a previous study (27) (GEO accession: GSE158493). Since the original dataset was grouped based on sample origins (fetal spleen, full-term umbilical cord blood and adult peripheral blood), the LIGER package was used to re-integrate the data with different origins to eliminate potential batch effects (28). After that, the data were analyzed with Seurat 4.3.0. (23–26). Marker genes were identified using Seurat’s FindMarkers() function based on its default settings, with thresholds for differentially expressed genes defined as p < 0.05 and LOG2(Fold change) higher than 0.5 or lower than -0.5. Gene ontology (GO) analysis and gene set enrichment analysis (GSEA) was run with package fgsea and ClusterProfiler following their tutorials (29,30).
Statistics
Statistical analysis was performed with GraphPad PRISM 10 or in RStudio using Seurat. Mann-Whitney U test was used for comparison between two groups. Differences were considered to be statistically significant when p < 0.05.
Results
Seurat is a reliable tool for high-dimensional flow cytometry data analysis
Analysis of human peripheral blood, and the heterogenous T cell compartments therein, is of great importance considering the availability of blood samples and its ubiquitous use in medical research and clinical diagnosis. Here, we established a 20-marker antibody panel (Table 1) to comprehensively characterize the functional profiles of CD4+ and CD8+ T cells in human peripheral blood mononuclear cells (PBMCs). Details about the markers and their functions are described in Table 1. They cover various aspects of T cell biology, including maturation, activation, migration, and function.
PBMCs from healthy adult individuals were collected and processed based on the workflow in Figure 1A. To reduce antibody wastage as well as minimize the inter-sample variations caused by batch effects introduced during experimental processes, which is an intrinsic problem for clinical studies involving flow cytometry, we utilized a barcoding system leveraging the anti-CD45 labelling. In the current study, PBMC samples were stained with our 20-colour antibody panel and analyzed by spectral cytometry. After demultiplexing, the resulting cytometric data were manually gated for CD4+ and CD8+ T cell populations (Supplementary Figure 1A).
For proof-of-concept, the CD8+ T cell data was analyzed with Seurat and compared with Spectre, based on the 12 functional markers out of the total 15 surface markers, excluding lineage markers such as CD45, CD3 and CD8. The results were validated through manual gating as well.
Seurat clustered the adult CD8+ T cell compartment into 14 clusters (Se1-14) and projected them onto the 2D UMAP plot as shown in Figure 1B. Historically, T cells are gated into four populations based on expression of CD45RA and CD27. The CD45RA+CD27+ population is defined as naïve, the CD45RA-CD27+population as central memory (CM), the CD45RA-CD27- population as effector memory (EM) and the CD45RA+CD27- is effector memory that re-express CD45RA (EMRA) (31). Comparing Seurat’s results to this classification, we found that Se1, Se2 and Se8 were naïve, Se4, Se5 and Se7 were CM, Se11 and Se13 were EM, and Se12 was EMRA. Interestingly, Seurat clustering retrieved populations exhibiting intermediate expression levels of CD45RA and CD27, such as clusters Se3, Se6, Se9, Se10 and Se14 (Figure 1C-D), which did not necessarily fall into any of the four conventional gates mentioned above. This implies that the conventional gating strategy based only on positive or negative expression of markers is incomplete. Seurat, on the other hand, can provide a more comprehensive analysis for multiple markers, and thus enable the discovery of previously unidentified subsets.
Feature markers identified by Seurat and the potential functional properties of each cluster are summarized in Supplementary Table 3. Clusters with naïve phenotypes (Se1, Se2 and Se8) expressed different levels of integrin β7, indicating differential gut-homing potentials (32). Two CM clusters, Se4 and Se7, were also high in integrin β7, however, they could be separated based on CXCR-5 and CD161 expression, as markers for follicular T cells and cytotoxic T cells, respectively. The CM cluster Se5 uniquely featured the skin-homing marker, CLA. Cluster Se6 and Se9 both expressed intermediate levels of CD161, but Se6 also had intermediate expression of integrin β7 while Se9 did not express integrin β7 at all. Se3 exhibited a similar profile to Se6, except for the absence of CD161. Additionally, their intermediate expression of CD45RA implied that they might represent the transitional state between EM and EMRA (33). For EM-like and EMRA-like clusters, Se13 and Se12, but not Se11, they expressed high levels of integrin β7. Finally, Se14 was characterized by its high level of LAG-3, a marker for early T cell activation, while Se10 might be a transitional population during the T cell activation and maturation, considering its intermediate expression of both CD45RA and CD27. Other markers in our panel including CD49b and FCER1A were expressed at relatively low levels in T cells in the absence of stimulation. These subtle differences could still be detected by Seurat and confirmed with manual analysis (Figure 1C and Supplementary Figure 2A-B). For example, CD49b, a collagen-binding integrin, showed the highest level in the skin-homing population Se5, while FCER1A expression inversely correlated with CD27 expression, as exemplified in Se3, Se6, Se9, Se11 and Se12. Considering their relatively low level of expression, further research is required to evaluate whether such mild differences are of biological significance.
In parallel, we applied Spectre’s workflow to analyze our dataset, manually defining 14 clusters as the final clustering output to cross-validate the findings from Seurat. Comparison of the Seurat and Spectre results revealed that their outcomes were similar, with 11 of the 14 clusters being identified by both approaches, displaying comparable cluster sizes and feature marker expression (Figure 1C, Figure 1E and Supplementary Figure 2C). This indicated the robust and reliable performance of Seurat. As for their discrepancies, the Sp2 identified by Spectre was divided into three distinct clusters by Seurat, Se1, Se2 and Se8 (Supplementary Figure 2D). In contrast, Se4 in Seurat analysis was split into Sp1 and Sp4 by Spectre (Supplementary Figure 2D). In an independent comparative study, we also found that most cells were grouped into the same cluster by both methods, with minor discrepancies (Supplementary Figure 3). Together, these results indicated that clustering analyses from Seurat and Spectre were comparable and since they adopted different algorithms for their analyses, some differences were potentially to be expected.
Importantly, we further validated the results from Seurat with manual gating. As shown in Supplementary Figure 4, all 14 Seurat clusters could be gated out according to their marker expression on 2D FACS plots. Projection of Seurat and Spectre clustering results on the 2D FACS plots showed similar distributions compared to the manual gating results (Supplementary Figure 5-7). Furthermore, the proportion of each population in the total CD8+ T cells were also comparable between manual gating and Seurat clustering (Figure 1F).
Finally, we similarly applied the workflow to the gated CD4+ T cells and retrieved 15 clusters with Seurat (Supplementary Figure 8A-B). These subsets could also be similarly confirmed by manual gating and cross-validated with Spectre, which identified 12 out of the 15 subsets obtained by Seurat (Supplementary Figure 8C-D).
Together, these results demonstrated that Seurat, a tool originally developed for scRNA-seq data analysis, is also applicable and robust for high-dimensional flow cytometric data analysis. Its analysis helps to characterize the CD8+ T cells in PBMCs in more detail, retrieving novel T cell sub-clusters.
Comparative profiling of CD8+ T cells in adult blood and cord blood
Human circulatory T cells are plastic across the human lifespan and might be linked to differential disease susceptibility across human lives (34). We next aimed to compare the profiles of the T cell compartments from the cord blood (CB) and adult blood (AB) using Seurat.
As shown in Figure 2, peripheral blood mononuclear cells (PBMCs) from AB and cord blood mononuclear cells (CBMCs) from CB were first analyzed based on our 20-parameter antibody panel with manual gating. Next, the high-dimensional data were analyzed by Seurat and Spectre (Figure 3-4).
The proportion of CD4+ and CD8+ T cells among total lymphocytes showed no difference in PBMCs and CBMCs (Supplementary Figure 1B). As previously shown (16) , there were more naïve T cells and fewer EM T cells in both CD4+ and CD8+ compartments from CBMCs compared with PBMCs (Figure 2B-D). Reflective of a mature phenotype, adult PBMCs had higher proportions of CD4+ CM T cells and CD8+ EMRA T cells (Figure 2C-D). Our comprehensive 20-parameter antibody panel enabled an in-depth characterization of the functional status of T cells.
As shown in Figure 2E, CD4+ T cells in adult PBMCs had higher proportions of cells expressing T-bet, CLA and CXCR-5, while more CD4+ T cells from CB expressed CD27, β7, and LAG-3. Similarly, CB CD8+ T cells had a higher expression of CD27, β7, alongside FoxP3, whilst adult CD8+ T cells had higher proportions of cells expressing CD161, CLA, CXCR-5 and T-bet (Figure 2F and Supplementary Figure 9A-B). These findings are consistent with previous reports of the higher gut-homing potential of CBMCs, whilst PBMCs are more likely to migrate to the skin (16,34–36). Moreover, the lower expression of T-bet and overall naïve-biased phenotype of CB T cells coincides with their reduced IFN-γ, IL-4 and IL-13 production compared to AB T cells (Supplementary Figure 9C-D) (37).
Analysis of adult blood and cord blood CD8+T cells with Seurat identifies a unique cord blood CD8+CD45RA+CD27+CD161+T cell subset
Conventional manual gating workflows are empirical and subject to bias. Such analysis is less likely to potentially reveal new cell populations. Focusing on the CD8+ T cell compartment, we therefore performed unsupervised principal component analysis (PCA) on the AB and CB combined dataset based on expression levels quantified by the expression levels of the 12 surface markers used for clustering (Figure 3A). As expected, PBMC samples were distinctly separated from CBMC samples along the first PC (PC1, accounting for 18.0% of the variance), which was consistent with the differential expression of various T cell functional markers in AB samples relative to CB samples (Figure 2E-F).
Next, Seurat was used for more in-depth analyses, identifying 13 sub-populations from the combined AB and CB dataset (Figure 3B). Seurat clustering distinguished three naïve (clusters 1, 2 and 8), three CM (clusters 3, 4 and 7), one EM (cluster 11), and two EMRA clusters (clusters 5 and 12), with the remaining clusters exhibiting intermediate expression levels of CD45RA and/or CD27 (clusters 6, 9, 10, 13) (Figure 3C-D). Details about feature markers and potential functional properties of the identified CD8+ T cell subsets are summarized in Supplementary Table 4. Clusters 1 and 2 shared a naïve phenotype but differed in their expression of integrin β7 and gut-homing potential, consistent with previous findings demonstrating an increased expression of gut-homing receptors in naïve CB lymphocytes (38). The naïve cluster 8 was also high in integrin β7 but additionally expressed CD161, suggesting a cytotoxic phenotype. Out of the three CM subsets identified by Seurat, Clusters 3 and 7 of the CM class both had high integrin β7 expression, indicative of a migratory preference to the gut. Cluster 7 differed from Cluster 3 through its low expression of CD45RA and high expression of the follicular T cell marker, CXCR-5. Opposed to gut-homing, the CM cluster 4 did not express integrin β7 but instead highly expressed the skin-homing marker CLA and integrin CD49b. Integrin β7 levels could also distinguish the EMRA-like clusters 5 versus 12 as well as clusters 6 versus 9, with clusters 6 and 9 also expressing CD161. Finally, cluster 13 featured high levels of LAG-3 and CXCR-5.
We again compared the analyses from Seurat and Spectre based on the combined AB and CB dataset. Both methods identified 13 clusters, of which 11 were commonly shared, with comparable feature markers (Supplementary Figure 9E-F). Similarly, Seurat clusters in both AB and CB could also be validated through manual gating (Supplementary Figure 10 and Supplementary Figure 11). These results again highlighted the robust performance of Seurat for analyzing high-dimensional cytometric data.
Striking differences in the abundance of the Seurat clusters were found comparing AB and CB samples (Figure 3B and Supplementary Figure 12A). The naïve cluster 1 was predominantly abundant in AB whilst cluster 2 consisted primarily of CB cells. Seurat cluster analysis was able to separate these two naïve populations based on the differential expression of integrin β7 and CD27, which were enriched in the naïve population from CB (cluster 2) (16,38). Consistent with previous studies comparing adult and cord blood, CD8+ effector memory populations were almost exclusively found in AB (39). This includes all the identified EM (cluster 11) and EMRA (clusters 5 and 12) subsets, in addition to clusters 6, 9 and 10. Unlike these EM subsets, CM populations were present in both AB and CB. Of note, cluster 3 was dominant in CB whilst clusters 4 and 7 were enriched in AB. Finally, the LAG-3+ cluster 13 was equally abundant in both AB and CB.
Moreover, we also documented differential expression of various markers between the clusters from AB and CB. Reflective of their naïve phenotype and lack of antigenic exposure, several CB CD8+ T cell populations (clusters 1, 2, 3, 4, 7, 8, 9) exhibit higher expression of CD27 compared to their adult equivalents (Figure 3E). In contrast, the expression of CXCR-5 in clusters 7 and 13 alongside CLA expression in cluster 4 was higher in adult CD8+ T cells compared to CB (Figure 3E).
Intriguingly, our clustering analysis identified a population (cluster 8) that is almost exclusive to CBMCs (7.11% of CD8+ T cells in CB versus 0.19% in AB) (Figure 4A-C). This cluster is characterized as CD8+CD45RA+CD27+CD161+, partly overlapping with a previously reported but not fully characterized, CD8+CD161+ T cell population found in CB (40,41). Analysis based on our 20-parameter panel provided an unprecedented functional overview of the CD8+CD45RA+CD27+CD161+T cell subset. We discovered that this newly identified population had high integrin β7 expression (Figure 4B and Figure 4D) as well as higher CLA, BCL-6, T-bet, and GATA-3, but lower FoxP3 levels compared to its CD161- counterpart (Figure 4E). No differences in LAG-3, CXCR-5, Nur77, CD137 and FCER1A were found between the CD8+CD45RA+CD27+CD161+and CD8+CD45RA+CD27+CD161-T cells (Figure 4E). Upon stimulation with PMA and ionomycin, this subset predominantly produced IFN-γ and IL-4 but lowly expressed IL-5, IL-10, and IL-13. It only differed significantly in IL-10 production relative to the CD161- counterpart (Figure 4F). Based on these results, the CD8+CD45RA+CD27+CD161+sub-population (cluster 8) appeared to be a pro-inflammatory and cytotoxic T cell subset.
In summary, using our high-dimensional antibody panel and Seurat analysis, we thoroughly profiled the CD8+ T cell compartment in AB versus CB. This revealed a unique CD8+CD45RA+CD27+CD161+T cell subset in CB which we characterized.
Cross-validation and further characterization of the CD8+CD45RA+CD27+CD161+T cell subset using scRNA-seq.
To validate our findings and further characterize the population identified by Seurat in CBMCs, we leveraged a recently published scRNA-seq dataset for naïve CD8+ T cells, which analyzed 18513 cells across different developmental stages and compartments (27). This dataset covers naïve T cells from fetal spleen, umbilical cord blood and adult peripheral blood. As shown in Figure 5A and Supplementary Figure 13A, the overall naïve CD8+ T cell population (sorted as CD8+CD45RA+CD27+CCR7+CD95-) was further clustered into four subsets. Cluster0 expressed high levels of RGS1 , which is linked to T cell exhaustion (42); cluster1 was high in IL7R and SELL , indicative of a naïve phenotype; and cluster2 was marked by MT2A and RPS4Y1, which might be potentially linked to a memory T cell phenotype (43,44). Intriguingly, cluster3 exhibited similar features to the population we identified, characterized by its expression of KLRB1 (gene encoding CD161) (Figure 5B). Differentially expressed gene analysis with DESeq2 (45) found 93 genes upregulated in cluster3 compared with the remaining naïve CD8+ T cells, while 2 genes were downregulated (Figure 5C). Among the upregulated genes in cluter3, there were several related to cytotoxic T cell features, such as GZMA , GZMK ,GZMM , CST7 , and NKG7 (46), indicating their cytotoxic functions, which is similar to our cytometric results and the previous reports for CD8+ CD161+ T cells (40). Additional inflammation-related genes were upregulated, such as ID1 (47) and CMC1 (48). Interestingly, CCL5 , related to a memory phenotype, was also significantly upregulated in cluster3. Additionally, several chemokine receptors were also increased, such as CXCR3 and CXCR4 .
GO analysis was next carried out with the differentially expressed genes using ClusterProfiler (29,30) based on the Gene Ontology pathways (49). As shown in Figure 5D, compared with other naïve CD8+T cells, cluster3 was enriched for pathways related to T cell chemotaxis , lymphocyte chemotaxis and T cell migration , consistent with the differentially expressed gene analysis (Figure 5C) and our flow cytometric data (Figure 3I and 3J). On the other hand, this population was suppressed in pathways related to ribosomes.
The GO analyses were based only on the differentially expressed genes, and we next conducted GSEA based on the overall transcriptomic data using the fgsea package (50). GSEA using the Hallmark gene setshowed that cluster3 displayed enrichment in inflammatory pathways such as TNFα signaling via NFκB, IL2-STAT5 signaling, and IFNγ response (Figure 5E). Additionally, they also showed enrichment in theMAPK signaling pathway and the T cell receptor signaling pathway (Figure 5F-G).
In summary, scRNA-seq analysis from an independent dataset cross-validated the CD8+CD45RA+CD27+CD161+population identified by our Seurat-based analysis. It also characterized it as a naïve subset but with a potential pro-inflammatory and cytotoxic profile.
Discussion
There has been a significant expansion of both the size and complexity of cytometric data, especially in the field of clinical immunology. Such high-dimensional and complicated datasets cause great difficulties for conventional manual analytical strategies, inevitably hampering comprehensive and unbiased analyses and interpretation. Consequently, myriad computational toolkits have been developed, aiming to address these challenges, but their effective applications are sometimes restricted, suffering from a lack of flexibility and interoperability. Recently, packages such as Spectre (10) and tidyof (51) were developed, attempting to provide integrative, end-to-end services for cytometric data analysis. However, they have only been sporadically applied, due to them being standalone pipelines, requiring adaptation to completely new packages, and demanding significant prerequisite coding knowledge.
In the present study, we repurposed Seurat, a well-established package for scRNA-seq data analysis, for high-dimensional flow cytometric data analysis (23–26). Comparison of Seurat and other currently available analytical packages are summarized in Supplementary Table 5. Among them, Seurat has long shared great popularity within the field of single-cell analysis. It is community-driven and well-supported and has more than 20 R packages for related data processing and analysis. Therefore, it is likely to be more accessible and easier to use, particularly for the broad users with previous experiences in scRNA-seq looking to flow cytometry to complement their investigative breadth.
Here, we showcased the robust capacity of Seurat, based on our experiments profiling the T cell compartments in adult blood (AB) and cord blood (CB). Overall, Seurat generated similar results to Spectre, which were also confirmed by manual gating. Importantly, with our approach, we identified a unique T cell subset (CD8+CD45RA+CD27+CD161+) within CB and cross-validated its functional profiles with an independent scRNA-seq dataset using Seurat. Together, these data highlight the great potential of Seurat for cytometric data analysis. It represents a simple single platform for the unbiased analysis of both protein and RNA data at single-cell resolution. This will enable simpler comparison and cross-validation of cytometric and scRNA-seq studies and facilitate more comprehensive investigations and discoveries in clinical immunology.
A plethora of state-of-the-art mathematical algorithms or statistical models are used in the field of cytometry computational analysis. These include the FlowSOM modality used in Spectre, carrying out clustering based on a self-organizing map (SOM) method, while Seurat first ran PCAs on the overall dataset, next constructed a K-nearest neighbor (KNN) graph, similar to PhenoGraph, another single-cell analytical tool, and then adopted the Louvain algorithm to group cells together (10,23). Such differences might account for the discrepancies we observed when comparing the clustering outcomes from Seurat and Spectre. Detailed comparisons of these mathematical methodologies are not within the scope of the current work, but previous study has shown that FlowSOM and PhenoGraph were the top-performing unsupervised methods for mass cytometry data clustering analysis. The KNN graph model deployed in PhenoGraph excelled in its clustering precision, stability and robustness in identifying sub-clusters, relative to other approaches like flowMeans, DEPECHE and Xshift (52). Since Seurat shares the similar KNN model to PhenoGraph, it is reasonable to expect it could also show robust capacity in more generic cytometric data analysis. This warrants more systematic comparisons in future work. On the other hand, differences in results from Seurat and Spectre highlight that utilizing both methods in tandem will provide a more complete understanding of complex datasets.
In addition to its distinct mathematical nature, as one of the cutting-edge end-to-end analytical tools for single-cell data, Seurat has already been widely used in various research and clinical settings and is vigorously maintained and supported by its broad user community. This contributes to its easy accessibility and high user-friendliness and might reduce the coding burden as well, as users, especially those with previous experience in scRNA-seq, would not need to learn a completely new package or coding language for analysis. Moreover, the application of Seurat potentially opens more possibilities for cytometric data analysis. For example, the built-in functionFindMarkers() in Seurat might facilitate easier marker identification, particularly for high-dimensional cytometric data, as the number of markers is expanding continuously. However, caution should still be taken to interpret its outcomes. Also, as a popular scRNA-seq analysis package, Seurat could also act as a wrapper with favorable interoperability around a wide range of complementary packages or plugins originally developed for scRNA-seq analysis, such as LIGER and Harmony for data integration (28,53). They might also be applicable to cytometric data, such as for batch correction, and could provide novel possibilities for high-dimensional data analyses once validated. Thus, adapting Seurat offers a single simple platform to analyze, compare and cross-validate protein and RNA, and even potentially other multi-omic single-cell data.
Previously, there were only a few reports applying Seurat for protein level single-cell analysis, such as for cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) (26) and CyTOF (54,55), and leveraging the rPCA integration method in Seurat for spectral cytometry analysis (34,56). Recently, there has also been a similar attempt, adapting Scanpy, a Python-based scRNA-seq analysis package, to analyze mass cytometry data (57). To our knowledge, the present work represents the first example of applying Seurat as a complete flow cytometric analysis workflow. Harnessing our 20-colour antibody panel and the Seurat-based analysis pipeline, we reported a unique T cell subset in CBMCs, characterized as CD8+CD45RA+CD27+CD161+T cells. This subset partly overlapped with the previously described CD8+CD161+ T cells (40). Previously, studies first discovered the differential (low/intermediate/high) levels of CD161 on CD8+ T cells in AB, which correlated with their various functional activities including cytokine production, proliferation, and lytic activity (58). AB CD8+CD161hi T cells were predominantly mucosal- associated invariant T (MAIT) cells (59), while the CD8+CD161int population represented a memory T cell subset which were enriched in the colonic lamina propria (40). Consistent with this, our clustering analysis found a CD8+CD161+ population predominantly existing in AB (Figure 3B), although with the current clustering setting, both Seurat and Spectre failed to further subdivide it into CD161int and CD161hisubsets. This can be overcome by finetuning the clustering parameters depending on the particular scientific question of interest. As for our newly identified CD8+CD45RA+CD27+CD161+subset, considering its naïve phenotype, it is not surprising that it is almost negligible in AB.
The role of the CD8+CD161+ T cells in CB remains elusive. Developmentally, it was found that the CD8+CD161hi T cells in CB might be the progenitor for post-natal MAIT cells (41,59,60). Functionally, the CD161hi subset produced IFN-γ and IL-17 (60,61), while the CD161int subset, despite expressing markers like CD45RA and CCR7, still exhibited a preprogrammed transcriptomic profile reflective of their AB counterpart (40). Our current clustering analysis could not further separate the population based on CD161 levels, but adjusting the clustering parameters could potentially help to differentiate them considering their intermediate to high CD161 expression (Figure 4B). The naïve phenotype of this CB-enriched subset, based on the expression of CD45RA and CD27, is similar to previous reports (40,61), and we also confirmed its IFN-γ production. Previously, there were limited studies investigating the functional surface markers of CD8+CD161+ T cells, such as CCR6 (59). Here, we characterized the CD8+CD45RA+CD27+CD161+population as high in integrin β7 but low in CLA expression, implying a preference to gut over skin-homing. Thus, this CB population might represent the progenitors for AB CD8+CD161int T cells which are enriched within the colon (40). CD8+CD161+ T cells in AB are involved in the response to tissue-localized inflammation triggeredd by intracellular and viral pathogens (40,59), while their functional implications in CB remain elusive. Likewise, both our flow cytometry and scRNA-seq data suggested the pro-inflammatory and cytotoxic properties of CB CD8+CD45RA+CD27+CD161+T cells. Additionally, as CD161 contributes to prenatal immune suppression (62), this subset might be involved in maintaining tolerance in the semi-allogenic context of pregnancy.
In summary, we have adapted Seurat, a widely used scRNA-seq analysis package, for high-dimensional flow cytometric data analysis and showcased its performance through the identification of a unique CD8+CD45RA+CD27+CD161+T cell population in CB. Such a pipeline presents a novel avenue for comprehensive analysis of high-dimensional complex cytometric and multi-modal data, facilitating unbiased data-driven studies and discovery.