Abstract
Advances in single-cell level analytical techniques, especially
cytometric approaches, have led to profound innovation in biomedical
research, particularly in the field of clinical immunology. This has
resulted in an expansion of high-dimensional data, posing great
challenges for comprehensive and unbiased analysis. Conventional manual
analysis is thus becoming untenable to handle these challenges.
Furthermore, most newly developed computational methods lack flexibility
and interoperability, hampering their accessibility and usability. Here,
we adapted Seurat, an R package originally developed for single-cell RNA
sequencing (scRNA-seq) analysis, for high-dimensional flow cytometric
data analysis. Based on a 20-marker antibody panel and analyses of T
cell profiles in both adult blood and cord blood, we showcased the
robust capacity of Seurat in flow cytometric data analysis, which was
further validated by Spectre, another high-dimensional cytometric data
analysis package, and conventional manual analysis. Importantly, we
identified a unique CD8+ T cell population defined as
CD8+CD45RA+CD27+CD161+T cell, that was predominantly present in cord blood. We characterized
its IFN-γ-producing and potential cytotoxic properties using flow
cytometry experiments and scRNA-seq analysis from a published dataset.
Collectively, we identified a unique human cord blood
CD8+CD45RA+CD27+CD161+T cell subset and demonstrated that Seurat, a widely used package for
scRNA-seq analysis, possesses great potential to be repurposed for
cytometric data analysis. This facilitates an unbiased and thorough
interpretation of complicated high-dimensional data using a single
analytical pipeline and opens a novel avenue for data-driven
investigation in clinical immunology.
Introduction
The rapid development of analytical technologies at a single-cell level
over recent decades has revolutionized biological and medical research,
particularly in the field of immunology. Immune cell populations are
well-known for their heterogeneity and tools such as flow cytometry (or
fluorescence-activated cell sorting, FACS), cytometry by time-of-flight
mass spectrometry (CyTOF) and single-cell RNA-sequencing (scRNA-seq),
facilitate an in-depth identification and characterization of various
immune cell types (1). Conventionally, analysis of cytometric data
(including flow, spectral and mass cytometry) has relied on manual
analysis based on empirical gating strategies under expert supervision.
This is extremely labor-intensive and tedious, as the complex cytometric
data is limited to permutational visualization of two-dimensional (2D)
plots (FACS plots). These plots feature different pairs of marker
combinations, which requires arduous sequential inspection (2). The
possible combinations of markers from a given panel increase
exponentially with the addition of extra parameters. As more and more
state-of-the-art cytometric panels exceed 20 markers (3,4), thorough
manual gating analysis is becoming increasingly challenging and
impractical (5). Furthermore, such analytical workflows are inevitably
subject to bias, considering their dependence on empirical knowledge and
subjective selection and inspection of markers. These limitations hamper
analyses and potentially conceal novel findings.
Various computational approaches have been developed as potential
solutions, including methods for dimension reduction (such as
t-distributed stochastic neighbor embedding, tSNE (6), and uniform
manifold approximation and projection, UMAP (7)), clustering (such as
PhenoGraph, and FlowSOM (8,9)), and automated cell gating and
classification (5,10). These tools all accelerate high-dimensional data
analysis. Moreover, they have revolutionized cytometry-based research,
transitioning from the conventional hypothesis-driven strategy that
focuses on specific cell types or markers, to more unbiased and
comprehensive methods, that simultaneously take all data into account.
Despite their notable success, these tools still suffer significant
limitations. For example, many of these computational modalities are
separate, and some even require specific data formatting and processing
procedures. This is not user-friendly and hampers their usability and
accessibility in the broader research community. While some integrative
toolkits combine these modalities and offer end-to-end analysis of
cross-platform cytometric data including normalization, integration and
clustering, such as Spectre (10), and ImmPort Galaxy (11), these are
still few in number. There are also some other commercial toolkits of
this kind like OMIQ (12,13) and Cytobank (14), but they usually require
paid subscriptions, limiting their availability. Furthermore, owing to
their non-open-source nature, commercial toolkits can lag in flexible
customizing services, as well as community-driven support, maintenance
and improvements, potentially preventing optimal usability and
adaptability. Hence, there is great interest in more accessible and
adaptable tools.
Computational analyses of complex cytometric data have considerably
benefited clinical immunological research. On one hand, clinical
immunological data is notorious for its heterogeneity, highlighting the
need for computational tools for data cleaning, batch alignment, and
unbiased analysis. On the other hand, given the limited availability of
clinical samples (such as samples with rare disease backgrounds or
longitudinal samples), expanding the markers and dimensionality of
cytometry panels may help to achieve more comprehensive and efficient
investigation, and represents an unprecedented opportunity for
high-dimensional data analysis. Leveraging the rapidly developing
computational analytical tools for clinical immunological studies is
thus emerging as a promising avenue to provide more detailed insights
into clinical contexts whilst maximizing the values of limited clinical
samples.
An area of increasing interest is the characterization of immune
profiles in umbilical cord blood (CB) compared to adult blood (AB). The
striking immunological differences between CB and AB not only offer
critical insights for disease pathogenesis but also provide an ideal
scenario for showcasing the analytical power of computational toolkits
in clinical applications.
The main components of the CB immune compartment are cord blood
mononuclear cells (CBMCs), known to exhibit unique characteristics
relative to peripheral blood mononuclear cells (PBMCs) from AB, due to
the semi-allogeneic environment of pregnancy. Mirroring the fetal immune
system (15), CBMCs feature a more naïve phenotype (16–18), and are
implicated in the physiology and pathology during both pregnancy and
later in life (19–22). Hence, understanding CB immune profiles and
their differences from AB provides precious insights into immunological
development at different stages, as well as sheds light on the complex
immunobiology of pregnancy.
Here, we developed a 20-marker antibody panel for thoroughly
immunophenotyping T cells in both CB and AB. We adapted Seurat, a widely
used end-to-end package originally for scRNA-seq analysis, for the
resulting high-dimensional flow cytometric data analysis after primary
processing in FlowJo. This workflow identified several previously
underappreciated T cell subsets in AB, validated by Spectre, another
computational cytometric analytical package, and conventional manual
gating, showcasing the capacity of Seurat. Importantly, using Seurat for
a comparative study of CB and AB profiles, we revealed a unique CB T
cell population, characterized as
CD8+CD45RA+CD27+CD161+T cells. Analysis of previously published scRNA-seq data confirmed this
identified population and hinted at its possible cytotoxic and
pro-inflammatory properties. Together, this represents the first
application example of using Seurat as a complete flow cytometric
analysis workflow and demonstrates its robust analytical performance. It
emerges as a simple and easy-to-use toolkit for cytometric data
analysis, particularly for its pre-existing wide scRNA-seq user
community. Seurat also features as a single platform but with various
supplementary tools and plugins facilitating single-cell analysis of
both protein and RNA data, as well as their comparisons and
cross-validation. This represents a novel unbiased discovery tool for
complex single-cell data analysis in clinical immunology.
Materials and Methods
Sample preparation and flow cytometry experiments
PBMCs from adult blood samples and CBMCs from cord blood samples were
prepared as described in Supplementary Information 1.1. For flow
cytometry experiments, cells were processed, barcoded, and stained as
Supplementary 1.2 – 1.3. and analyzed using the 5-laser Aurora Spectral
cytometer (Cytek Biosciences, USA) on the same day.
Cytometric data was unmixed with SpectroFlo and analyzed using FlowJo
(BD Life Science) based on the gating strategy in Supplementary Figure
1A. The CD3+CD4+ and
CD3+CD8+ T cell populations were
manually gated and exported as CSV files with their scaled values for
further computational analysis in RStudio using Seurat and Spectre.
High-dimensional flow cytometry data analysis
Data preparation
For analysis using Seurat 4.3.0 and Spectre 1.0.0, the exported CSV
files of the gated populations of interest were loaded in RStudio
(v4.1.2). Next, data tables for each sample were merged together using
Spectre’s do.merge.files() function. The data then underwent an
hyperbolic arcsine (arcsinh) transformation (co-factor = 2000) using
Spectre (do.asinh ). To avoid bias and skewing due to different
cellularity, the processed cytometric data was first randomly
downsampled to the same number among samples before analysis.
Analysis using Seurat
After reading in the downsampled data matrix generated in 2.2.1., along
the conventional Seurat analysis workflow (23–26), procedures such as
QC and normalization originally for transcriptomic data were skipped.
Furthermore, to preserve the transformed flow cytometry data structure
for analysis, the data scaling process was bypassed by selecting
“do.scale = FALSE, do.center = FALSE” in the ScaleData()function.. All features were selected as variable features for further
analysis.
After that, principal component analysis (PCA) was performed. Based on
the PCA scores, the top PCs contributing to 99% of variance were
selected for the subsequent cluster analysis using theFindNeighbors() and FindClusters() functions and dimensional
reduction with the RunUMAP() function. We utilized theDotPlot() and VlnPlot() functions in Seurat to visualize
the expression of various markers in different clusters.
A detailed workflow for analyzing flow cytometry data using Seurat is
available in the Supplementary Information 1.4.2.
Analysis using Spectre
For analysis with Spectre, input data were first down sampled and then
analyzed following (10), except a 5×5 self-organizing map (SOM) was
used. The final clustering numbers were also adjusted to the same as
Seurat’s results for comparison.
Single-cell RNA sequencing (scRNA-seq) analysis
Preprocessed single-cell RNA sequencing (scRNA-seq) data was downloaded
from Gene Expression Omnibus from a previous study (27) (GEO accession:
GSE158493). Since the original dataset was grouped based on sample
origins (fetal spleen, full-term umbilical cord blood and adult
peripheral blood), the LIGER package was used to re-integrate the data
with different origins to eliminate potential batch effects (28). After
that, the data were analyzed with Seurat 4.3.0. (23–26). Marker genes
were identified using Seurat’s FindMarkers() function based on
its default settings, with thresholds for differentially expressed genes
defined as p < 0.05 and LOG2(Fold change) higher than 0.5 or
lower than -0.5. Gene ontology (GO) analysis and gene set enrichment
analysis (GSEA) was run with package fgsea and ClusterProfiler following
their tutorials (29,30).
Statistics
Statistical analysis was performed with GraphPad PRISM 10 or in RStudio
using Seurat. Mann-Whitney U test was used for comparison between two
groups. Differences were considered to be statistically significant when
p < 0.05.
Results
Seurat is a reliable tool for high-dimensional flow
cytometry data analysis
Analysis of human peripheral blood, and the heterogenous T cell
compartments therein, is of great importance considering the
availability of blood samples and its ubiquitous use in medical research
and clinical diagnosis. Here, we established a 20-marker antibody panel
(Table 1) to comprehensively characterize the functional profiles of
CD4+ and CD8+ T cells in human
peripheral blood mononuclear cells (PBMCs). Details about the markers
and their functions are described in Table 1. They cover various aspects
of T cell biology, including maturation, activation, migration, and
function.
PBMCs from healthy adult individuals were collected and processed based
on the workflow in Figure 1A. To reduce antibody wastage as well as
minimize the inter-sample variations caused by batch effects introduced
during experimental processes, which is an intrinsic problem for
clinical studies involving flow cytometry, we utilized a barcoding
system leveraging the anti-CD45 labelling. In the current study, PBMC
samples were stained with our 20-colour antibody panel and analyzed by
spectral cytometry. After demultiplexing, the resulting cytometric data
were manually gated for CD4+ and
CD8+ T cell populations (Supplementary Figure 1A).
For proof-of-concept, the CD8+ T cell data was
analyzed with Seurat and compared with Spectre, based on the 12
functional markers out of the total 15 surface markers, excluding
lineage markers such as CD45, CD3 and CD8. The results were validated
through manual gating as well.
Seurat clustered the adult CD8+ T cell compartment
into 14 clusters (Se1-14) and projected them onto the 2D UMAP plot as
shown in Figure 1B. Historically, T cells are gated into four
populations based on expression of CD45RA and CD27. The
CD45RA+CD27+ population is defined
as naïve, the CD45RA-CD27+population as central memory (CM), the
CD45RA-CD27- population as effector
memory (EM) and the CD45RA+CD27- is
effector memory that re-express CD45RA (EMRA) (31). Comparing Seurat’s
results to this classification, we found that Se1, Se2 and Se8 were
naïve, Se4, Se5 and Se7 were CM, Se11 and Se13 were EM, and Se12 was
EMRA. Interestingly, Seurat clustering retrieved populations exhibiting
intermediate expression levels of CD45RA and CD27, such as clusters Se3,
Se6, Se9, Se10 and Se14 (Figure 1C-D), which did not necessarily fall
into any of the four conventional gates mentioned above. This implies
that the conventional gating strategy based only on positive or negative
expression of markers is incomplete. Seurat, on the other hand, can
provide a more comprehensive analysis for multiple markers, and thus
enable the discovery of previously unidentified subsets.
Feature markers identified by Seurat and the potential functional
properties of each cluster are summarized in Supplementary Table 3.
Clusters with naïve phenotypes (Se1, Se2 and Se8) expressed different
levels of integrin β7, indicating differential gut-homing potentials
(32). Two CM clusters, Se4 and Se7, were also high in integrin β7,
however, they could be separated based on CXCR-5 and CD161 expression,
as markers for follicular T cells and cytotoxic T cells, respectively.
The CM cluster Se5 uniquely featured the skin-homing marker, CLA.
Cluster Se6 and Se9 both expressed intermediate levels of CD161, but Se6
also had intermediate expression of integrin β7 while Se9 did not
express integrin β7 at all. Se3 exhibited a similar profile to Se6,
except for the absence of CD161. Additionally, their intermediate
expression of CD45RA implied that they might represent the transitional
state between EM and EMRA (33). For EM-like and EMRA-like clusters, Se13
and Se12, but not Se11, they expressed high levels of integrin β7.
Finally, Se14 was characterized by its high level of LAG-3, a marker for
early T cell activation, while Se10 might be a transitional population
during the T cell activation and maturation, considering its
intermediate expression of both CD45RA and CD27. Other markers in our
panel including CD49b and FCER1A were expressed at relatively low levels
in T cells in the absence of stimulation. These subtle differences could
still be detected by Seurat and confirmed with manual analysis (Figure
1C and Supplementary Figure 2A-B). For example, CD49b, a
collagen-binding integrin, showed the highest level in the skin-homing
population Se5, while FCER1A expression inversely correlated with CD27
expression, as exemplified in Se3, Se6, Se9, Se11 and Se12. Considering
their relatively low level of expression, further research is required
to evaluate whether such mild differences are of biological
significance.
In parallel, we applied Spectre’s workflow to analyze our dataset,
manually defining 14 clusters as the final clustering output to
cross-validate the findings from Seurat. Comparison of the Seurat and
Spectre results revealed that their outcomes were similar, with 11 of
the 14 clusters being identified by both approaches, displaying
comparable cluster sizes and feature marker expression (Figure 1C,
Figure 1E and Supplementary Figure 2C). This indicated the robust and
reliable performance of Seurat. As for their discrepancies, the Sp2
identified by Spectre was divided into three distinct clusters by
Seurat, Se1, Se2 and Se8 (Supplementary Figure 2D). In contrast, Se4 in
Seurat analysis was split into Sp1 and Sp4 by Spectre (Supplementary
Figure 2D). In an independent comparative study, we also found that most
cells were grouped into the same cluster by both methods, with minor
discrepancies (Supplementary Figure 3). Together, these results
indicated that clustering analyses from Seurat and Spectre were
comparable and since they adopted different algorithms for their
analyses, some differences were potentially to be expected.
Importantly, we further validated the results from Seurat with manual
gating. As shown in Supplementary Figure 4, all 14 Seurat clusters could
be gated out according to their marker expression on 2D FACS plots.
Projection of Seurat and Spectre clustering results on the 2D FACS plots
showed similar distributions compared to the manual gating results
(Supplementary Figure 5-7). Furthermore, the proportion of each
population in the total CD8+ T cells were also
comparable between manual gating and Seurat clustering (Figure 1F).
Finally, we similarly applied the workflow to the gated
CD4+ T cells and retrieved 15 clusters with Seurat
(Supplementary Figure 8A-B). These subsets could also be similarly
confirmed by manual gating and cross-validated with Spectre, which
identified 12 out of the 15 subsets obtained by Seurat (Supplementary
Figure 8C-D).
Together, these results demonstrated that Seurat, a tool originally
developed for scRNA-seq data analysis, is also applicable and robust for
high-dimensional flow cytometric data analysis. Its analysis helps to
characterize the CD8+ T cells in PBMCs in more detail,
retrieving novel T cell sub-clusters.
Comparative profiling of CD8+ T cells in
adult blood and cord blood
Human circulatory T cells are plastic across the human lifespan and
might be linked to differential disease susceptibility across human
lives (34). We next aimed to compare the profiles of the T cell
compartments from the cord blood (CB) and adult blood (AB) using Seurat.
As shown in Figure 2, peripheral blood mononuclear cells (PBMCs) from AB
and cord blood mononuclear cells (CBMCs) from CB were first analyzed
based on our 20-parameter antibody panel with manual gating. Next, the
high-dimensional data were analyzed by Seurat and Spectre (Figure 3-4).
The proportion of CD4+ and CD8+ T
cells among total lymphocytes showed no difference in PBMCs and CBMCs
(Supplementary Figure 1B). As previously shown (16) , there were more
naïve T cells and fewer EM T cells in both CD4+ and
CD8+ compartments from CBMCs compared with PBMCs
(Figure 2B-D). Reflective of a mature phenotype, adult PBMCs had higher
proportions of CD4+ CM T cells and
CD8+ EMRA T cells (Figure 2C-D). Our comprehensive
20-parameter antibody panel enabled an in-depth characterization of the
functional status of T cells.
As shown in Figure 2E, CD4+ T cells in adult PBMCs had
higher proportions of cells expressing T-bet, CLA and CXCR-5, while more
CD4+ T cells from CB expressed CD27, β7, and LAG-3.
Similarly, CB CD8+ T cells had a higher expression of
CD27, β7, alongside FoxP3, whilst adult CD8+ T cells
had higher proportions of cells expressing CD161, CLA, CXCR-5 and T-bet
(Figure 2F and Supplementary Figure 9A-B). These findings are consistent
with previous reports of the higher gut-homing potential of CBMCs,
whilst PBMCs are more likely to migrate to the skin (16,34–36).
Moreover, the lower expression of T-bet and overall naïve-biased
phenotype of CB T cells coincides with their reduced IFN-γ, IL-4 and
IL-13 production compared to AB T cells (Supplementary Figure 9C-D)
(37).
Analysis of adult blood and cord blood CD8+T cells with Seurat identifies a unique cord blood
CD8+CD45RA+CD27+CD161+T cell subset
Conventional manual gating workflows are empirical and subject to bias.
Such analysis is less likely to potentially reveal new cell populations.
Focusing on the CD8+ T cell compartment, we therefore
performed unsupervised principal component analysis (PCA) on the AB and
CB combined dataset based on expression levels quantified by the
expression levels of the 12 surface markers used for clustering (Figure
3A). As expected, PBMC samples were distinctly separated from CBMC
samples along the first PC (PC1, accounting for 18.0% of the variance),
which was consistent with the differential expression of various T cell
functional markers in AB samples relative to CB samples (Figure 2E-F).
Next, Seurat was used for more in-depth analyses, identifying 13
sub-populations from the combined AB and CB dataset (Figure 3B). Seurat
clustering distinguished three naïve (clusters 1, 2 and 8), three CM
(clusters 3, 4 and 7), one EM (cluster 11), and two EMRA clusters
(clusters 5 and 12), with the remaining clusters exhibiting intermediate
expression levels of CD45RA and/or CD27 (clusters 6, 9, 10, 13) (Figure
3C-D). Details about feature markers and potential functional properties
of the identified CD8+ T cell subsets are summarized
in Supplementary Table 4. Clusters 1 and 2 shared a naïve phenotype but
differed in their expression of integrin β7 and gut-homing potential,
consistent with previous findings demonstrating an increased expression
of gut-homing receptors in naïve CB lymphocytes (38). The naïve cluster
8 was also high in integrin β7 but additionally expressed CD161,
suggesting a cytotoxic phenotype. Out of the three CM subsets identified
by Seurat, Clusters 3 and 7 of the CM class both had high integrin β7
expression, indicative of a migratory preference to the gut. Cluster 7
differed from Cluster 3 through its low expression of CD45RA and high
expression of the follicular T cell marker, CXCR-5. Opposed to
gut-homing, the CM cluster 4 did not express integrin β7 but instead
highly expressed the skin-homing marker CLA and integrin CD49b. Integrin
β7 levels could also distinguish the EMRA-like clusters 5 versus 12 as
well as clusters 6 versus 9, with clusters 6 and 9 also expressing
CD161. Finally, cluster 13 featured high levels of LAG-3 and CXCR-5.
We again compared the analyses from Seurat and Spectre based on the
combined AB and CB dataset. Both methods identified 13 clusters, of
which 11 were commonly shared, with comparable feature markers
(Supplementary Figure 9E-F). Similarly, Seurat clusters in both AB and
CB could also be validated through manual gating (Supplementary Figure
10 and Supplementary Figure 11). These results again highlighted the
robust performance of Seurat for analyzing high-dimensional cytometric
data.
Striking differences in the abundance of the Seurat clusters were found
comparing AB and CB samples (Figure 3B and Supplementary Figure 12A).
The naïve cluster 1 was predominantly abundant in AB whilst cluster 2
consisted primarily of CB cells. Seurat cluster analysis was able to
separate these two naïve populations based on the differential
expression of integrin β7 and CD27, which were enriched in the naïve
population from CB (cluster 2) (16,38). Consistent with previous studies
comparing adult and cord blood, CD8+ effector memory
populations were almost exclusively found in AB (39). This includes all
the identified EM (cluster 11) and EMRA (clusters 5 and 12) subsets, in
addition to clusters 6, 9 and 10. Unlike these EM subsets, CM
populations were present in both AB and CB. Of note, cluster 3 was
dominant in CB whilst clusters 4 and 7 were enriched in AB. Finally, the
LAG-3+ cluster 13 was equally abundant in both AB and
CB.
Moreover, we also documented differential expression of various markers
between the clusters from AB and CB. Reflective of their naïve phenotype
and lack of antigenic exposure, several CB CD8+ T cell
populations (clusters 1, 2, 3, 4, 7, 8, 9) exhibit higher expression of
CD27 compared to their adult equivalents (Figure 3E). In contrast, the
expression of CXCR-5 in clusters 7 and 13 alongside CLA expression in
cluster 4 was higher in adult CD8+ T cells compared to
CB (Figure 3E).
Intriguingly, our clustering analysis identified a population (cluster
8) that is almost exclusive to CBMCs (7.11% of CD8+ T
cells in CB versus 0.19% in AB) (Figure 4A-C). This cluster is
characterized as
CD8+CD45RA+CD27+CD161+,
partly overlapping with a previously reported but not fully
characterized, CD8+CD161+ T cell
population found in CB (40,41). Analysis based on our 20-parameter panel
provided an unprecedented functional overview of the
CD8+CD45RA+CD27+CD161+T cell subset. We discovered that this newly identified population had
high integrin β7 expression (Figure 4B and Figure 4D) as well as higher
CLA, BCL-6, T-bet, and GATA-3, but lower FoxP3 levels compared to its
CD161- counterpart (Figure 4E). No differences in
LAG-3, CXCR-5, Nur77, CD137 and FCER1A were found between the
CD8+CD45RA+CD27+CD161+and
CD8+CD45RA+CD27+CD161-T cells (Figure 4E). Upon stimulation with PMA and ionomycin, this
subset predominantly produced IFN-γ and IL-4 but lowly expressed IL-5,
IL-10, and IL-13. It only differed significantly in IL-10 production
relative to the CD161- counterpart (Figure 4F). Based
on these results, the
CD8+CD45RA+CD27+CD161+sub-population (cluster 8) appeared to be a pro-inflammatory and
cytotoxic T cell subset.
In summary, using our high-dimensional antibody panel and Seurat
analysis, we thoroughly profiled the CD8+ T cell
compartment in AB versus CB. This revealed a unique
CD8+CD45RA+CD27+CD161+T cell subset in CB which we characterized.
Cross-validation and further characterization of the
CD8+CD45RA+CD27+CD161+T cell subset using scRNA-seq.
To validate our findings and further characterize the population
identified by Seurat in CBMCs, we leveraged a recently published
scRNA-seq dataset for naïve CD8+ T cells, which
analyzed 18513 cells across different developmental stages and
compartments (27). This dataset covers naïve T cells from fetal spleen,
umbilical cord blood and adult peripheral blood. As shown in Figure 5A
and Supplementary Figure 13A, the overall naïve CD8+ T
cell population (sorted as
CD8+CD45RA+CD27+CCR7+CD95-)
was further clustered into four subsets. Cluster0 expressed high levels
of RGS1 , which is linked to T cell exhaustion (42); cluster1 was
high in IL7R and SELL , indicative of a naïve phenotype;
and cluster2 was marked by MT2A and RPS4Y1, which might be potentially
linked to a memory T cell phenotype (43,44). Intriguingly, cluster3
exhibited similar features to the population we identified,
characterized by its expression of KLRB1 (gene encoding CD161)
(Figure 5B). Differentially expressed gene analysis with DESeq2 (45)
found 93 genes upregulated in cluster3 compared with the remaining naïve
CD8+ T cells, while 2 genes were downregulated (Figure
5C). Among the upregulated genes in cluter3, there were several related
to cytotoxic T cell features, such as GZMA , GZMK ,GZMM , CST7 , and NKG7 (46), indicating their
cytotoxic functions, which is similar to our cytometric results and the
previous reports for CD8+ CD161+ T
cells (40). Additional inflammation-related genes were upregulated, such
as ID1 (47) and CMC1 (48). Interestingly, CCL5 ,
related to a memory phenotype, was also significantly upregulated in
cluster3. Additionally, several chemokine receptors were also increased,
such as CXCR3 and CXCR4 .
GO analysis was next carried out with the differentially expressed genes
using ClusterProfiler (29,30) based on the Gene Ontology pathways (49).
As shown in Figure 5D, compared with other naïve CD8+T cells, cluster3 was enriched for pathways related to T cell
chemotaxis , lymphocyte chemotaxis and T cell migration ,
consistent with the differentially expressed gene analysis (Figure 5C)
and our flow cytometric data (Figure 3I and 3J). On the other hand, this
population was suppressed in pathways related to ribosomes.
The GO analyses were based only on the differentially expressed genes,
and we next conducted GSEA based on the overall transcriptomic data
using the fgsea package (50). GSEA using the Hallmark gene setshowed that cluster3 displayed enrichment in inflammatory pathways such
as TNFα signaling via NFκB, IL2-STAT5 signaling, and IFNγ
response (Figure 5E). Additionally, they also showed enrichment in theMAPK signaling pathway and the T cell receptor signaling
pathway (Figure 5F-G).
In summary, scRNA-seq analysis from an independent dataset
cross-validated the
CD8+CD45RA+CD27+CD161+population identified by our Seurat-based analysis. It also
characterized it as a naïve subset but with a potential pro-inflammatory
and cytotoxic profile.
Discussion
There has been a significant expansion of both the size and complexity
of cytometric data, especially in the field of clinical immunology. Such
high-dimensional and complicated datasets cause great difficulties for
conventional manual analytical strategies, inevitably hampering
comprehensive and unbiased analyses and interpretation. Consequently,
myriad computational toolkits have been developed, aiming to address
these challenges, but their effective applications are sometimes
restricted, suffering from a lack of flexibility and interoperability.
Recently, packages such as Spectre (10) and tidyof (51) were developed,
attempting to provide integrative, end-to-end services for cytometric
data analysis. However, they have only been sporadically applied, due to
them being standalone pipelines, requiring adaptation to completely new
packages, and demanding significant prerequisite coding knowledge.
In the present study, we repurposed Seurat, a well-established package
for scRNA-seq data analysis, for high-dimensional flow cytometric data
analysis (23–26). Comparison of Seurat and other currently available
analytical packages are summarized in Supplementary Table 5. Among them,
Seurat has long shared great popularity within the field of single-cell
analysis. It is community-driven and well-supported and has more than 20
R packages for related data processing and analysis. Therefore, it is
likely to be more accessible and easier to use, particularly for the
broad users with previous experiences in scRNA-seq looking to flow
cytometry to complement their investigative breadth.
Here, we showcased the robust capacity of Seurat, based on our
experiments profiling the T cell compartments in adult blood (AB) and
cord blood (CB). Overall, Seurat generated similar results to Spectre,
which were also confirmed by manual gating. Importantly, with our
approach, we identified a unique T cell subset
(CD8+CD45RA+CD27+CD161+)
within CB and cross-validated its functional profiles with an
independent scRNA-seq dataset using Seurat. Together, these data
highlight the great potential of Seurat for cytometric data analysis. It
represents a simple single platform for the unbiased analysis of both
protein and RNA data at single-cell resolution. This will enable simpler
comparison and cross-validation of cytometric and scRNA-seq studies and
facilitate more comprehensive investigations and discoveries in clinical
immunology.
A plethora of state-of-the-art mathematical algorithms or statistical
models are used in the field of cytometry computational analysis. These
include the FlowSOM modality used in Spectre, carrying out clustering
based on a self-organizing map (SOM) method, while Seurat first ran PCAs
on the overall dataset, next constructed a K-nearest neighbor (KNN)
graph, similar to PhenoGraph, another single-cell analytical tool, and
then adopted the Louvain algorithm to group cells together (10,23). Such
differences might account for the discrepancies we observed when
comparing the clustering outcomes from Seurat and Spectre. Detailed
comparisons of these mathematical methodologies are not within the scope
of the current work, but previous study has shown that FlowSOM and
PhenoGraph were the top-performing unsupervised methods for mass
cytometry data clustering analysis. The KNN graph model deployed in
PhenoGraph excelled in its clustering precision, stability and
robustness in identifying sub-clusters, relative to other approaches
like flowMeans, DEPECHE and Xshift (52). Since Seurat shares the similar
KNN model to PhenoGraph, it is reasonable to expect it could also show
robust capacity in more generic cytometric data analysis. This warrants
more systematic comparisons in future work. On the other hand,
differences in results from Seurat and Spectre highlight that utilizing
both methods in tandem will provide a more complete understanding of
complex datasets.
In addition to its distinct mathematical nature, as one of the
cutting-edge end-to-end analytical tools for single-cell data, Seurat
has already been widely used in various research and clinical settings
and is vigorously maintained and supported by its broad user community.
This contributes to its easy accessibility and high user-friendliness
and might reduce the coding burden as well, as users, especially those
with previous experience in scRNA-seq, would not need to learn a
completely new package or coding language for analysis. Moreover, the
application of Seurat potentially opens more possibilities for
cytometric data analysis. For example, the built-in functionFindMarkers() in Seurat might facilitate easier marker
identification, particularly for high-dimensional cytometric data, as
the number of markers is expanding continuously. However, caution should
still be taken to interpret its outcomes. Also, as a popular scRNA-seq
analysis package, Seurat could also act as a wrapper with favorable
interoperability around a wide range of complementary packages or
plugins originally developed for scRNA-seq analysis, such as LIGER and
Harmony for data integration (28,53). They might also be applicable to
cytometric data, such as for batch correction, and could provide novel
possibilities for high-dimensional data analyses once validated. Thus,
adapting Seurat offers a single simple platform to analyze, compare and
cross-validate protein and RNA, and even potentially other multi-omic
single-cell data.
Previously, there were only a few reports applying Seurat for protein
level single-cell analysis, such as for cellular indexing of
transcriptomes and epitopes by sequencing (CITE-seq) (26) and CyTOF
(54,55), and leveraging the rPCA integration method in Seurat for
spectral cytometry analysis (34,56). Recently, there has also been a
similar attempt, adapting Scanpy, a Python-based scRNA-seq analysis
package, to analyze mass cytometry data (57). To our knowledge, the
present work represents the first example of applying Seurat as a
complete flow cytometric analysis workflow. Harnessing our 20-colour
antibody panel and the Seurat-based analysis pipeline, we reported a
unique T cell subset in CBMCs, characterized as
CD8+CD45RA+CD27+CD161+T cells. This subset partly overlapped with the previously described
CD8+CD161+ T cells (40). Previously,
studies first discovered the differential (low/intermediate/high) levels
of CD161 on CD8+ T cells in AB, which correlated with
their various functional activities including cytokine production,
proliferation, and lytic activity (58). AB
CD8+CD161hi T cells were
predominantly mucosal- associated invariant T (MAIT) cells (59), while
the CD8+CD161int population
represented a memory T cell subset which were enriched in the colonic
lamina propria (40). Consistent with this, our clustering analysis found
a CD8+CD161+ population
predominantly existing in AB (Figure 3B), although with the current
clustering setting, both Seurat and Spectre failed to further subdivide
it into CD161int and CD161hisubsets. This can be overcome by finetuning the clustering parameters
depending on the particular scientific question of interest. As for our
newly identified
CD8+CD45RA+CD27+CD161+subset, considering its naïve phenotype, it is not surprising that it is
almost negligible in AB.
The role of the CD8+CD161+ T cells
in CB remains elusive. Developmentally, it was found that the
CD8+CD161hi T cells in CB might be
the progenitor for post-natal MAIT cells (41,59,60). Functionally, the
CD161hi subset produced IFN-γ and IL-17 (60,61), while
the CD161int subset, despite expressing markers like
CD45RA and CCR7, still exhibited a preprogrammed transcriptomic profile
reflective of their AB counterpart (40). Our current clustering analysis
could not further separate the population based on CD161 levels, but
adjusting the clustering parameters could potentially help to
differentiate them considering their intermediate to high CD161
expression (Figure 4B). The naïve phenotype of this CB-enriched subset,
based on the expression of CD45RA and CD27, is similar to previous
reports (40,61), and we also confirmed its IFN-γ production. Previously,
there were limited studies investigating the functional surface markers
of CD8+CD161+ T cells, such as CCR6
(59). Here, we characterized the
CD8+CD45RA+CD27+CD161+population as high in integrin β7 but low in CLA expression, implying a
preference to gut over skin-homing. Thus, this CB population might
represent the progenitors for AB
CD8+CD161int T cells which are
enriched within the colon (40).
CD8+CD161+ T cells in AB are
involved in the response to tissue-localized inflammation triggeredd by
intracellular and viral pathogens (40,59), while their functional
implications in CB remain elusive. Likewise, both our flow cytometry and
scRNA-seq data suggested the pro-inflammatory and cytotoxic properties
of CB
CD8+CD45RA+CD27+CD161+T cells. Additionally, as CD161 contributes to prenatal immune
suppression (62), this subset might be involved in maintaining tolerance
in the semi-allogenic context of pregnancy.
In summary, we have adapted Seurat, a widely used scRNA-seq analysis
package, for high-dimensional flow cytometric data analysis and
showcased its performance through the identification of a unique
CD8+CD45RA+CD27+CD161+T cell population in CB. Such a pipeline presents a novel avenue for
comprehensive analysis of high-dimensional complex cytometric and
multi-modal data, facilitating unbiased data-driven studies and
discovery.