Single-cell RNA-seq data preprocessing:
The Cell Ranger software pipeline (version 3.1.0) provided by
10x Genomics was used to demultiplex cellular barcodes, map reads to the
genome and transcriptome using the STAR aligner, and down-sample reads
as required to generate normalized aggregate data across samples,
producing a matrix of gene counts versus cells. We processed the unique
molecular identifier (UMI) count matrix using the R package Seurat
(version 3.1.1). To remove low-quality cells and likely multiplet
captures, which are a major concern in microdroplet-based experiments, we
filtered out cells whose UMI or gene counts fell outside the mean ± 2
standard deviations, assuming a Gaussian distribution of per-cell UMI and
gene counts. Following visual
inspection of the distribution of cells by the fraction of mitochondrial
genes expressed, we further discarded low-quality cells where
>10% of the counts belonged to mitochondrial genes. After
applying these QC criteria, 14,202 single cells in total remained and
were included in downstream analyses. Library size normalization was
performed in Seurat on the filtered matrix to obtain the normalized
counts.
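
As an illustration of the QC and normalization rules described above, the following is a minimal Python sketch, not the Seurat code used in the analysis. The function names `qc_filter` and `log_normalize`, the genes-by-cells array layout, and the scale factor of 10,000 (the default of Seurat's log-normalization) are assumptions made for this example.

```python
import numpy as np


def qc_filter(counts, mito_mask, n_sd=2.0, max_mito_frac=0.10):
    """Return a boolean mask over cells that pass the QC criteria.

    counts    : (genes x cells) array of UMI counts (assumed layout).
    mito_mask : boolean array over genes marking mitochondrial genes.
    """
    counts = np.asarray(counts, dtype=float)
    n_umi = counts.sum(axis=0)           # total UMIs per cell
    n_gene = (counts > 0).sum(axis=0)    # detected genes per cell

    def within(x):
        # Keep values inside mean +/- n_sd sample standard deviations.
        mu, sd = x.mean(), x.std(ddof=1)
        return np.abs(x - mu) <= n_sd * sd

    # Fraction of each cell's counts that come from mitochondrial genes.
    mito_frac = counts[mito_mask].sum(axis=0) / np.maximum(n_umi, 1)
    return within(n_umi) & within(n_gene) & (mito_frac <= max_mito_frac)


def log_normalize(counts, scale_factor=1e4):
    """Library-size normalization: log1p(count / cell total * scale_factor)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log1p(counts / np.maximum(lib_size, 1) * scale_factor)
```

A typical call would be `keep = qc_filter(counts, mito_mask)` followed by `log_normalize(counts[:, keep])`. With a ± 2 SD rule the cutoffs depend on the cells present, so the filter is applied once to the full matrix rather than iteratively.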
Top variable genes across single cells were identified using the method
described in Macosko et al. Briefly, the average expression and
dispersion were calculated for each gene, and genes were subsequently placed
into 13 bins based on expression. Principal component analysis (PCA) was
performed on the log-transformed gene-barcode matrices of the top
variable genes to reduce dimensionality. Cells were clustered using a
graph-based clustering approach and visualized in two dimensions using
t-SNE. A likelihood ratio test that simultaneously tests for changes in
mean expression and in the percentage of expressing cells was used to
identify significantly differentially expressed genes between clusters.
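
The binning step used for variable-gene selection can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the Seurat implementation: dispersion is taken as variance divided by mean, the 13 bins are assumed to be equal-width over mean expression, and the within-bin z-scoring of dispersion follows our reading of Macosko et al. rather than anything stated in the text above. The function name `binned_dispersion_z` is hypothetical.

```python
import numpy as np


def binned_dispersion_z(expr, n_bins=13):
    """Per-gene mean, dispersion, and within-bin z-score of dispersion.

    expr : (genes x cells) array of normalized expression (assumed layout).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    var = expr.var(axis=1, ddof=1)
    # Dispersion = variance / mean; set to 0 for genes that are never detected.
    disp = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)

    # Assign each gene to one of n_bins equal-width bins of mean expression.
    edges = np.linspace(mean.min(), mean.max(), n_bins + 1)
    bin_id = np.clip(np.digitize(mean, edges[1:-1]), 0, n_bins - 1)

    # z-score the dispersion within each bin so that genes are compared
    # only with genes of similar average expression.
    z = np.zeros_like(disp)
    for b in range(n_bins):
        idx = bin_id == b
        if idx.sum() < 2:
            continue  # a z-score is undefined for a single gene
        sd = disp[idx].std(ddof=1)
        if sd > 0:
            z[idx] = (disp[idx] - disp[idx].mean()) / sd
    return mean, disp, z
```

Genes with the highest z-scores would then be taken as the top variable genes; the cutoff used in the analysis is not stated here, so none is hard-coded.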
We used the R package SingleR, a novel computational method for
unbiased cell type recognition in scRNA-seq data, with the reference
transcriptomic dataset ‘Human Primary Cell Atlas’ (Mabbott et al. 2013)
to infer the cell of origin of each single cell independently and to
identify cell types.
Differentially expressed genes (DEGs) were identified using the
FindMarkers function of the Seurat package [1]. A P value < 0.05 and
|log2 fold change| > 0.58 were set as the thresholds for significant
differential expression.
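
The significance rule above amounts to a simple two-condition filter, sketched here in Python. A |log2 fold change| of 0.58 corresponds to roughly a 1.5-fold change in either direction (2^0.58 ≈ 1.49). The function names, the tuple layout, and the gene identifiers in the usage note are hypothetical; the sketch assumes the fold changes are already on a log2 scale.

```python
def is_significant(p_value, log2_fc, p_cut=0.05, lfc_cut=0.58):
    """True if a gene passes both the P value and fold-change thresholds."""
    return p_value < p_cut and abs(log2_fc) > lfc_cut


def filter_degs(results, p_cut=0.05, lfc_cut=0.58):
    """Return the names of genes passing both thresholds.

    results : iterable of (gene, p_value, log2_fold_change) tuples.
    """
    return [
        gene
        for gene, p_value, log2_fc in results
        if is_significant(p_value, log2_fc, p_cut, lfc_cut)
    ]
```

For example, `filter_degs([("geneA", 0.01, 1.2), ("geneB", 0.01, 0.3)])` keeps only `geneA`, since `geneB` has a small P value but a fold change below the cutoff.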