Single-cell RNA-seq data preprocessing:
The Cell Ranger software pipeline (version 3.1.0) provided by 10x Genomics was used to demultiplex cellular barcodes, map reads to the genome and transcriptome using the STAR aligner, and down-sample reads as required to generate normalized aggregate data across samples, producing a matrix of gene counts versus cells. We processed the unique molecular identifier (UMI) count matrix using the R package Seurat (version 3.1.1). To remove low-quality cells and likely multiplet captures, which are a major concern in microdroplet-based experiments, we filtered out cells whose UMI or gene counts fell outside the mean ± 2 standard deviations, assuming a Gaussian distribution of per-cell UMI and gene counts. Following visual inspection of the distribution of cells by the fraction of mitochondrial genes expressed, we further discarded low-quality cells in which >10% of the counts belonged to mitochondrial genes. After applying these QC criteria, 14,202 single cells remained and were included in downstream analyses. Library size normalization was performed in Seurat on the filtered matrix to obtain the normalized counts.
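The filtering and normalization steps described above can be illustrated with a minimal Python sketch. This is not the Seurat code used in the study. The per-cell record layout (`id`, `n_umi`, `n_genes`, `mito_fraction`), the function names, and the toy values are hypothetical. The sketch applies all three filters in a single pass on the unfiltered cells, and the log-normalization with a scale factor of 10,000 is an assumption based on Seurat's default `LogNormalize` behavior, since the text does not state which normalization method was selected.

```python
import math
from statistics import mean, stdev


def within_two_sd(values):
    """Flag each value as True when it lies within mean ± 2 SD of all values."""
    mu, sd = mean(values), stdev(values)  # sample standard deviation
    lower, upper = mu - 2 * sd, mu + 2 * sd
    return [lower <= v <= upper for v in values]


def qc_filter(cells, max_mito_fraction=0.10):
    """Keep cells that pass the UMI, gene-count and mitochondrial filters.

    `cells` is a list of dicts with the hypothetical keys
    'id', 'n_umi', 'n_genes' and 'mito_fraction'.
    """
    umi_ok = within_two_sd([c["n_umi"] for c in cells])
    gene_ok = within_two_sd([c["n_genes"] for c in cells])
    return [
        c
        for c, u, g in zip(cells, umi_ok, gene_ok)
        if u and g and c["mito_fraction"] <= max_mito_fraction
    ]


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's counts to a common library size, then apply log1p.

    Assumption: mirrors a log-normalization with scale factor 10,000.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Toy data, invented for illustration: the last cell mimics a multiplet
# (very high UMI and gene counts); the cell at index 2 has a high
# mitochondrial fraction.
n_umi = [900, 950, 1000, 1050, 1100, 980, 1020, 970, 1030, 10000]
n_genes = [400, 420, 450, 470, 500, 440, 460, 430, 480, 3000]
cells = [
    {
        "id": f"cell{i}",
        "n_umi": u,
        "n_genes": g,
        "mito_fraction": 0.25 if i == 2 else 0.03,
    }
    for i, (u, g) in enumerate(zip(n_umi, n_genes))
]

kept = qc_filter(cells)
print([c["id"] for c in kept])
print(log_normalize([1, 1, 2]))
```

Because the mean and standard deviation are computed from the data being filtered, a single extreme cell inflates both, so the ± 2 SD window is wide when the population contains strong outliers.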
Top variable genes across single cells were identified using the method described in Macosko et al. Briefly, the average expression and dispersion were calculated for each gene, and genes were subsequently placed into 13 bins based on expression. Principal component analysis (PCA) was performed on the log-transformed gene-barcode matrices of the top variable genes to reduce dimensionality. Cells were clustered using a graph-based clustering approach and visualized in two dimensions using tSNE. A likelihood ratio test that simultaneously tests for changes in mean expression and in the percentage of cells expressing a gene was used to identify significantly differentially expressed genes between clusters. We used the R package SingleR, a novel computational method for unbiased cell type recognition from scRNA-seq data, with the reference transcriptomic dataset ‘Human Primary Cell Atlas’ (Mabbott et al. 2013) to infer the cell of origin of each single cell independently and to identify cell types.
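The binning step for variable-gene selection can be sketched as follows. This is a simplified illustration, not the Seurat implementation. It assumes that dispersion is the variance-to-mean ratio, that bins are of equal width over the range of mean expression, and that each gene's dispersion is then z-scored against the other genes in its bin, which is how this family of methods is commonly described. The text above specifies only the number of bins (13), so the remaining details, the function names, and the toy matrix are assumptions.

```python
from statistics import mean, pstdev, pvariance


def mean_and_dispersion(values):
    """Return (mean, variance-to-mean ratio) for one gene across cells."""
    mu = mean(values)
    dispersion = pvariance(values) / mu if mu > 0 else 0.0
    return mu, dispersion


def assign_bins(means, n_bins=13):
    """Assign each mean to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all means are equal
    return [min(int((m - lo) / width), n_bins - 1) for m in means]


def scaled_dispersion(expr, n_bins=13):
    """Z-score each gene's dispersion against the genes in its expression bin.

    `expr` maps a gene name to its expression values across cells.
    """
    genes = list(expr)
    stats = [mean_and_dispersion(expr[g]) for g in genes]
    bins = assign_bins([mu for mu, _ in stats], n_bins)
    scores = {}
    for b in set(bins):
        members = [i for i, bin_index in enumerate(bins) if bin_index == b]
        dispersions = [stats[i][1] for i in members]
        center, spread = mean(dispersions), pstdev(dispersions)
        for i in members:
            # Bins with no spread (for example a single gene) get a score of 0.
            scores[genes[i]] = (
                (stats[i][1] - center) / spread if spread > 0 else 0.0
            )
    return scores


# Toy matrix, invented for illustration; two bins are used because the
# example has only five genes.
toy_expr = {
    "A": [0, 2, 0, 2],
    "B": [1, 1, 1, 1],
    "C": [0, 0, 0, 4],
    "D": [10, 10, 10, 10],
    "E": [5, 15, 5, 15],
}
scores = scaled_dispersion(toy_expr, n_bins=2)
top_gene = max(scores, key=scores.get)
print(top_gene, scores)
```

Scoring dispersion within expression bins compares each gene only with genes of similar abundance, which reduces the dependence of the variability estimate on expression level.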
Differentially expressed genes (DEGs) were identified using the FindMarkers function of the Seurat package [1]. P value < 0.05 and |log2foldchange| > 0.58 were set as the thresholds for significant differential expression.
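The thresholds above can be expressed as a simple filter. The sketch below is illustrative only: the record layout, gene names, and values are hypothetical and do not come from the study. Note that a log2 fold change of 0.58 corresponds to a fold change of roughly 1.5 in either direction.

```python
def is_significant(p_value, log2_fc, p_cutoff=0.05, lfc_cutoff=0.58):
    """Apply the stated thresholds: P < 0.05 and |log2 fold change| > 0.58."""
    return p_value < p_cutoff and abs(log2_fc) > lfc_cutoff


# Hypothetical test results, invented for illustration.
results = [
    {"gene": "GENE_A", "p_value": 0.001, "log2_fc": 1.20},
    {"gene": "GENE_B", "p_value": 0.200, "log2_fc": 2.00},
    {"gene": "GENE_C", "p_value": 0.010, "log2_fc": -0.90},
    {"gene": "GENE_D", "p_value": 0.010, "log2_fc": 0.30},
]

degs = [
    r["gene"] for r in results if is_significant(r["p_value"], r["log2_fc"])
]
fold_change_cutoff = 2 ** 0.58  # linear fold change implied by the log2 cutoff

print(degs)
print(round(fold_change_cutoff, 2))
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated genes, as in `GENE_C` in the toy data.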