Benchmarking SingleR

In the case studies below we use the Seurat package to process the scRNA-seq data and perform the t-SNE analysis. All visualizations are also available through the SingleR web tool (http://comphealth.ucsf.edu/SingleR), which allows viewing the data and analyzing it interactively.

Case study 1: GSE74923 – Kimmerling et al. Nature Communications (2016)

Obtaining data

This data set was created to test the C1 platform. A total of 194 single mouse cells were analyzed using C1: 89 L1210 cells (a mouse lymphocytic leukemia cell line) and 105 mouse CD8+ T-cells. Five cells with fewer than 500 non-zero genes were omitted.
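The filtering step described above can be sketched in base R. This is a minimal illustration on a made-up count matrix, not the code used for this data set; the names `counts`, `n.detected`, `min.genes`, and `filtered` are hypothetical.

```r
# Toy genes-by-cells count matrix (made-up values, for illustration only).
counts <- matrix(c(5, 0, 3,
                   0, 0, 2,
                   1, 0, 0,
                   0, 4, 7),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Number of genes with a non-zero count in each cell.
n.detected <- colSums(counts > 0)

# Keep cells that reach the threshold. The case study uses 500;
# the toy example uses 2 because it has only 4 genes.
min.genes <- 2
filtered <- counts[, n.detected >= min.genes, drop = FALSE]
```

Here `cell2` has a single non-zero gene and is dropped, while `cell1` and `cell3` are kept. `drop = FALSE` keeps the result a matrix even if only one cell passes the filter.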

The data was downloaded from GEO and read into R using the following code:

library(SingleR)

counts.file = 'GSE74923_L1210_CD8_processed_data.txt'
# This file was probably processed with Excel, as it contains duplicate gene
# names (1-Mar, 2-Mar, etc.). They were removed manually.
annot.file = 'GSE74923_L1210_CD8_processed_data.txt_types.txt'
# annot.file is a table with two columns: cell name and the original
# identity (CD8 or L1210).
singler = CreateSinglerSeuratObject(counts.file, annot.file, 'GSE74923', 
                                    variable.genes='de', regress.out='nUMI', 
                                    technology='C1', species='Mouse', 
                                    citation='Kimmerling et al.', reduce.file.size = FALSE, 
                                    normalize.gene.length = TRUE)
save(singler, file='GSE74923.RData')
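The manual cleanup mentioned in the comments could also be scripted. The sketch below flags gene names that look like Excel date conversions (for example `1-Mar`) and drops them together with any remaining duplicates. The regular expression and the names `genes`, `date.like`, `keep`, and `clean.genes` are assumptions made for illustration, and the toy vector stands in for the gene-name column of the counts file.

```r
# Toy gene-name column (made-up, for illustration only).
genes <- c("Actb", "1-Mar", "1-Mar", "2-Mar", "Cd8a", "Gapdh")

# Pattern for names that look like "<day>-<month abbreviation>".
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
date.pattern <- paste0("^[0-9]{1,2}-(", paste(months, collapse = "|"), ")$")
date.like <- grepl(date.pattern, genes)

# Drop date-like names, and keep only the first copy of any other duplicate.
keep <- !date.like & !duplicated(genes)
clean.genes <- genes[keep]
```

The same logical vector `keep` can be used to subset the rows of the count table, so that gene names and counts stay aligned.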

SingleR analysis

First, we look at the t-SNE plot colored by the original identities:

# singler$singler[[1]] is the annotations obtained by using ImmGen dataset as reference. 
# singler$singler[[2]] is based on the Mouse-RNAseq datasets.
# `path` is the directory that contains GSE74923.RData.
load(file.path(path, 'GSE74923.RData'))
out = SingleR.PlotTsne(singler$singler[[1]]$SingleR.single,
      singler$meta.data$xy, do.label = FALSE, do.letters = FALSE,
      labels=singler$meta.data$orig.ident,label.size = 6, 
      dot.size = 3)
out$p

We can then inspect the classification with a heatmap of the aggregated scores obtained by SingleR with ImmGen as the reference. These scores are computed before fine-tuning. The heatmap can be drawn by the main cell types:

SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single.main, top.n = Inf,
                    clusters = singler$meta.data$orig.ident)

Or by all cell types (presenting the top 30 cell types):

SingleR.DrawHeatmap(singler$singler[[1]]$SingleR.single, top.n = 30,
                    clusters = singler$meta.data$orig.ident)
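As a numerical complement to the heatmaps, assigned labels can be cross-tabulated against the original identities. The sketch below uses made-up placeholder labels rather than real SingleR output; in an actual analysis the original identities would be `singler$meta.data$orig.ident`. The names `original`, `assigned`, `agreement`, and `prop` are hypothetical.

```r
# Placeholder vectors (made-up; not results from this data set).
original <- c("CD8", "CD8", "CD8", "L1210", "L1210", "L1210")
assigned <- c("typeA", "typeA", "typeC", "typeB", "typeB", "typeB")

# Counts of cells for each (assigned label, original identity) pair.
agreement <- table(assigned, original)

# Fraction of cells within each original identity that received each label.
prop <- prop.table(agreement, margin = 2)
```

Each column of `prop` sums to 1, which makes it easy to read off how consistently each original population was assigned to a single label.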