Introduction

SingleR is a package that performs reference-based annotation of single-cell RNA-seq data. Here we show different ways to create SingleR objects. These objects can then be used with visualization functions available in the SingleR package and the SingleR web app (http://comphealth.ucsf.edu/SingleR/).

Basic SingleR function

SingleR provides built-in wrapper functions to run a complete pipeline with one function. SingleR provides support to Seurat (http://satijalab.org/seurat/), but any other scRNA-seq package can be used. These functions are explained in Case 1 and 2. These functions assist in reading the single-cell data, calculating labels using different references and creating an object that can be used by SungleR plotting functions. However, to run SingleR and retrieve labels for each cell the following function can be used:

singler = SingleR(method = "single", sc_data, ref_data, types, clusters = NULL,
  genes = "de", quantile.use = 0.8, p.threshold = 0.05,
  fine.tune = TRUE, fine.tune.thres = 0.05, sd.thres = 1,
  do.pvals = T, numCores = SingleR.numCores)
warning('Do not use the scaled.data field in Seurat as input. This field represents relative expression across cells, and is not appropriate as input for SingleR. The raw.data and data field are ok, but only if from a non full-length method.') 

This is the basic SingleR function. To use wrapper functions see the cases below.

Case 1: Counts data, no previous analysis

In this case we have counts data, and don’t have any prefered previous analysis of the data.

To create the SingleR object simply run the following function:

singler = CreateSinglerSeuratObject(counts, annot = NULL, project.name,
  min.genes = 200, technology = "10X", species = "Human" (or "Mouse"), citation = "",
  ref.list = list(), normalize.gene.length = F, variable.genes = "de",
  fine.tune = T, reduce.file.size = T, do.signatures = T, min.cells = 2,
  npca = 10, regress.out = "nUMI", do.main.types = T,
  reduce.seurat.object = T, numCores = SingleR.numCores)
save(singler,file='singler_object.RData')

The returned singler object is a list that can be used for further analyses. See below.

Case 2: Already have a single-cell object

In this case we already have a single-cell object with tSNE coordinates and clusters. We want to annotate this object and use those parameters.

To create the SingleR object simply run the following function:

singler = CreateSinglerObject(counts, annot = NULL, project.name, min.genes = 0,
  technology = "10X", species = "Human", citation = "",
  ref.list = list(), normalize.gene.length = F, variable.genes = "de",
  fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T, 
  reduce.file.size = T, numCores = SingleR.numCores)

singler$seurat = seurat.object # (optional)
singler$meta.data$orig.ident = seurat.object@meta.data$orig.ident # the original identities, if not supplied in 'annot'

## if using Seurat v3.0 and over use:
singler$meta.data$xy = seurat.object@reductions$tsne@cell.embeddings # the tSNE coordinates
singler$meta.data$clusters = seurat.object@active.ident # the Seurat clusters (if 'clusters' not provided)

## if using a previous Seurat version use:
singler$meta.data$xy = seurat.object@dr$tsne@cell.embeddings # the tSNE coordinates
singler$meta.data$clusters = seurat.object@ident # the Seurat clusters (if 'clusters' not provided)

# this example is of course if the previous analysis was performed with Seurat, but any other previous coordinates and clusters can be used.

save(singler,file='singler_object.RData')

All the parameters are similar to case 1.

Creating a new reference data set

We have a reference dataset that we want to use. It contains N samples that can be annotated to n1 main cell types (i.e. macrophages or DCs) and n2 cell states (i.e. alveolar macrophages, interstitial macrophages, pDCs and cDCs).

The gene expression data should be gene-length normalized (TPM, FPKM etc.) and in log2 scale. The rownames must be gene symbols.

This is how we define the reference object:

  name = 'My_reference'
  expr = as.matrix(expr) # the expression matrix
  types = as.character(types) # a character list of the types. Samples from the same type should have the same name.
  main_types = as.character(main_types) # a character list of the main types. 
  ref = list(name=name,data = expr, types=types, main_types=main_types)
  
  # if using the de method, we can predefine the variable genes
  ref$de.genes = CreateVariableGeneSet(expr,types,200)
  ref$de.genes.main = CreateVariableGeneSet(expr,main_types,300)
  
  # if using the sd method, we need to define an sd threshold
  sd = rowsSd(expr)
  sd.thres = sort(sd, decreasing = T)[4000] # or any other threshold
  ref$sd.thres = sd.thres
  
  save(ref,file='ref.RData') # it is best to name the object and the file with the same name.
  
  # we can then use this reference in the previous functions. Multiple references can used.
  singler = CreateSinglerObject(... ref.list = list(immgen, ref, mouse.rnaseq)

Using the object

There are examples in http://comphealth.ucsf.edu/sample-apps/SingleR/SingleR_specifications.html

We will soon add more simple examples with full analysis of datasets.