A guide to using cellexalvrR

Shamit Soneji

2019-10-09

Introduction

This document describes how to use cellexalvrR, an R package that accompanies CellexalVR which is a virtual reality environment to analyze single-cell RNAseq data. cellexalvrR has two functions:

  1. To aid the formatting and export of data that can be imported by CellexalVR.
  2. To perform backend calculations during a CellexalVR session.

Installation

The easiest way to install cellexalvrR is directly from github using the devtools package:

library(devtools)
install_github("sonejilab/cellexalvrR")

If you are installing this on your VR station then be sure to install cellexalvrR system-wide, so on your Windows 10 machine right-click the R icon, and then “Run as administrator” before issuing the commands above.

Make sure that you also have the Rtools installed on your Windows machine which provides c and c++ compilers.

Quick start

The data from Nestorowa et al can be downloaded from here. Unpack them and set your working directory to where they are.

First, load the library:

library(cellexalvrR)
#> Loading required package: Matrix
#> 
#> 
#> 

Then load the data:

load("exdata.RData")
load("facs.RData")
load("cell.ids.RData")
load("diff.proj.RData")
load("ddr.proj.RData")
load("tsne.proj.RData")

exdata is a sparse matrix of highly variable genes in log2 form. The first 10 columns and 10 rows look as so:

exdata[1:10,1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#>    [[ suppressing 10 column names 'HSPC_001', 'HSPC_002', 'HSPC_003' ... ]]
#>                                                                       
#> Clec1b         .         .        .         .        1.039889 1.172368
#> Kdm3a          .         .        .         .        .        8.313856
#> Coro2b         9.4755769 .        .         .        1.637906 1.172368
#> 8430408G22Rik  .         .        .         .        .        .       
#> Clec9a         .         .        .         .        .        .       
#> Phf6           7.7333697 7.271829 9.4130405 2.677008 5.832172 7.856124
#> Usp14          1.4788998 3.378561 8.6774851 1.659063 7.391105 3.091340
#> Tmem167b       .         9.093340 0.8302297 .        1.637906 .       
#> Kbtbd7        10.0456005 2.861341 1.3538522 1.659063 2.385231 7.902221
#> Rag2           0.5329056 2.047347 0.8302297 .        6.381515 .       
#>                                                  
#> Clec1b        .        .        1.486056 .       
#> Kdm3a         7.042431 2.732774 2.202400 2.209467
#> Coro2b        .        1.090491 .        .       
#> 8430408G22Rik .        .        .        .       
#> Clec9a        .        .        .        .       
#> Phf6          3.298544 9.103079 6.695850 1.491864
#> Usp14         8.689676 7.566973 2.202400 8.129944
#> Tmem167b      .        .        1.486056 1.491864
#> Kbtbd7        .        .        .        .       
#> Rag2          .        1.704399 7.365545 1.491864

Note the cell IDs are in the column names, and the gene names are the row names.

class(exdata)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"

facs is a matrix of cell surface marker intensities captured during index sorting:

head(facs)
#>               CD34      CD16     c.Kit      EPCR      Flk2     CD150
#> HSPC_001 0.6037796 0.4322826 1.3509535 1.5235524 0.3618872 0.3036440
#> HSPC_002 0.3652949 0.5923082 1.1668190 1.5575771 0.3985330 0.1802330
#> HSPC_003 1.1688350 0.3103070 1.2716602 0.1796609 0.2225457 0.2708689
#> HSPC_004 0.7234537 0.5098244 1.2890254 0.1047645 0.7382823 0.1746147
#> HSPC_006 0.7806266 0.2244971 0.8613574 0.1909399 0.7124364 0.4767371
#> HSPC_008 0.8303475 0.5984071 1.2414646 1.4069329 0.4826786 0.2322792
#>               CD48       Lin      Sca1
#> HSPC_001 0.4359241 0.2989843 1.3179900
#> HSPC_002 0.3413969 0.4922320 1.5137737
#> HSPC_003 1.4005640 0.4194964 0.6391425
#> HSPC_004 1.0903352 0.5070497 0.5363241
#> HSPC_006 1.1907472 0.3140937 0.5699483
#> HSPC_008 0.3781136 0.2573233 1.2604954

Cell IDs are in the row names, and the name of the surface protein in the columns.

cell.ids is a 1/0 matrix that assigns metadata to the each cell. In this case it shows the cell type each cell was sorted as:

head(cell.ids)
#>          LTHSC_broad LMPP_broad MPP_broad CMP_broad MEP_broad GMP_broad
#> HSPC_001           0          0         1         0         0         0
#> HSPC_002           1          0         0         0         0         0
#> HSPC_003           0          0         1         0         0         0
#> HSPC_004           0          1         0         0         0         0
#> HSPC_006           0          1         0         0         0         0
#> HSPC_008           0          1         0         0         0         0
#>          MPP1_broad MPP2_broad MPP3_broad STHSC_broad LTHSC LMPP MPP CMP
#> HSPC_001          0          0          0           1     0    0   0   0
#> HSPC_002          0          0          0           0     1    0   0   0
#> HSPC_003          0          0          1           0     0    0   0   0
#> HSPC_004          0          0          0           0     0    1   0   0
#> HSPC_006          0          0          0           0     0    1   0   0
#> HSPC_008          0          0          0           0     0    1   0   0
#>          MEP GMP MPP1 MPP2 MPP3 STHSC ESLAM HSC1 Projected
#> HSPC_001   0   0    0    0    0     0     0    0         0
#> HSPC_002   0   0    0    0    0     0     0    0         0
#> HSPC_003   0   0    0    0    0     0     0    0         0
#> HSPC_004   0   0    0    0    0     0     0    0         0
#> HSPC_006   0   0    0    0    0     0     0    0         0
#> HSPC_008   0   0    0    0    0     0     0    0         0

Cell IDs are in the row names, and the name of the meta data in the columns. A 1 is given if the cell belongs to a metadata class, 0 otherwise.

diff.proj,tsne.proj,and diff.proj are the results from three different dimension reduction methods applied to exdata, specifically DDRTree, tSNE and diffusion map respectively. Each is a three column matrix of x/y/z coordinates. For example:

head(diff.proj)
#>                   DC1          DC2          DC3
#> HSPC_001 -0.014760110 -0.024119177 -0.007998293
#> HSPC_002 -0.010255308 -0.019970919 -0.015754327
#> HSPC_003 -0.003649320  0.027297654  0.014202316
#> HSPC_004 -0.012653205 -0.002990825  0.025813077
#> HSPC_006 -0.005876981  0.021537658  0.035740656
#> HSPC_008 -0.015699148 -0.018777106  0.007916241

The first step is to put all the dimension reduction outputs into a single list:

proj.list <- list(diffusion=diff.proj,DDRtree=ddr.proj,tSNE=tsne.proj)

The next is to make a cellexalvr object by calling MakeCellexalVRObj and passing the required objects to it:

cellvr <- MakeCellexalVRObj(exdata,drc.list=proj.list,specie="mouse",cell.meta=cell.ids,facs.data=facs)

In the same step we also set the specie as mouse which ensures the correct transcription factor IDs are using during network construction if implemented in CellexalVR.

Calling the object name will detail it’s contents:

cellvr
#> An object of class cellexalvrR 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 3 drc object(s)
#> Specie is set to mouse

The last step is to call export2cellexalvr which will write the neccessary files from the cellvr object to a specified directory:

export2cellexalvr(cellvr,"CellexalOut/")

Make sure that this directory exists.

All the files needed by CellexalVR are created at this point. The entire “CellexalOut/” folder should then be moved/copied to the “Data” folder in your CellexalVR setup. See the manual for more details including a schematic of the folder structure.

Making a cellexalvr object from scratch

While MakeCellexalVRObj is the most convenient way to make the object, sometimes you want to make one manually. This is done calling new:

cell.vr.scr <- new("cellexalvr",data=Matrix::Matrix(exdata,sparse=T),drc=list(tsne=tsne.proj))
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1,1): ''   
#> There are 0 user group(s) stored :
#> and 1 drc object(s)
#> Specie is set to NA

We can add another set of dimension reduction coordinates using the addMDS2cellexalvr function:

cell.vr.scr <- addDRC2cellexalvr(cell.vr.scr,ddr.proj,"DDRTree")
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1,1): ''   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA

To add metadata for each cell use addCellMeta2cellexalvr:

cell.vr.scr <- addCellMeta2cellexalvr(cell.vr.scr,cell.ids)
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA

To add FACS for each cell use addFACS2cellexalvr:

cell.vr.scr <- addFACS2cellexalvr(cell.vr.scr,facs)

Setting the specie is done using the set.specie function:

cell.vr.scr <- set.specie(cell.vr.scr,"mouse")
cell.vr.scr
#> An object of class cellexalvr 
#> with 4709 genes and 1654  cells. 
#> Annotation datasets (1,1): ''   
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'   
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to mouse

Making cell metadata from a data frame

CellexalVR requires metadata in the form of a 1/0 matrix, but many packages store it as a data frame. CellexalvrR has function to convert a data frame into a 1/0 matrix. First, lets make a data frame:

meta.df <- data.frame(CellType=sample(c("Type1","Type2","Type3"),10,replace=T),
                      Phase=sample(c("G1","G2M","S"),10,replace=T),
                      Treatment=sample(c("WT","Dox"),10,replace=T))
head(meta.df)
#>   CellType Phase Treatment
#> 1    Type3   G2M        WT
#> 2    Type3    G1        WT
#> 3    Type1    G1       Dox
#> 4    Type1     S        WT
#> 5    Type2    G1       Dox
#> 6    Type2     S        WT

We can now make a correctly formatted cell metadata matrix by applying the function make.cell.meta.from.df using only the fields we need, in this case the “CellType” and “Treatment” columns:

required.cell.metad <- make.cell.meta.from.df(meta.df,c("CellType","Treatment"))
head(required.cell.metad)
#>      CellType@Type3 CellType@Type1 CellType@Type2 Treatment@WT
#> [1,]              1              0              0            1
#> [2,]              1              0              0            1
#> [3,]              0              1              0            0
#> [4,]              0              1              0            1
#> [5,]              0              0              1            0
#> [6,]              0              0              1            1
#>      Treatment@Dox
#> [1,]             0
#> [2,]             0
#> [3,]             1
#> [4,]             0
#> [5,]             1
#> [6,]             0

It can be seen the field name is placed in the column heading preceeding a “@”, and this is used by CellexalVR to form catagories on the menu system, so “CellType” and “Treatment” will be on separate tabs. This metadata matrix can now be added to a cellexalvrR object as decribed above using the addCellMeta2cellexalvr function.

Converting a Seurat object to a cellexalvr object

We will show you how to make a cellexalvrR object from a Seurat 3 session. In this example we have followed the tutorial for processing the Mouse Cell Atlas data found here.

In this case we ran UMAP, but making sure we embedded it to three dimensions:

mca <- RunUMAP(mca, dims = 1:75, min.dist = 0.75,n.components=3)

We can then make the cellexalvrR object:

mca.data <- GetAssayData(object = mca) #Extract the expression data
drl <- list(UMAP=Embeddings(object = mca, reduction = "umap")) # Put the UMAP coordinated into a list
meta <- make.cell.meta.from.df(mca[[]],"Tissue") # Make the metadata using just the "Tissue" column

cvr <- new("cellexalvrR",data=mca.data,drc=drl) # Initialise a new cellexalvrR object with the expression data and UMAP
cvr <- set.specie(cvr,"mouse") # Set the specie to Mouse
cvr <- addCellMeta2cellexalvr(cvr,meta) # Add the metadata to the cellexalvrR object
export2cellexalvr(cvr,"MCA_full") #Export the files to a folder called "MCA_full" 

Making and adding a 3D RNA Velocity plot for CellexalVR

The following code is taken from the Seurat page describing their wrapper function to create velocity trajectories from an existing embedding which can be seen here. We have duplicated the code below highlighting where changes have been made in the comments.

library(Seurat)
library(velocyto.R)
library(SeuratWrappers)
library(cellexalvrR)
# If you don't have velocyto's example mouse bone marrow dataset, download with the CURL command
# curl::curl_download(url = 'http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom', destfile
# = '~/Downloads/SCG71.loom')
ldat <- ReadVelocity(file = "~/Downloads/SCG71.loom")
bm <- as.Seurat(x = ldat)
bm <- SCTransform(object = bm, assay = "spliced")
bm <- RunPCA(object = bm, verbose = FALSE)
bm <- FindNeighbors(object = bm, dims = 1:20)
bm <- FindClusters(object = bm)

# Note the change in the following line!
bm <- RunUMAP(object = bm, dims = 1:20,n.components=3) # Project the UMAP onto 3 dimensions instead of 2 by adding n.components=3

bm <- RunVelocity(object = bm, deltaT = 1, kCells = 25, fit.quantile = 0.02)
ident.colors <- (scales::hue_pal())(n = length(x = levels(x = bm)))
names(x = ident.colors) <- levels(x = bm)
cell.colors <- ident.colors[Idents(object = bm)]
names(x = cell.colors) <- colnames(x = bm)


# Two changes in the following command!
vel.out <- show.velocity.on.embedding.cor(emb = Embeddings(object = bm, reduction = "umap"), vel = Tool(object = bm, slot = "RunVelocity"), n = 200, scale = "sqrt", cell.colors = ac(x = cell.colors, alpha = 0.5), cex = 0.8, arrow.scale = 3, show.grid.flow = TRUE, min.grid.cell.mass = 0.5, grid.n = 40, arrow.lwd = 1, do.par = FALSE, cell.border.alpha = 0.1,return.details=T) # Add the option return.details=T so the coordinates that define the arrows are returned into vel.out


vel.emb <- vel.out$arrows #pull out the coordinates defining the arrows
colnames(vel.emb) <- c("dr1","dr2","dr3","vv1","vv2","vv3") # correct the column headings as Velocity.R assumes embedding is always done in 2 dimensions.

#From here we make the CellexalVR object just as we did above
bm.data <- GetAssayData(object = bm)[,rownames(vel.emb)] #Extract the expression data for the cells used for the velocity run

drl <- list(UMAP=Embeddings(object = bm, reduction = "umap")) # Put the UMAP coordinated into a list

bm.meta <- make.cell.meta.from.df(bm[[]],"seurat_clusters") # Make the metadata using just the "seurat_clusters" column


cvr <- new("cellexalvrR",data=bm.data,drc=drl) # Initialise a new cellexalvrR object with the expression data and UMAP

cvr <- set.specie(cvr,"mouse") # Set the specie to Mouse

cvr <- addCellMeta2cellexalvr(cvr,bm.meta) # Add the metadata to the cellexalvrR object

cvr <- addVelocityToExistingDR(cvr,vel.emb,"UMAP") # This function will add the last 3 columns to an existing embedding, in this case the UMAP

#If you do NOT have an existing DR plot to add to then use:

cvr <- addNewVelocity(cvr,vel.emb,"NewVel") # The plot will appear in  CellexalVR names "NewVel"

export2cellexalvr(cvr,"Velo_Test") #Export the files to a folder called "Velo_Test"