This document describes how to use cellexalvrR
, an R package that accompanies CellexalVR which is a virtual reality environment to analyze single-cell RNAseq data. cellexalvrR
has two functions:
This document will focus on the first point.
The easiest way to install cellexalvrR
is directly from github using the devtools
package:
library(devtools)
install_github("sonejilab/cellexalvrR")
If you are installing this on your VR workstation then make sure you install cellexalvrR
system-wide. Right-click the R icon, and then “Run as administrator” before issuing the commands above. Also make sure that you also have the Rtools installed on your Windows machine which provides C and C++ compilers. The VR log system will in addition need pandoc.
For this quick start we are relying mostly on published example data from other packages. Several of these example data sets have only 2D dimension reduction data and therfore we provide slightly changed files (+3D drcs) on our own web page.
For the sake of this vignette we set the OutPath to the CellexalVR Data path so that these example dataset become available in your VR session and the InPath to the folder we downloaded the example data to (if necessary).
## Tell R where your CellexalVR "Data"" folder is.
## The folder should be in the path where you unpacked CellexalVR.
## This one is mine:
= "i:/CellexalVR/CellexalVR_0.14.0_build/Data/"
OutPath ## I have downloaded the data files into the R tempPath
= tempdir() InPath
We have a simple function to convert a Seurat object to a cellexalvrR object prior to export. To demonstrate this we will use the the Seurat pbmc_small example data.
library(Seurat)
library(cellexalvrR)
## We will now calculate a UMAP in 3 dimensions whereas the TSNE will be using in 2D
<- RunUMAP(pbmc_small, dims = 1:10,n.components = 3)
pbmc_small #> 13:04:44 UMAP embedding parameters a = 0.9922 b = 1.112
#> 13:04:44 Read 80 rows and found 10 numeric columns
#> 13:04:44 Using Annoy for neighbor search, n_neighbors = 30
#> 13:04:44 Building Annoy index with metric = cosine, n_trees = 50
#> 0% 10 20 30 40 50 60 70 80 90 100%
#> [----|----|----|----|----|----|----|----|----|----|
#> **************************************************|
#> 13:04:44 Writing NN index file to temp file C:\Users\Stefan\AppData\Local\Temp\RtmpQXLt8D\file2d247d2b651a
#> 13:04:44 Searching Annoy index using 1 thread, search_k = 3000
#> 13:04:44 Annoy recall = 100%
#> 13:04:45 Commencing smooth kNN distance calibration using 1 thread
#> 13:04:45 7 smooth knn distance failures
#> 13:04:46 Initializing from normalized Laplacian + noise
#> 13:04:46 Commencing optimization for 500 epochs, with 2410 positive edges
#> 13:04:47 Optimization finished
Embeddings(pbmc_small,"umap")[1,]
#> UMAP_1 UMAP_2 UMAP_3
#> -1.912597 -1.099849 1.826431
## Now we convert the object to a cellexalvr object using
## the cell identity and cluster as metadata for the cells (and any others you wish):
<- as_cellexalvrR(pbmc_small,c("orig.ident","groups"), specie="human" )
cvr
cvr#> An object of class cellexalvrR
#> with 230 genes and 80 cells.
#> Annotation datasets (230,1): 'Gene Symbol'
#> Sample annotation (80,3): 'orig.ident@SeuratProject', 'groups@g2', 'groups@g1'
#> There are 0 user group(s) stored :
#> and 3 drc object(s)
#> Specie is set to NA
## And here we export all data
= file.path(OutPath, "PBMC")
opath export2cellexalvr(cvr, opath)
#> Warning in export2cellexalvr(cvr, opath): path does not exists - creating it now
#> Warning in check(x): meta.gene rownames set to data rownames
#> object passes checks
list.files(opath)
#> [1] "a.meta.cell" "c.meta.gene" "cellexalObj.RData"
#> [4] "database.sqlite" "pca.mds" "tsne.mds"
#> [7] "umap.mds"
## Done!
If you point the OutPath to your CellexalVR Data path you will now have the Seurat pbmc_small data set in your VR environment.
It will then be available for loading when you step into CellexalVR. This applies to all other data sets described here, too.
We will get the 3k PBMC h5ad file from here and unzip. It was created using this Scanpy tutorial where the only change we made was to embed the UMAP into 3 dims using sc.tl.umap(adata,n_components=3)
:
library(cellexalvrR)
<- as_cellexalvrR( file.path( InPath, "pbmc3k.h5ad"),
cvr meta.cell.groups = c( "leiden") , embeddingDims = 3,
specie = "human", velocity = NULL )
#> reading expression data
cvr#> An object of class cellexalvrR
#> with 1838 genes and 2638 cells.
#> Annotation datasets (1838,14): '_index', 'dispersions', 'dispersions_norm', 'gene_ids', 'highly_variable', 'mean', 'mean_counts', 'means', 'mt', 'n_cells', 'n_cells_by_counts', 'pct_dropout_by_counts', 'std', 'total_counts'
#> Sample annotation (2638,8): 'leiden@0', 'leiden@2', 'leiden@1', 'leiden@4', 'leiden@3', 'leiden@5', 'leiden@6', 'leiden@7'
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to human
= file.path(OutPath,"PBMC3K")
opath export2cellexalvr(cvr, opath)
#> Warning in export2cellexalvr(cvr, opath): path does not exists - creating it now
#> object passes checks
list.files(opath)
#> [1] "a.meta.cell" "c.meta.gene" "cellexalObj.RData"
#> [4] "database.sqlite" "pca.mds" "umap.mds"
## Done!
If you have used scvelo to caculate cell velocities then only a couple more options need to be added to the export. In this example we ran this scvelo demo with two extra lines:
sc.tl.umap(adata,n_components=3) # embed to 3 dims
sc.tl.leiden(adata) #get clusters
We can get a zip of the resulting h5ad file from here. The rest is almost the same as above (in R):
library(cellexalvrR)
<- as_cellexalvrR(file.path( InPath,"pancreas_scvelo.h5ad"),
cvr c("clusters","leiden"), specie="human",velocity="scvelo",
scaleArrowTravel=30)
#> reading expression data
#> WARNING: drc pca - velocity information not available
## The deltas can be quite small, so in the line above we have 'scaleArrowTravel'
## which multiples delta by factor specified to increase the path of travel for
## each cell to make them more visible.
## Now we create the folder and export the CellexalVR files
= file.path( OutPath,"Pancreas_scvelo")
opath export2cellexalvr(cvr, opath)
#> Warning in export2cellexalvr(cvr, opath): path does not exists - creating it now
#> object passes checks
list.files(opath)
#> [1] "a.meta.cell" "c.meta.gene" "cellexalObj.RData"
#> [4] "database.sqlite" "pca.mds" "umap.mds"
## Done!
In general, to enable velocity visualisation in CellexalVR one supplies 6-column .mds files, where the last three columns denote the destination x/y/z coordinate for each cell
We followed this Seurat example, again embedding to 3 dimensions as shown above. A zip of the RDS file can be obtained here which is a seurat object. Exporting the data:
library(Seurat)
library(cellexalvrR)
<- readRDS(file.path( InPath,"pbmc.atac.rds"))
pbmc.atac
## convert the object using the gene activity assay
<- as_cellexalvrR(pbmc.atac,c("seurat_clusters"),specie="human",assay="ACTIVITY")
cvr
cvr#> An object of class cellexalvrR
#> with 18969 genes and 7866 cells.
#> Annotation datasets (18969,1): 'Gene Symbol'
#> Sample annotation (7866,17): 'seurat_clusters@0', 'seurat_clusters@2', 'seurat_clusters@7', 'seurat_clusters@14', 'seurat_clusters@13', 'seurat_clusters@1', 'seurat_clusters@3', 'seurat_clusters@6', 'seurat_clusters@5', 'seurat_clusters@8', 'seurat_clusters@4', 'seurat_clusters@12', 'seurat_clusters@10', 'seurat_clusters@11', 'seurat_clusters@9', 'seurat_clusters@16', 'seurat_clusters@15'
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA
## We're ready to export.
= file.path(OutPath,"scATAC")
opath export2cellexalvr(cvr, opath)
#> Warning in export2cellexalvr(cvr, opath): path does not exists - creating it now
#> Warning in check(x): meta.gene rownames set to data rownames
#> object passes checks
list.files(opath)
#> [1] "a.meta.cell" "c.meta.gene" "cellexalObj.RData"
#> [4] "database.sqlite" "lsi.mds" "umap.mds"
#Done!
If you aren’t using Seurat, Scanpy or Scvelo the following guide will show you how to make a cellexalVR oject from scratch using our in-built functions. The requirments are very simple and examples are shown below.
The data from Nestorowa et al can be downloaded from here. Unpack them and set your working directory to where they are.
First, load the library:
library(cellexalvrR)
Then load the data:
load("exdata.RData")
load("facs.RData")
load("cell.ids.RData")
load("diff.proj.RData")
load("ddr.proj.RData")
load("tsne.proj.RData")
exdata
is a sparse matrix of highly variable genes in log2 form. The first 10 columns and 10 rows look as so:
1:6,1:6]
exdata[#> 6 x 6 sparse Matrix of class "dgCMatrix"
#> HSPC_001 HSPC_002 HSPC_003 HSPC_004 HSPC_006 HSPC_008
#> Clec1b . . . . 1.039889 1.172368
#> Kdm3a . . . . . 8.313856
#> Coro2b 9.475577 . . . 1.637906 1.172368
#> 8430408G22Rik . . . . . .
#> Clec9a . . . . . .
#> Phf6 7.733370 7.271829 9.413041 2.677008 5.832172 7.856124
Note the cell IDs are in the column names, and the gene names are the row names.
class(exdata)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
facs
is a matrix of cell surface marker intensities captured during index sorting:
1:6,1:6]
facs[#> CD34 CD16 c.Kit EPCR Flk2 CD150
#> HSPC_001 0.6037796 0.4322826 1.3509535 1.5235524 0.3618872 0.3036440
#> HSPC_002 0.3652949 0.5923082 1.1668190 1.5575771 0.3985330 0.1802330
#> HSPC_003 1.1688350 0.3103070 1.2716602 0.1796609 0.2225457 0.2708689
#> HSPC_004 0.7234537 0.5098244 1.2890254 0.1047645 0.7382823 0.1746147
#> HSPC_006 0.7806266 0.2244971 0.8613574 0.1909399 0.7124364 0.4767371
#> HSPC_008 0.8303475 0.5984071 1.2414646 1.4069329 0.4826786 0.2322792
Cell IDs are in the row names, and the name of the surface protein in the columns.
cell.ids
is a 1/0 matrix that assigns metadata to the each cell. In this case it shows the cell type each cell was sorted as:
1:6,1:6]
cell.ids[#> LTHSC_broad LMPP_broad MPP_broad CMP_broad MEP_broad GMP_broad
#> HSPC_001 0 0 1 0 0 0
#> HSPC_002 1 0 0 0 0 0
#> HSPC_003 0 0 1 0 0 0
#> HSPC_004 0 1 0 0 0 0
#> HSPC_006 0 1 0 0 0 0
#> HSPC_008 0 1 0 0 0 0
Cell IDs are in the row names, and the name of the meta data in the columns. A 1 is given if the cell belongs to a metadata class, 0 otherwise.
diff.proj
,tsne.proj
,and diff.proj
are the results from three different dimension reduction methods applied to exdata
, specifically DDRTree, tSNE and diffusion map respectively. Each is a three column matrix of x/y/z coordinates. For example:
head(diff.proj)
#> DC1 DC2 DC3
#> HSPC_001 -0.014760110 -0.024119177 -0.007998293
#> HSPC_002 -0.010255308 -0.019970919 -0.015754327
#> HSPC_003 -0.003649320 0.027297654 0.014202316
#> HSPC_004 -0.012653205 -0.002990825 0.025813077
#> HSPC_006 -0.005876981 0.021537658 0.035740656
#> HSPC_008 -0.015699148 -0.018777106 0.007916241
The first step is to put all the dimension reduction outputs into a single list:
<- list(diffusion=diff.proj,DDRtree=ddr.proj,tSNE=tsne.proj) proj.list
The next is to make a cellexalvr
object by calling MakeCellexaVRObj
and passing the required objects to it:
<- MakeCellexalVRObj(exdata,drc.list=proj.list,specie="mouse",
cellvr cell.meta=cell.ids,facs.data=facs)
In the same step we also set the specie as mouse which ensures the correct transcription factor IDs are using during network construction if implemented in CellexalVR.
Calling the object name will detail it’s contents:
cellvr#> An object of class cellexalvrR
#> with 4709 genes and 1654 cells.
#> Annotation datasets (1,1): ''
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'
#> There are 0 user group(s) stored :
#> and 3 drc object(s)
#> Specie is set to mouse
The last step is to call export2cellexalvr
which will write the neccessary files from the cellvr
object to a specified directory:
= file.path(OutPath,"CellexalOut")
opath export2cellexalvr(cvr, opath)
#> Warning in export2cellexalvr(cvr, opath): path does not exists - creating it now
#> Warning in check(x): meta.gene rownames set to data rownames
#> object passes checks
list.files( opath )
#> [1] "a.meta.cell" "c.meta.gene" "cellexalObj.RData"
#> [4] "database.sqlite" "lsi.mds" "umap.mds"
## Done!
All the files needed by CellexalVR are created at this point. The entire “CellexalOut/” folder should then be moved/copied to the “Data” folder in your CellexalVR setup. See the manual for more details including a schematic of the folder structure.
cellexalvr
object from scratchWhile MakeCellexalVRObj
is the most convenient way to make the object, sometimes you want to make one manually. This is done calling new
:
<- new("cellexalvr",data=Matrix::Matrix(exdata,sparse=T),drc=list(tsne=tsne.proj))
cell.vr.scr
cell.vr.scr#> An object of class cellexalvr
#> with 4709 genes and 1654 cells.
#> Annotation datasets (1,1): ''
#> Sample annotation (1,1): ''
#> There are 0 user group(s) stored :
#> and 1 drc object(s)
#> Specie is set to NA
We can add another set of dimension reduction coordinates using the addMDS2cellexalvr
function:
<- addDRC2cellexalvr(cell.vr.scr,ddr.proj,"DDRTree")
cell.vr.scr
cell.vr.scr#> An object of class cellexalvr
#> with 4709 genes and 1654 cells.
#> Annotation datasets (1,1): ''
#> Sample annotation (1,1): ''
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA
To add metadata for each cell use addCellMeta2cellexalvr
:
<- addCellMeta2cellexalvr(cell.vr.scr,cell.ids)
cell.vr.scr
cell.vr.scr#> An object of class cellexalvr
#> with 4709 genes and 1654 cells.
#> Annotation datasets (1,1): ''
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to NA
To add FACS for each cell use addFACS2cellexalvr
:
<- addFACS2cellexalvr(cell.vr.scr,facs) cell.vr.scr
Setting the specie is done using the set.specie
function:
<- set.specie(cell.vr.scr,"mouse")
cell.vr.scr
cell.vr.scr#> An object of class cellexalvr
#> with 4709 genes and 1654 cells.
#> Annotation datasets (1,1): ''
#> Sample annotation (1654,23): 'LTHSC_broad', 'LMPP_broad', 'MPP_broad', 'CMP_broad', 'MEP_broad', 'GMP_broad', 'MPP1_broad', 'MPP2_broad', 'MPP3_broad', 'STHSC_broad', 'LTHSC', 'LMPP', 'MPP', 'CMP', 'MEP', 'GMP', 'MPP1', 'MPP2', 'MPP3', 'STHSC', 'ESLAM', 'HSC1', 'Projected'
#> There are 0 user group(s) stored :
#> and 2 drc object(s)
#> Specie is set to mouse
CellexalVR requires metadata in the form of a 1/0 matrix, but many packages store it as a data frame. CellexalvrR has function to convert a data frame into a 1/0 matrix. First, lets make a data frame:
<- data.frame(CellType=sample(c("Type1","Type2","Type3"),10,replace=T),
meta.df Phase=sample(c("G1","G2M","S"),10,replace=T),
Treatment=sample(c("WT","Dox"),10,replace=T))
head(meta.df)
#> CellType Phase Treatment
#> 1 Type1 G2M WT
#> 2 Type3 S Dox
#> 3 Type2 G2M Dox
#> 4 Type1 G2M Dox
#> 5 Type3 G1 WT
#> 6 Type1 G2M Dox
We can now make a correctly formatted cell metadata matrix by applying the function make.cell.meta.from.df
using only the fields we need, in this case the “CellType” and “Treatment” columns:
<- make.cell.meta.from.df(meta.df,c("CellType","Treatment"))
required.cell.metad head(required.cell.metad)
#> CellType@Type1 CellType@Type3 CellType@Type2 Treatment@WT Treatment@Dox
#> 1 1 0 0 1 0
#> 2 0 1 0 0 1
#> 3 0 0 1 0 1
#> 4 1 0 0 0 1
#> 5 0 1 0 1 0
#> 6 1 0 0 0 1
It can be seen the field name is placed in the column heading preceeding a “@”, and this is used by CellexalVR to form catagories on the menu system, so “CellType” and “Treatment” will be on separate tabs. This metadata matrix can now be added to a cellexalvrR
object as decribed above using the addCellMeta2cellexalvr
function.