6. Filters

Writing your own filter

Writing a custom filter is as easy as writing the rules into an empty text file and saving it with the ".fil" extension in the Output/<username>/<dataset> folder. To find this folder from within CellexalVR, open the settings menu and click the "Open output folder" button.
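If you would rather generate filter files from a script than type them by hand, the step above can be sketched in Python. This is only a sketch: the function name save_filter, the user name "alice", the dataset name "my_dataset" and the UTF-8 encoding are assumptions made for illustration, and a temporary directory stands in for the real Output folder.

```python
from pathlib import Path
import tempfile


def save_filter(output_root, username, dataset, filter_name, rule_text):
    """Write rule_text to <output_root>/<username>/<dataset>/<filter_name>.fil.

    output_root is wherever your Output folder lives; it is passed in
    because its location depends on your installation.
    """
    folder = Path(output_root) / username / dataset
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{filter_name}.fil"
    # UTF-8 is an assumption; plain ASCII filters are unaffected by it.
    path.write_text(rule_text + "\n", encoding="utf-8")
    return path


# Demo: a temporary directory stands in for the real Output folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = save_filter(
        tmp, "alice", "my_dataset", "gata1_high",
        "gene:gata1 > 5% && facs:cd48 == 0",
    )
    demo_name = demo_path.name
    demo_text = demo_path.read_text(encoding="utf-8")
```

The file name (without the extension) is what shows up in the filter menu, so pick filter_name with that in mind.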

Basic syntax

A filter consists of a single boolean expression that explains your rules, for example:

gene:gata1 > 5% && facs:cd48 == 0

The above filter selects only cells that express gata1 at more than 5% of the highest gata1 expression in the data and that do not express cd48 at all. Note that facs measurements and gene expression values often have many decimals and can be subject to rounding errors, so the equal to and not equal to operators are usually a poor choice. Comparing to 0 often works well, but comparing to other numbers does not.
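To make the percentage threshold and the rounding caveat concrete, here is a small Python sketch. It follows the description above (a percentage of the highest value in the data) and is not taken from CellexalVR's own code; the cell names and expression values are made up.

```python
def percent_threshold(values, percent):
    """Return `percent` % of the largest value in `values`."""
    return max(values) * percent / 100.0


# Made-up gata1 expression values for four cells.
gata1 = {"cell_a": 0.0, "cell_b": 0.3, "cell_c": 2.0, "cell_d": 8.0}

# gene:gata1 > 5%  ->  the threshold is 5% of the maximum (8.0), i.e. 0.4.
threshold = percent_threshold(gata1.values(), 5)
selected = sorted(cell for cell, value in gata1.items() if value > threshold)

# Why == is fragile for numbers other than 0: this sum is mathematically
# 0.3, but as a floating point number it is not exactly equal to 0.3.
computed = 0.1 + 0.2
exact_match = (computed == 0.3)
```

Here selected contains cell_c and cell_d, and exact_match is False even though the two numbers look identical when rounded to a few decimals.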

The grammar for an expression is type:name operator [value], where type is one of gene, facs or attr (more about attr in a bit), name is the name of your gene/facs/attribute, and operator is any of the operators below. value is only used if type is gene or facs.

Genes and facs measurements follow the same grammar because both are numerical data. For example, gene:gata1 > 5% tells CellexalVR to look for a gene called gata1 and only select cells that express it at more than 5% of the maximum expression of gata1 in any cell. If you know your data well, you may use absolute values instead of percentages. For example, gene:gata1 > 0.5 would only select cells that express gata1 at more than 0.5, as written in the gene expression database.

Attributes are categorical data: a cell either has an attribute or does not, and a cell may have multiple attributes. Attributes are tested in filters using attr:name operator, e.g. attr:lthsc yes. The only operators you may use with attributes are yes and no. The yes operator selects only cells that are part of an attribute, and the no operator does the opposite.
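The grammar for a single expression can be illustrated with a short Python sketch that splits an expression into its parts. This is a simplified reading of the grammar described above, not CellexalVR's actual parser; the function name parse_expression and the returned dictionary layout are made up for this example, and it only handles non-negative numbers.

```python
import re

# gene/facs expressions: type:name operator number, with an optional % sign.
NUMERIC = re.compile(
    r"^(gene|facs):([^\s<>=!]+)\s*(==|!=|<=|>=|=|<|>)\s*(\d+(?:\.\d+)?)(%?)$"
)
# attr expressions: type:name followed by whitespace and yes or no.
ATTR = re.compile(r"^(attr):(\S+)\s+(yes|no)$")


def parse_expression(text):
    """Split one filter expression into type, name, operator and value."""
    text = text.strip().lower()  # names are not case sensitive
    match = NUMERIC.match(text)
    if match:
        kind, name, operator, number, percent = match.groups()
        return {
            "type": kind,
            "name": name,
            "operator": operator,
            "value": float(number),
            "percent": percent == "%",
        }
    match = ATTR.match(text)
    if match:
        kind, name, operator = match.groups()
        return {
            "type": kind,
            "name": name,
            "operator": operator,
            "value": None,      # attributes take no value
            "percent": False,
        }
    raise ValueError(f"not a valid expression: {text!r}")


gene_expr = parse_expression("gene:gata1 > 5%")
facs_expr = parse_expression("facs:cd48==0")
attr_expr = parse_expression("attr:lthsc yes")
```

For example, gene_expr has type "gene", name "gata1", operator ">", value 5.0 and percent set to True, while attr_expr has operator "yes" and no value.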

Operators

The operators you can use are listed in the table below. Note that some operators can be written in more than one way.

Symbol     Meaning                     Applies to
= or ==    Equal to                    Genes and facs
!=         Not equal to                Genes and facs
<          Less than                   Genes and facs
<=         Less than or equal to       Genes and facs
>          Greater than                Genes and facs
>=         Greater than or equal to    Genes and facs
yes        Include                     Attributes
no         Do not include              Attributes

Multiple expressions

Expressions can be chained together to form more complicated expressions using the symbols below, listed in order of precedence from highest to lowest.

Symbol     Meaning
( ... )    Parentheses
!          Not
^          Exclusive or
& or &&    And
| or ||    Or

Note that the or operator is a pipe character and the exclusive or operator is a circumflex.
Parentheses are used around expressions to override the default precedence, e.g.
(gene:gata1 > 5% || facs:cd48 > 0) && attr:lthsc yes
compares the results from the gata1 and cd48 expressions first and then compares the result to the lthsc attribute, while
gene:gata1 > 5% || facs:cd48 > 0 && attr:lthsc yes
compares cd48 and lthsc first because && has a higher precedence than ||.
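The difference between the two filters can be checked with a small Python sketch. Python's and binds tighter than or, just as && binds tighter than || here, so the two functions below mirror the two filters. The three cells and their True/False outcomes are made up to make the difference visible.

```python
# Outcome of each single expression per cell (made-up values):
#   gata1 -> result of gene:gata1 > 5%
#   cd48  -> result of facs:cd48 > 0
#   lthsc -> result of attr:lthsc yes
cells = {
    "cell_a": {"gata1": True,  "cd48": False, "lthsc": False},
    "cell_b": {"gata1": False, "cd48": True,  "lthsc": True},
    "cell_c": {"gata1": False, "cd48": True,  "lthsc": False},
}


def with_parentheses(c):
    # (gene:gata1 > 5% || facs:cd48 > 0) && attr:lthsc yes
    return (c["gata1"] or c["cd48"]) and c["lthsc"]


def without_parentheses(c):
    # gene:gata1 > 5% || facs:cd48 > 0 && attr:lthsc yes
    return c["gata1"] or c["cd48"] and c["lthsc"]


grouped = sorted(name for name, c in cells.items() if with_parentheses(c))
ungrouped = sorted(name for name, c in cells.items() if without_parentheses(c))
```

With parentheses only cell_b is selected. Without them cell_a is selected as well, because its gata1 expression alone is enough to satisfy the filter.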
The not operator is used to invert the result of an expression e.g.
! gene:gata1 > 5%
is the same as writing
gene:gata1 <= 5%
Most importantly, it can be placed before a parenthesis to invert the result of multiple expressions.
The exclusive or operator takes the results from two expressions and checks if they are different. For example
gene:gata1 > 5% ^ facs:cd48 > 0
will select cells that either express gata1 more than 5% or express cd48 more than 0, but not cells that express both.
The and operator compares the results of two expressions and checks if both are true.
gene:gata1 > 5% && facs:cd48 > 0
will select cells that express both gata1 and cd48 above their respective thresholds.
The or operator compares the results of two expressions and checks if at least one is true.
gene:gata1 > 5% || facs:cd48 > 0
will select cells that express gata1, cd48, or both above their respective thresholds.
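One way to picture these operators is as operations on sets of cells. The Python sketch below uses made-up cell names and assumes the two single expressions have already been evaluated. Each line uses only one operator on purpose, because Python's own precedence for &, ^ and | is not the same as the precedence table above.

```python
all_cells = {"c1", "c2", "c3", "c4"}
gata1_high = {"c1", "c2"}    # cells passing gene:gata1 > 5%
cd48_positive = {"c2", "c3"}  # cells passing facs:cd48 > 0

# gene:gata1 > 5% && facs:cd48 > 0   -> cells in both sets
both = gata1_high & cd48_positive

# gene:gata1 > 5% || facs:cd48 > 0   -> cells in at least one set
either = gata1_high | cd48_positive

# gene:gata1 > 5% ^ facs:cd48 > 0    -> cells in exactly one set
exactly_one = gata1_high ^ cd48_positive

# !(gene:gata1 > 5% || facs:cd48 > 0) -> every cell not in `either`
neither = all_cells - either
```

Here both is {c2}, either is {c1, c2, c3}, exactly_one is {c1, c3} and neither is {c4}.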

Aliases

The last feature of filters is aliases. If you plan on using a gene/facs/attribute name many times in a filter, you can give it a nickname by typing
alias:alias_name = type:real_name
where alias_name is the nickname you want to assign to the gene/facs/attribute, type is one of gene, facs or attr, and real_name is the real name of the gene/facs/attribute. For example,
alias:hp = attr:celltype@Haematoendothelial.progenitors means you only have to type hp whenever you want to refer to attr:celltype@Haematoendothelial.progenitors.
With aliases, the following two filters are equivalent:

attr:celltype@Haematoendothelial.progenitors yes && gene:gata1 > 5% || attr:celltype@Haematoendothelial.progenitors no && gene:gata1 > 1%

Note: this is all on one line, but most browsers might display it on several due to the length.

and

alias:hp = attr:celltype@Haematoendothelial.progenitors
alias:g = gene:gata1
hp yes && g > 5% || hp no && g > 1%

Note that aliases must be written before the line that describes the rules of the filter. You can have as many aliases as you like, and you do not have to assign aliases to all your genes/facs/attributes; aliases are completely optional.
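Aliases behave like a text substitution on the rule line, which the following Python sketch illustrates. It is only a model of the behaviour described above, not CellexalVR's implementation; the function name expand_aliases is made up, and for simplicity it treats alias names as case sensitive.

```python
import re


def expand_aliases(filter_text):
    """Return the rule line with every alias replaced by its real name."""
    aliases = {}
    rule = None
    for line in filter_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("alias:"):
            name, target = line[len("alias:"):].split("=", 1)
            aliases[name.strip()] = target.strip()
        else:
            rule = line  # the line that describes the rules
    if rule is None:
        raise ValueError("no rule line found")
    if aliases:
        # Longest aliases first, and only match whole names, so an alias
        # is never replaced inside a longer name.
        names = sorted(aliases, key=len, reverse=True)
        pattern = re.compile(
            r"(?<![\w:@.])(" + "|".join(map(re.escape, names)) + r")(?![\w:@.])"
        )
        rule = pattern.sub(lambda m: aliases[m.group(1)], rule)
    return rule


short_filter = """\
alias:hp = attr:celltype@Haematoendothelial.progenitors
alias:g = gene:gata1
hp yes && g > 5% || hp no && g > 1%
"""
expanded = expand_aliases(short_filter)
```

Running this on the aliased filter above gives back the long single-line filter, which is exactly what makes the two versions equivalent.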

Some additional notes on filters

  • The filters' names in the menu are their file names, so name your filters informatively. Or don't, I'm a manual not a police officer.
  • You do not need to type in all lowercase like I did in the filter files. All uppercase is fine as well, or any combination of the two. Gene names and such are never case sensitive.
  • You may have any number of filters and a filter may include any number of expressions.
  • Filters that CellexalVR cannot parse properly will be reported in the log (see the logging section for more information).
  • Filters that are edited while CellexalVR is running will be updated when the filter file is saved, and the changes will take effect immediately. This will not affect cells that you have previously selected with that filter, only new ones that you select after the change is made. There is no need to restart the program.
  • Decimal signs should be written with a period . and not a comma ,.
  • Names of genes, facs, attributes and aliases may contain any characters except the following: & | ^ ! . ( ) = : %. Names must also not include any whitespace of any kind or linebreaks.
  • You do not need to put spaces around operators like I did in the examples, except before the yes and no operators. However, I encourage you to do so as it makes writing and reading filters easier.
  • Comments can be written in the filter files by starting a line with a number sign #.
  • Only one filter may be active at any time. Activating another will disable your previous filter.
  • Defining a filter with many genes may use up a lot of memory and may be slow.