9. Understanding the file formats

The gene expressions

The database.sqlite file is, as the name suggests, an SQLite database. It stores only the non-zero gene expression values, the same way a sparse matrix does: each stored value is a (gene id, cell id, value) triple, and any gene/cell pair without a row is an implicit zero. The database contains three tables: genes holds gene names and gene ids, cells holds cell names and cell ids, and datavalues holds gene ids, cell ids and gene expression values. The CREATE statements that define the database schema are listed below.

CREATE TABLE datavalues ('gene_id' REAL, 'cell_id' REAL, 'value' REAL);
CREATE TABLE genes ('id' INTEGER NOT NULL UNIQUE, 'gname' varchar(20) COLLATE NOCASE);
CREATE TABLE cells ('id' INTEGER NOT NULL UNIQUE, 'cname' varchar(20) COLLATE NOCASE);
CREATE UNIQUE INDEX gnameIDX ON genes (gname);
CREATE UNIQUE INDEX cnameIDX ON cells (cname);
CREATE INDEX gene_id_data ON datavalues ('gene_id');
CREATE INDEX cell_id_data ON datavalues ('cell_id');
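The following is a minimal sketch, using only Python's standard-library sqlite3 module, of how the three tables fit together. It builds a throwaway in-memory database from the schema above (identifier quotes dropped, column types kept as documented), fills it with made-up toy values (the gene names, cell names and numbers are illustrative only), then (a) looks up the stored values of one gene by name and (b) expands the sparse triples into a dense gene × cell matrix, filling missing pairs with 0. The helper names gene_expression and dense_matrix are hypothetical and not part of any tool's API.

```python
import sqlite3

# Schema from the documentation above. Identifier quotes are dropped here;
# column types (including REAL for the id columns in datavalues) are kept.
SCHEMA = """
CREATE TABLE datavalues (gene_id REAL, cell_id REAL, value REAL);
CREATE TABLE genes (id INTEGER NOT NULL UNIQUE, gname varchar(20) COLLATE NOCASE);
CREATE TABLE cells (id INTEGER NOT NULL UNIQUE, cname varchar(20) COLLATE NOCASE);
CREATE UNIQUE INDEX gnameIDX ON genes (gname);
CREATE UNIQUE INDEX cnameIDX ON cells (cname);
CREATE INDEX gene_id_data ON datavalues (gene_id);
CREATE INDEX cell_id_data ON datavalues (cell_id);
"""


def gene_expression(conn, gene_name):
    """Return {cell name: value} for the stored (non-zero) values of one gene."""
    # gname is declared COLLATE NOCASE, so the name match is case-insensitive.
    rows = conn.execute(
        """
        SELECT c.cname, d.value
        FROM datavalues AS d
        JOIN genes AS g ON g.id = d.gene_id
        JOIN cells AS c ON c.id = d.cell_id
        WHERE g.gname = ?
        ORDER BY c.id
        """,
        (gene_name,),
    )
    return dict(rows.fetchall())


def dense_matrix(conn):
    """Expand the sparse (gene_id, cell_id, value) triples into a dense
    gene x cell matrix; pairs with no stored row become 0.0."""
    genes = conn.execute("SELECT id, gname FROM genes ORDER BY id").fetchall()
    cells = conn.execute("SELECT id, cname FROM cells ORDER BY id").fetchall()
    # Map database ids to row/column positions in the matrix.
    row_of = {gid: i for i, (gid, _) in enumerate(genes)}
    col_of = {cid: j for j, (cid, _) in enumerate(cells)}
    matrix = [[0.0] * len(cells) for _ in genes]
    for gene_id, cell_id, value in conn.execute(
        "SELECT gene_id, cell_id, value FROM datavalues"
    ):
        # The id columns are declared REAL, so convert back to int for lookup.
        matrix[row_of[int(gene_id)]][col_of[int(cell_id)]] = value
    return [name for _, name in genes], [name for _, name in cells], matrix


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO genes VALUES (?, ?)", [(1, "Actb"), (2, "Gapdh")])
conn.executemany("INSERT INTO cells VALUES (?, ?)", [(1, "cell_A"), (2, "cell_B")])
# Gapdh in cell_B is zero, so no row is stored for that pair.
conn.executemany(
    "INSERT INTO datavalues VALUES (?, ?, ?)",
    [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 2.5)],
)

print(gene_expression(conn, "ACTB"))  # {'cell_A': 5.0, 'cell_B': 3.0}
print(dense_matrix(conn))
```

To inspect a real file, one would open it with sqlite3.connect("database.sqlite") instead of creating the schema. The dense expansion allocates every gene/cell pair, so for large datasets building a sparse structure (for example scipy.sparse.coo_matrix) from the same triples is likely the better choice.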