SanityR

Bayesian Inference and Distance Calculation for Single-Cell RNA-seq Data

SanityR provides an R interface to the Sanity model, described in Breda et al. (2021), Nature Biotechnology for single-cell gene expression analysis. It offers tools for:

Bayesian estimation of log normalized counts and their uncertainty.
Computing statistically sound distances between cells while accounting for uncertainty.
Integrates with SingleCellExperiment to be used as part of the Bioconductor Single Cell Workflow

Installation

GitHub installation

remotes::install_github("TeoSakel/SanityR")

Example

library(SanityR)

# Simulate data
sce <- simulate_branched_random_walk(N_path = 10, length_path = 10, N_gene = 200)

# Run Sanity estimation
sce <- Sanity(sce)

# Compute distances
dist <- calculateSanityDistance(sce)

# Perform clustering or visualization
plot(hclust(dist))

Main Functions

`Sanity()`

sce <- Sanity(sce)
logcounts(sce)

Log-normalizes the UMI counts and estimates error bars for each value using a hierarchical Bayesian Model:

$$ \begin{aligned} n_{gc} &\sim \text{Poisson}(\lambda_c \alpha_g e^{\delta_{gc}}) \\ \lambda_c &\sim \text{Uniform}(0, \infty) \\ \alpha_g &\sim \text{Gamma}(a, b) \\ \delta_{gc} &\sim \text{Normal}(0, v_g) \\ \end{aligned} $$

where:

$n_{gc}$ is the observed UMI count for gene $g$ in cell $c$.
$\lambda_c$ is the cell-specific transcription rate.
$\alpha_g$ is the mean activity quotient of the gene $g$.
$a$ and $b$ are prior hyperparameters for the Gamma distribution.
$\delta_{gc}$ is the log fold-change of activity for gene $g$ in cell $c$ versus the mean.
$v_g$ is the prior variance of the log fold-change for gene $g$.

Log-normalized counts in this model are calculated as:

$$y_{gc} = \langle \log{\alpha_g} + \delta_{gc} \rangle$$

`calculateSanityDistance()`

dist <- calculateSanityDistance(sce)

Computes the expected Euclidean distance between cells, accounting for measurement uncertainty:

$$ \begin{aligned} d_{cc'} &= \sqrt{\sum_g \langle \delta_{gc} - \delta_{gc'} \rangle^2} \\ \delta_{gc} - \delta_{gc'} &\sim \text{Normal}(\Delta_g, \eta_g) \\ \Delta_g &\sim \text{Normal}(0, \alpha v_g) \\ \end{aligned} $$

where:

$d$ is the distance between cells $c$ and $c'$
$\delta_{gc}$ is the log fold-change of activity for gene $g$ in cell $c$ computed by Sanity.
$\eta_g = \epsilon_{gc} + \epsilon_{gc'}$ is the sum of posterior variances of $\delta_{gc}$.
$\Delta_g$ is the “true” distance along the dimension of gene $g$.
$\alpha$ is a hyperparameter that controls the correlation between cells (0 = fully correlated, 2 = fully independent).

The function requires Sanity() to have been run before to estimate $\delta_{gc}$ and returns a dist object suitable for clustering or embedding.

Simulation Functions

Provides two functions to generate synthetic datasets for benchmarking using the generative processe described in the original paper:

simulate_independent_cells(): Simulates cells with independent gene expression profiles.
simulate_branched_random_walk(): Simulates cells with pseudo-temporal trajectories forming a tree.

sce_indep <- simulate_independent_cells(N_cell = 100, N_gene = 50)
sce_branch <- simulate_branched_random_walk(N_path = 20, length_path = 5, N_gene = 50)

Both functions return a SingleCellExperiment object.

Reference

Breda, J., Zavolan, M., & van Nimwegen, E. Bayesian inference of gene expression states from single-cell RNA-seq data. Nature Biotechnology, 39, 1008–1016 (2021). doi:10.1038/s41587-021-00875-x

Amezquita, R.A., Lun, A.T.L., Becht, E. et al. Orchestrating single-cell analysis with Bioconductor. Nature Methods 17, 137–145 (2020). doi:10.1038/s41592-019-0654-x

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
R		R
inst		inst
man		man
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SanityR

Installation

GitHub installation

Example

Main Functions

`Sanity()`

`calculateSanityDistance()`

Simulation Functions

Reference

About

Uh oh!

Releases

Packages

Languages

License

TeoSakel/SanityR

Folders and files

Latest commit

History

Repository files navigation

SanityR

Installation

GitHub installation

Example

Main Functions

Sanity()

calculateSanityDistance()

Simulation Functions

Reference

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`Sanity()`

`calculateSanityDistance()`

Packages