🧬 Dynamic RNA velocity model (7): Gillespie Stochastic Simulation Algorithm

May 28, 2025 · Jiyuan (Jay) Liu · 11 min read
Image credit: Logan Voss on Unsplash

Introduction

The dynamic RNA velocity model makes assumptions that are not always satisfied in real-world data, which helps explain why default pipelines often fail to capture true RNA velocity. Our previous six blog posts on effectively applying this model have equipped us to identify and thoughtfully fine-tune key (hyper)parameters and processing steps in scVelo, allowing for more accurate and meaningful analyses.

However, to rigorously validate these adjustments, we need high-quality simulated data with known ground truth. That’s why this seventh blog post delves into the Gillespie Stochastic Simulation Algorithm (SSA), a Monte Carlo method well suited to simulating biochemical reaction systems where molecule numbers are low and stochastic effects are significant.

SSA is particularly well-suited for single-cell RNA-seq applications because it:

  • Models intrinsic noise at low molecule counts
  • Reflects realistic cell-to-cell variability
  • Handles complex, nonlinear networks
  • Captures rare, probabilistic events
  • Aligns with the biological reality of gene expression

In the context of RNA velocity, stochastic modeling can:

  • Better estimate splicing rates and latent time
  • Model uncertainty in velocity vectors
  • Reflect non-smooth, noisy transitions across cell states

Specifically, this post illustrates Gillespie’s SSA through mathematical derivation, intuitive explanation, and concrete numerical examples, providing a solid foundation to grasp the algorithm and use its simulated datasets effectively in RNA velocity analysis.

1. Theory: Gillespie’s SSA for Cyclic Reactions

System Setup

Consider a cyclic set of chemical reactions:

$$A \xrightarrow{k_1} B \xrightarrow{k_2} C \xrightarrow{k_3} A \tag{1}$$

where:

  • $k_1, k_2, k_3$ are rate constants
  • Molecule counts at time $t$ are $X_A(t), X_B(t), X_C(t)$

Propensity Functions

Each reaction $R_j$ has a propensity function $a_j(X)$, which gives the instantaneous probability per unit time of that reaction firing, given state $X$:

$$\begin{cases} a_1(X) = k_1 X_A \\ a_2(X) = k_2 X_B \\ a_3(X) = k_3 X_C \end{cases} \tag{2}$$
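These propensities translate directly into code. A minimal sketch (using NumPy, with the rate constants and initial state from the numeric example later in this post):

```python
import numpy as np

# Propensity functions of Eq. (2) for the cycle A -> B -> C -> A.
# k = (k1, k2, k3) are rate constants; X = (X_A, X_B, X_C) are molecule counts.
def propensities(k, X):
    return np.array([k[0] * X[0], k[1] * X[1], k[2] * X[2]])

# Rate constants and initial state from the numeric example below
a = propensities([0.01, 0.02, 0.015], [100, 0, 0])
a0 = a.sum()
print(a, a0)  # a1 = 1.0, a2 = 0, a3 = 0, so a0 = 1.0
```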

Gillespie’s SSA: Key Ideas

The system evolves as a continuous-time Markov jump process.

Time to next reaction is exponentially distributed with rate:

$$a_0 = a_1 + a_2 + a_3 \tag{3}$$

Probability that next reaction is reaction $j$ is:

$$\frac{a_j}{a_0} \tag{4}$$

Sampling Time to Next Reaction

The waiting time $\tau$ until the next reaction follows:

$$P(\tau > t) = e^{-a_0 t} \tag{5}$$

Sample $\tau$ by inverse transform sampling:

$$\tau = \frac{1}{a_0} \ln\left(\frac{1}{r_1}\right), \quad r_1 \sim U(0,1) \tag{6}$$

Sampling Which Reaction Fires

Choose reaction $R_j$ so that:

$$\sum_{i=1}^{j-1} a_i < r_2 a_0 \leq \sum_{i=1}^{j} a_i, \quad r_2 \sim U(0,1) \tag{7}$$

State Update

When reaction $R_j$ fires, update molecule counts:

$$\begin{align} X_A &\rightarrow X_A + \nu_{A,j} \tag{8a}\\ X_B &\rightarrow X_B + \nu_{B,j} \tag{8b}\\ X_C &\rightarrow X_C + \nu_{C,j} \tag{8c} \end{align}$$

where $\nu_{\cdot,j}$ is the stoichiometric change vector for reaction $j$:

$$\begin{cases} R_1: (X_A, X_B, X_C) \rightarrow (X_A - 1, X_B + 1, X_C) \\ R_2: (X_A, X_B, X_C) \rightarrow (X_A, X_B - 1, X_C + 1) \\ R_3: (X_A, X_B, X_C) \rightarrow (X_A + 1, X_B, X_C - 1) \end{cases} \tag{9}$$
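The change vectors in Eq. (9) are conveniently stored as rows of a stoichiometry matrix; firing reaction $j$ is then a single vector addition. A minimal sketch:

```python
import numpy as np

# Stoichiometric change vectors (Eq. 9): row j is the state change when R_j fires.
nu = np.array([[-1, +1,  0],   # R1: A -> B
               [ 0, -1, +1],   # R2: B -> C
               [+1,  0, -1]])  # R3: C -> A

X = np.array([100, 0, 0])
X = X + nu[0]   # fire R1
print(X)        # A = 99, B = 1, C = 0
```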

Interim Summary: Gillespie Algorithm for the Cycle

  1. Initialize $t = 0$ , state $X$
  2. Compute propensities $a_1, a_2, a_3$ and total $a_0$
  3. Sample $\tau = \frac{1}{a_0} \ln(1/r_1)$ , update time $t \leftarrow t + \tau$
  4. Sample $r_2$ , select reaction $j$
  5. Update state by stoichiometry of $R_j$
  6. Repeat until $t$ reaches max time or stopping condition

2. Why SSA Works for Cyclic Reactions

Markov Jump Process Foundation

The system’s state $X$ evolves randomly with jump rates given by $a_j(X)$ .

What is a “Jump”?

In a stochastic chemical reaction system, the system’s state is defined by the numbers of molecules of each species.

A jump means the system changes state by a reaction firing.

For example, if reaction $R_1$ turns one molecule of A into one molecule of B, when $R_1$ fires, the state “jumps” from $(A,B,C)$ to $(A-1,B+1,C)$ .

What is a Jump Rate?

The jump rate is the instantaneous probability per unit time that a particular reaction will occur, causing a jump from the current state.

Formally, for reaction $R_j$ , the jump rate at state $X$ is the propensity function $a_j(X)$ .

$$\text{Jump rate for reaction } R_j = a_j(X) \tag{1}$$

This is sometimes called the reaction rate in stochastic modeling (not to be confused with deterministic rates).

Why “Jump”?

The system jumps from one discrete state (counts of molecules) to another when a reaction occurs.

Between jumps, the state is constant.

The jump rate governs the timing of these jumps probabilistically.

Example

For reaction $R_1: A \rightarrow B$ with rate constant $k_1$ , if you have $X_A$ molecules of A:

$$\text{Jump rate} = a_1(X) = k_1 \times X_A \tag{2}$$

This means:

The probability that $R_1$ fires in the next tiny time $dt$ is approximately $a_1(X)dt$ .

$$P(\text{reaction } R_1 \text{ fires in time } dt) \approx a_1(X) \cdot dt \tag{3}$$

Waiting time until next jump is exponentially distributed because the minimum of independent exponential waiting times (each reaction) is exponential with rate $a_0$ .

The probability the minimum corresponds to reaction $j$ is proportional to $a_j$ .

Intuition

Each reaction channel $j$ can be imagined as an independent Poisson process with rate $a_j$ .

The system “waits” for the first of these Poisson events.

Time until next reaction = minimum of all $\tau_j \sim \text{Exponential}(a_j)$ .

The minimum of exponentials is exponential with rate $a_0$ .

The probability reaction $j$ occurs is the chance $\tau_j$ is the minimum → proportional to $a_j$ .

This general logic holds regardless of network topology (including cycles).
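This race-of-exponentials picture is easy to verify empirically. The sketch below uses illustrative rates $a = (1.0, 0.5, 0.5)$ (an assumption for the demo, not values from this post) and checks that the minimum waiting time has mean $1/a_0$ and that channel 1 wins with probability $a_1/a_0$:

```python
import numpy as np

# Each channel j waits tau_j ~ Exp(a_j); the minimum should be Exp(a0)
# and P(winner = j) should equal a_j / a0.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.5, 0.5])   # illustrative rates (assumed for the demo)
a0 = a.sum()

n = 200_000
taus = rng.exponential(1.0 / a, size=(n, 3))   # column j ~ Exp(a[j])
t_min = taus.min(axis=1)                       # time of the first event
winner = taus.argmin(axis=1)                   # which channel fired first

print(t_min.mean())           # should be close to 1/a0 = 0.5
print((winner == 0).mean())   # should be close to a1/a0 = 0.5
```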

3. Numeric Example: Step-by-Step Simulation

Parameters and Initial Conditions

| Reaction | Rate $k$ | Initial Molecules |
| --- | --- | --- |
| $R_1: A \rightarrow B$ | 0.01 | $A = 100$ |
| $R_2: B \rightarrow C$ | 0.02 | $B = 0$ |
| $R_3: C \rightarrow A$ | 0.015 | $C = 0$ |

Step 1: Initial State

$t = 0$ , state $(A,B,C) = (100,0,0)$

Calculate reaction propensities:

$$a_1 = 0.01 \times 100 = 1.0 \tag{1}$$ $$a_2 = 0.02 \times 0 = 0 \tag{2}$$ $$a_3 = 0.015 \times 0 = 0 \tag{3}$$ $$a_0 = 1.0 \tag{4}$$

Sample random numbers:

$$r_1 = 0.5 \Rightarrow \tau = \frac{1}{1.0} \ln\left(\frac{1}{0.5}\right) = 0.693 \tag{5}$$ $$r_2 = 0.3 \Rightarrow r_2 a_0 = 0.3 \tag{6}$$

Since $0.3 < a_1 = 1.0$ , fire $R_1$ :

New state: $(A,B,C) = (99,1,0)$

Time $t = 0.693$

Step 2: Second Iteration

$t = 0.693$ , state $(99,1,0)$

Calculate reaction propensities:

$$a_1 = 0.01 \times 99 = 0.99 \tag{7}$$ $$a_2 = 0.02 \times 1 = 0.02 \tag{8}$$ $$a_3 = 0.015 \times 0 = 0 \tag{9}$$ $$a_0 = 1.01 \tag{10}$$

Sample random numbers:

$$r_1 = 0.1 \Rightarrow \tau = \frac{1}{1.01} \ln\left(\frac{1}{0.1}\right) \approx 2.28 \tag{11}$$ $$r_2 = 0.995 \Rightarrow r_2 a_0 \approx 1.005 \tag{12}$$

Cumulative propensities:

$$a_1 = 0.99 \tag{13}$$ $$a_1 + a_2 = 1.01 \tag{14}$$

Since $1.005$ lies between $0.99$ and $1.01$, fire $R_2$:

New state: $(A,B,C) = (99,0,1)$

Time $t = 0.693 + 2.28 = 2.973$

Step 3: Third Iteration

$t = 2.973$ , state $(99,0,1)$

Calculate reaction propensities:

$$a_1 = 0.01 \times 99 = 0.99 \tag{15}$$ $$a_2 = 0.02 \times 0 = 0 \tag{16}$$ $$a_3 = 0.015 \times 1 = 0.015 \tag{17}$$ $$a_0 = 1.005 \tag{18}$$

Sample random numbers:

$$r_1 = 0.7 \Rightarrow \tau = \frac{1}{1.005} \ln\left(\frac{1}{0.7}\right) \approx 0.355 \tag{19}$$ $$r_2 = 0.98 \Rightarrow r_2 a_0 = 0.985 \tag{20}$$

Cumulative sums (note $a_2 = 0$):

$$a_1 = a_1 + a_2 = 0.99 \tag{21}$$ $$a_1 + a_2 + a_3 = 1.005 \tag{22}$$

Since $0.985 < 0.99$, fire $R_1$:

New state: $(A,B,C) = (98,1,1)$

Time $t = 2.973 + 0.355 \approx 3.33$

Step 4: Fourth Iteration

$t = 3.33$ , state $(98,1,1)$

Calculate reaction propensities:

$$a_1 = 0.01 \times 98 = 0.98 \tag{23}$$ $$a_2 = 0.02 \times 1 = 0.02 \tag{24}$$ $$a_3 = 0.015 \times 1 = 0.015 \tag{25}$$ $$a_0 = 1.015 \tag{26}$$

Sample random numbers:

$$r_1 = 0.2 \Rightarrow \tau = \frac{1}{1.015} \ln\left(\frac{1}{0.2}\right) \approx 1.59 \tag{27}$$ $$r_2 = 0.99 \Rightarrow r_2 a_0 \approx 1.005 \tag{28}$$

Cumulative sums:

$$a_1 = 0.98 \tag{29}$$ $$a_1 + a_2 = 1.00 \tag{30}$$ $$a_1 + a_2 + a_3 = 1.015 \tag{31}$$

Since $1.005$ is between $1.00$ and $1.015$, fire $R_3$:

New state: $(A,B,C) = (99,1,0)$

Time $t = 3.33 + 1.59 = 4.92$

Summary Table

| Step | Time $t$ | $A$ | $B$ | $C$ | Reaction Fired |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.693 | 99 | 1 | 0 | $R_1$ |
| 2 | 2.973 | 99 | 0 | 1 | $R_2$ |
| 3 | 3.33 | 98 | 1 | 1 | $R_1$ |
| 4 | 4.92 | 99 | 1 | 0 | $R_3$ |
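Up to rounding, the four hand-computed steps can be replayed in code with the same fixed random draws, which is a handy way to check the selection rule of Eq. (7) before running a full stochastic simulation:

```python
import numpy as np

# Replay the four hand-computed SSA steps with the fixed random numbers above.
k = np.array([0.01, 0.02, 0.015])
nu = np.array([[-1, 1, 0], [0, -1, 1], [1, 0, -1]])   # stoichiometry rows
draws = [(0.5, 0.3), (0.1, 0.995), (0.7, 0.98), (0.2, 0.99)]  # (r1, r2) per step

X = np.array([100, 0, 0])
t = 0.0
for r1, r2 in draws:
    a = k * X                 # propensities a_j = k_j * X_j
    a0 = a.sum()
    t += np.log(1.0 / r1) / a0            # waiting time (Eq. 6)
    j = int(np.searchsorted(np.cumsum(a), r2 * a0))  # selection rule (Eq. 7)
    X = X + nu[j]
    print(f"t={t:.3f}, state={tuple(X)}, fired R{j+1}")
```

The printed trajectory reproduces the summary table above (small differences in $t$ come from rounding the hand-computed $\tau$ values).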

Python Snippet to Simulate

import numpy as np

# Rate constants (k1, k2, k3) and initial state (A, B, C)
k = [0.01, 0.02, 0.015]
X = np.array([100, 0, 0])
t = 0.0
tmax = 10.0

while t < tmax:
    # Propensities a_j(X) = k_j * X_j and total propensity a0
    a = np.array([k[0]*X[0], k[1]*X[1], k[2]*X[2]])
    a0 = a.sum()
    if a0 == 0:
        break  # no reaction can fire
    r1, r2 = np.random.rand(2)
    # Waiting time to next reaction (inverse transform sampling, Eq. 6)
    tau = (1/a0)*np.log(1/r1)
    t += tau
    # Select which reaction fires: smallest j with cumsum(a)[j] >= r2*a0 (Eq. 7)
    cumsum = np.cumsum(a)
    reaction = np.searchsorted(cumsum, r2*a0)
    # Update state by the stoichiometry of the chosen reaction (Eq. 9)
    if reaction == 0:
        X += np.array([-1, +1, 0])   # R1: A -> B
    elif reaction == 1:
        X += np.array([0, -1, +1])   # R2: B -> C
    else:
        X += np.array([+1, 0, -1])   # R3: C -> A
    print(f"t={t:.3f}, state={X}, reaction=R{reaction+1}")

Why the Probability of No Events Before Time t in a Poisson Process Is e^(-λt)

The Question

Why is the probability that no event occurs before time $t$ in a Poisson process:

$$P(\text{no event before } t) = e^{-\lambda t} \tag{1}$$

Let’s explain why this is true, both intuitively and mathematically.

Intuition: What is a Poisson Process?

A Poisson process models events that:

  • Happen randomly in time
  • Are independent of each other
  • Happen with a constant average rate $\lambda$ per unit time

We’re asking: What is the probability that no events occur in time interval $[0,t]$ ?

Mathematical Derivation

Fact: In a Poisson process with rate $\lambda$ , the number of events in time $t$ follows a Poisson distribution:

$$P(N(t) = k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \tag{2}$$

So the probability of zero events by time $t$ is:

$$P(N(t) = 0) = \frac{(\lambda t)^0}{0!} e^{-\lambda t} = e^{-\lambda t} \tag{3}$$

That’s it! That’s where the formula comes from.

Interpretation

$e^{-\lambda t}$ is the probability that nothing happens in time $t$

This is also the probability that the first event happens after time $t$

Relationship to Waiting Time

Let $\tau$ be the waiting time to the first event. Then:

$$P(\tau > t) = P(\text{no event in } [0,t]) = e^{-\lambda t} \tag{4}$$

So the waiting time to the first event is exponentially distributed:

$$f_\tau(t) = \lambda e^{-\lambda t} \tag{5}$$

Real-World Analogy

Imagine a radioactive atom decaying. The chance that it survives without decaying for time $t$ is:

$$\text{Survival probability} = e^{-\lambda t} \tag{6}$$

This is why the exponential distribution is also used to model lifetimes and failure times.
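A quick Monte Carlo check of the survival probability, using illustrative values $\lambda = 2.0$ and $t = 0.8$ (assumed for the demo): the fraction of simulated first-event times exceeding $t$ should match $e^{-\lambda t}$:

```python
import numpy as np

# The fraction of exponential waiting times exceeding t estimates P(tau > t),
# which should agree with the survival probability exp(-lambda * t).
rng = np.random.default_rng(1)
lam, t = 2.0, 0.8                          # illustrative values (assumed)
tau = rng.exponential(1.0 / lam, size=500_000)

print((tau > t).mean())        # empirical survival fraction
print(np.exp(-lam * t))        # analytic survival probability
```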

How to sample τ from an Exponential Distribution

We want to sample the time $\tau$ until the next event, where $\tau \sim \text{Exponential}(a_0)$ , and $a_0$ is the total propensity (i.e. the total rate at which any reaction occurs).

The cumulative distribution function (CDF) of the exponential distribution is:

$$P(\tau \leq t) = 1 - e^{-a_0 t} \tag{7}$$

So the probability that τ is greater than t is:

$$P(\tau > t) = e^{-a_0 t} \tag{8}$$

This tells us: The longer we wait, the less likely it is that no reaction has occurred.

We use inverse transform sampling, which works like this:

Step 1: Generate a uniform random number:

$$r_1 \sim \text{Uniform}(0,1) \tag{9}$$

Step 2: Set it equal to the survival function (equivalent to the usual CDF inversion, since $1 - r_1$ is also $\text{Uniform}(0,1)$):

$$r_1 = P(\tau > t) = e^{-a_0 t} \tag{10}$$

Step 3: Solve for $t$ :

$$\ln r_1 = -a_0 t \Rightarrow \tau = \frac{1}{a_0} \ln\left(\frac{1}{r_1}\right) \tag{11}$$

This gives a sample $\tau$ from the correct exponential distribution.

Why This Is Reasonable

  • The exponential distribution exactly describes the waiting time to the first event in a Poisson process (which is what Gillespie’s SSA models).
  • $r_1 \sim \text{Uniform}(0,1)$ is used to generate a random waiting time from that distribution.
  • The math guarantees that the generated $\tau$ has the right probability density:
$$f(\tau) = a_0 e^{-a_0 \tau} \tag{12}$$

Proof: Sampling Formula Gives Exponential Distribution

Goal

Prove that sampling:

$$\tau = \frac{1}{a_0} \ln\left(\frac{1}{r_1}\right) \text{ where } r_1 \sim \text{Uniform}(0,1) \tag{13}$$

gives a random variable that follows the exponential distribution with rate $a_0$ :

$$f_\tau(t) = a_0 e^{-a_0 t} \text{ for } t \geq 0 \tag{14}$$

Step 1: Start with the Transformation

Let:

$$\tau = -\frac{1}{a_0} \ln(r_1) \text{ where } r_1 \sim \text{Uniform}(0,1) \tag{15}$$

We’ll derive the probability density function (PDF) of $\tau$ using change of variables.

Step 2: Compute the CDF of τ

Let’s compute the cumulative distribution function $F_\tau(t)$ :

$$F_\tau(t) = P(\tau \leq t) = P\left(-\frac{1}{a_0} \ln(r_1) \leq t\right) \tag{16}$$ $$= P(\ln(r_1) \geq -a_0 t) = P(r_1 \geq e^{-a_0 t}) \tag{17}$$

Since $r_1 \sim \text{Uniform}(0,1)$ , we have:

$$P(r_1 \geq e^{-a_0 t}) = 1 - e^{-a_0 t} \tag{18}$$

So the CDF of $\tau$ is:

$$F_\tau(t) = 1 - e^{-a_0 t}, \quad t \geq 0 \tag{19}$$

Step 3: Differentiate to Get the PDF

Now differentiate the CDF to get the PDF:

$$f_\tau(t) = \frac{d}{dt} F_\tau(t) = \frac{d}{dt}(1 - e^{-a_0 t}) = a_0 e^{-a_0 t} \tag{20}$$

Which is exactly the exponential distribution with rate $a_0$ !

Conclusion

We’ve shown:

  1. The transformation $\tau = -\frac{1}{a_0} \ln(r_1)$ gives a variable with exponential distribution.

  2. This matches the required PDF:

$$f(\tau) = a_0 e^{-a_0 \tau} \tag{21}$$

Therefore, Gillespie’s formula for sampling waiting time is mathematically justified.
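The proof can also be confirmed numerically: sample $\tau$ via the transformation and compare the empirical CDF against $1 - e^{-a_0 t}$ from Eq. (19). The rate $a_0 = 1.5$ below is an illustrative assumption:

```python
import numpy as np

# Sample tau = ln(1/r1)/a0 and check its empirical CDF against 1 - exp(-a0*t).
rng = np.random.default_rng(2)
a0 = 1.5                                  # illustrative rate (assumed)
r1 = rng.random(300_000)
tau = np.log(1.0 / r1) / a0

ts = np.linspace(0.1, 3.0, 30)
emp = (tau[:, None] <= ts).mean(axis=0)   # empirical CDF at each t
ana = 1.0 - np.exp(-a0 * ts)              # analytic CDF (Eq. 19)
print(np.abs(emp - ana).max())            # maximum deviation, should be small
```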