🧬 Math derivation for steady-state RNA velocity model

May 28, 2025 · Jiyuan (Jay) Liu

The steady-state model was the first to enable a mathematical estimation of RNA velocity, and most subsequent methods are modified versions of it or of its generalization (the dynamical model in scVelo; see our other blogs). The model has its limitations, and a solid understanding of its underlying mathematics is needed to apply it effectively. Here, we derive the steady-state model used in scVelo and velocyto.

A. Estimating RNA Velocity Using the Steady-State Ratio in scVelo and velocyto

For the steady-state models in scVelo and velocyto, RNA velocities are computed as deviations from the steady-state ratio:

$$ \nu_i = u_i - \gamma_0 s_i $$

where $$ \gamma_0 = \frac{\gamma}{\beta} $$
(i.e. the expected ratio of unspliced to spliced mRNA at steady state, with $\beta$ the splicing rate and $\gamma$ the degradation rate) is estimated in closed form by least squares regression. Cells in the lower and upper quantiles of the phase plot, that is, where mRNA levels reach their minimum and maximum expression, respectively, are assumed to be in steady state and are used for the estimation. Hence, the ratio can be approximated by a linear regression on these extreme quantiles.

Specifically, $\gamma_0$ is estimated as:

$$ \gamma_0 = \frac{\mathbf{u}^\top \mathbf{s}}{\lVert \mathbf{s} \rVert^2} $$

This is a least-squares fit of the form:

$$ \mathbf{u} \approx \gamma_0 \cdot \mathbf{s} $$

This means that unspliced mRNA ($\mathbf{u}$) is modeled as linearly dependent on spliced mRNA ($\mathbf{s}$), and we solve for the slope $\gamma_0$ (a short numerical sketch follows the symbol definitions below).


Symbol Definitions

  • $\mathbf{u} = (u_1, \ldots, u_n)$ : a vector of size-normalized unspliced mRNA counts for a single gene across a subset of cells.
  • $\mathbf{s} = (s_1, \ldots, s_n)$ : a vector of corresponding spliced mRNA counts for that gene in the same subset of cells.
  • $\mathbf{u}^\top \mathbf{s}$ : the dot product between unspliced and spliced vectors.
  • $\lVert \mathbf{s} \rVert^2 = \mathbf{s}^\top \mathbf{s}$ : the squared norm of the spliced vector.
  • $\gamma_0$ : the slope of the best-fit line through the origin relating $\mathbf{u}$ to $\mathbf{s}$ , i.e., the steady-state ratio.
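
To make this concrete, here is a minimal NumPy sketch (not scVelo's actual implementation) of the estimator and the resulting velocities; the arrays s and u, the noise model, and the simulated slope of 0.3 are all hypothetical.

```python
# Minimal sketch (not scVelo's implementation): estimate the steady-state
# ratio gamma_0 for one gene and compute per-cell velocities.
# s and u stand in for size-normalized spliced/unspliced counts.
import numpy as np

rng = np.random.default_rng(0)
s = rng.gamma(shape=2.0, scale=5.0, size=500)                    # hypothetical spliced counts
u = np.clip(0.3 * s + rng.normal(scale=0.5, size=500), 0, None)  # hypothetical unspliced counts

gamma0 = (u @ s) / (s @ s)     # least-squares slope through the origin
velocity = u - gamma0 * s      # deviation from the steady-state line

print("estimated gamma_0:", round(gamma0, 3))    # close to the simulated 0.3
print("mean velocity:", round(velocity.mean(), 3))
```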

B. Derive Least Squares Solution for Estimating the Steady-State Ratio $\gamma_0$

Let’s walk through the full derivation of the least squares solution for the steady-state ratio $\gamma_0$ , which minimizes the squared difference between the unspliced counts and the spliced counts scaled by $\gamma_0$ .


🔧 Problem Setup

Given:

  • $u \in \mathbb{R}^n$ : vector of unspliced counts across $n$ cells
  • $s \in \mathbb{R}^n$ : vector of spliced counts across the same $n$ cells

We model the unspliced counts as linearly proportional to spliced counts:

$$ u \approx \gamma_0 s $$

We want to find $\gamma_0$ that minimizes the squared error:

$$ \min_{\gamma_0} \| u - \gamma_0 s \|^2 $$

🧮 Expand the Norm

$$ \| u - \gamma_0 s \|^2 = (u - \gamma_0 s)^\top (u - \gamma_0 s) $$

Expanding this expression:

$$ = u^\top u - 2\gamma_0 u^\top s + \gamma_0^2 s^\top s $$

📉 Minimize with Respect to $\gamma_0$

Take the derivative with respect to $\gamma_0$ and set it to zero:

$$ \frac{d}{d\gamma_0} \left( \| u - \gamma_0 s \|^2 \right) = -2 u^\top s + 2\gamma_0 s^\top s = 0 $$

Solve for $\gamma_0$ :

$$ \gamma_0 = \frac{u^\top s}{\|s\|^2} $$

Where:

  • $u^\top s$ is the dot product
  • $\|s\|^2 = s^\top s$ is the squared norm

✅ Interpretation

This is the slope of the best-fit line (through the origin) predicting unspliced from spliced counts; equivalently, $\gamma_0$ is the coefficient of the orthogonal projection of $u$ onto $s$ .

In RNA velocity, this slope $\gamma_0$ serves as a reference ratio under the steady-state assumption:

Expected:

$$ u = \gamma_0 s $$

Any deviation from this relation in a given cell suggests that the gene is being upregulated (positive velocity) or downregulated (negative velocity) in that cell.
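
Before moving on, here is a small sanity check on synthetic data (not from any real dataset) that the closed-form slope $u^\top s / \|s\|^2$ matches what a generic least squares solver returns for a fit through the origin.

```python
# Check the derived closed-form slope against np.linalg.lstsq
# restricted to a single column (regression through the origin).
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(0.0, 10.0, size=200)               # hypothetical spliced counts
u = 0.4 * s + rng.normal(scale=0.3, size=200)      # hypothetical unspliced counts

gamma0_closed = (u @ s) / (s @ s)                              # derived formula
gamma0_lstsq, *_ = np.linalg.lstsq(s[:, None], u, rcond=None)  # numerical least squares

assert np.isclose(gamma0_closed, gamma0_lstsq[0])
print(gamma0_closed, gamma0_lstsq[0])
```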

C. Extending the Steady-State Model with Offset

🧩 Original Model (No Offset)

Previously, we assumed:

$$u_i \approx \gamma_0 \cdot s_i$$

But this forces the regression line to go through the origin, which isn’t always biologically realistic.

✅ Generalized Model (With Offset)

Now we model:

$$u_i \approx \gamma_0 \cdot s_i + o$$

Where:

  • $\gamma_0$ is still the slope (steady-state ratio)
  • $o$ is a constant offset (intercept), modeling basal transcription

🔍 Least Squares Solution with Offset

To solve for both $\gamma_0$ and $o$ , we use ordinary least squares (OLS) linear regression.

Given data $u=(u_1,\ldots,u_n)$ , $s=(s_1,\ldots,s_n)$ , we estimate:

$$\hat{u} = \gamma_0 s + o$$

OLS gives:

Slope (Steady-State Ratio):
$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)}$$

Where:

$$\text{Cov}(u,s) = \frac{1}{n}\sum_{i=1}^n (u_i-\bar{u})(s_i-\bar{s})$$ $$\text{Var}(s) = \frac{1}{n}\sum_{i=1}^n (s_i-\bar{s})^2$$
Offset:
$$o = \bar{u} - \gamma_0\bar{s}$$

This centers the regression line at the mean of the data points.
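
The following sketch, again on hypothetical data with a made-up basal offset, computes the slope from Cov/Var and the intercept from the means, and compares them against np.polyfit as a reference.

```python
# Offset model on synthetic data: slope = Cov(u,s)/Var(s), intercept from the means,
# checked against np.polyfit. All inputs are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(0.0, 10.0, size=300)
u = 0.5 * s + 1.2 + rng.normal(scale=0.4, size=300)   # simulated basal offset of 1.2

s_bar, u_bar = s.mean(), u.mean()
gamma0 = np.sum((u - u_bar) * (s - s_bar)) / np.sum((s - s_bar) ** 2)  # Cov(u,s)/Var(s)
offset = u_bar - gamma0 * s_bar                                        # o = u_bar - gamma0 * s_bar

slope_np, intercept_np = np.polyfit(s, u, deg=1)      # reference fit with intercept
assert np.allclose([gamma0, offset], [slope_np, intercept_np])
print(gamma0, offset)
```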

D. Derive Least Squares Solution with Intercept

Let’s derive the least squares solution with an intercept (offset) step by step. Our goal is to estimate both:

  • $\gamma_0$ : the slope (steady-state ratio), and
  • $o$ : the offset (basal transcription level)

given that we model:

$$u_i = \gamma_0 s_i + o + \varepsilon_i$$

for each cell $i$ , where $\varepsilon_i$ is the residual.

🧩 1. Problem Setup

Let $u = [u_1, u_2, ..., u_n]^T$ , and $s = [s_1, s_2, ..., s_n]^T$

We want to solve for $\gamma_0$ and $o$ that minimize the squared residuals:

$$\min_{\gamma_0,o} \sum_{i=1}^n (u_i - \gamma_0 s_i - o)^2$$

This is a linear least squares regression with intercept.

🧮 2. Matrix Form

Rewriting the model:

$$u = \gamma_0 s + o \cdot 1 + \varepsilon$$

Let’s construct a design matrix $X$ :

$$X = \begin{bmatrix} s_1 & 1 \\ s_2 & 1 \\ \vdots & \vdots \\ s_n & 1 \end{bmatrix} \in \mathbb{R}^{n \times 2}$$

Then the model becomes:

$$u = X \cdot \begin{bmatrix} \gamma_0 \\ o \end{bmatrix} + \varepsilon$$

The least squares solution is given by:

$$\begin{bmatrix} \gamma_0 \\ o \end{bmatrix} = (X^T X)^{-1} X^T u$$

We can compute this explicitly to get formulas for $\gamma_0$ and $o$ as below.

Slope:
$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} = \frac{\sum_{i=1}^n (u_i-\bar{u})(s_i-\bar{s})}{\sum_{i=1}^n (s_i-\bar{s})^2}$$
Offset:
$$o = \bar{u} - \gamma_0\bar{s}$$

These are the well-known results from simple linear regression that center the regression line at the mean of the data points.
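
Here is a short sketch of the design-matrix route on synthetic data: build $X = [s \;\; 1]$, solve the normal equations, and confirm the result matches the covariance/variance formulas. It uses np.linalg.solve rather than an explicit inverse, which is simply a numerically safer way of evaluating $(X^\top X)^{-1} X^\top u$.

```python
# Design-matrix version of the same fit: theta = (X^T X)^{-1} X^T u on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(0.0, 10.0, size=300)
u = 0.5 * s + 1.2 + rng.normal(scale=0.4, size=300)

X = np.column_stack([s, np.ones_like(s)])             # X = [s, 1]
theta = np.linalg.solve(X.T @ X, X.T @ u)             # solve the normal equations
gamma0, offset = theta

# Agreement with the covariance/variance formulas:
gamma0_cov = np.cov(u, s, bias=True)[0, 1] / np.var(s)
assert np.allclose([gamma0, offset], [gamma0_cov, u.mean() - gamma0_cov * s.mean()])
print(gamma0, offset)
```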

🎯 3. Matrix Derivation of the Least Squares Solution

Let’s explicitly derive the solution:

$$\begin{bmatrix} \gamma_0 \\ o \end{bmatrix} = (X^\top X)^{-1} X^\top u$$

for the model:

$$u = \gamma_0 s + o \cdot 1 + \varepsilon$$
3.1 Define the Design Matrix and Vectors

Let’s say we have $n$ samples (cells), with:

  • $s \in \mathbb{R}^n$ : spliced counts
  • $u \in \mathbb{R}^n$ : unspliced counts
  • $1 \in \mathbb{R}^n$ : vector of ones

We define the design matrix $X \in \mathbb{R}^{n \times 2}$ :

$$X = \begin{bmatrix} s_1 & 1 \\ s_2 & 1 \\ \vdots & \vdots \\ s_n & 1 \end{bmatrix} = [s \quad 1]$$

We write the model:

$$u = X\theta + \varepsilon, \text{ with } \theta = \begin{bmatrix} \gamma_0 \\ o \end{bmatrix}$$

We want to minimize the residual sum of squares:

$$\min_\theta \|u - X\theta\|^2$$
3.2 Normal Equations

We derive the optimal $\theta$ by solving the normal equations:

$$X^\top X\theta = X^\top u$$

So,

$$\theta = (X^\top X)^{-1} X^\top u$$

Let’s compute each term step by step.

3.3 Compute $X^\top X$
$$X^\top X = \begin{bmatrix} \sum s_i^2 & \sum s_i \\ \sum s_i & n \end{bmatrix}$$

Let’s denote:

  • $S = \sum_{i=1}^n s_i$
  • $S_2 = \sum_{i=1}^n s_i^2$
  • $n$ : number of cells

So:

$$X^\top X = \begin{bmatrix} S_2 & S \\ S & n \end{bmatrix}$$
3.4 Compute $X^\top u$
$$X^\top u = \begin{bmatrix} \sum s_i u_i \\ \sum u_i \end{bmatrix}$$

Let’s denote:

  • $U = \sum_{i=1}^n u_i$
  • $SU = \sum_{i=1}^n s_i u_i$

So:

$$X^\top u = \begin{bmatrix} SU \\ U \end{bmatrix}$$
3.5 Solve $\theta = (X^\top X)^{-1} X^\top u$

The inverse of a $2\times2$ matrix:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

So:

$$(X^\top X)^{-1} = \frac{1}{S_2n - S^2}\begin{bmatrix} n & -S \\ -S & S_2 \end{bmatrix}$$

Now compute:

$$\theta = \frac{1}{S_2n - S^2}\begin{bmatrix} n & -S \\ -S & S_2 \end{bmatrix}\begin{bmatrix} SU \\ U \end{bmatrix}$$

First row of the product: $$n\cdot SU - S\cdot U$$

Second row of the product: $$-S\cdot SU + S_2\cdot U$$

Dividing each row by the determinant $nS_2 - S^2$ gives the two estimates below.

3.6 Final Expression

So we get:

$$\gamma_0 = \frac{n\cdot \sum s_i u_i - \sum s_i \cdot \sum u_i}{n\cdot \sum s_i^2 - (\sum s_i)^2} = \frac{\text{Cov}(u,s)}{\text{Var}(s)}$$ $$o = \bar{u} - \gamma_0\bar{s}$$

This completes the derivation of the least squares solution with intercept, showing how to arrive at the covariance and variance expressions.
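
As a numerical companion to steps 3.3 to 3.6, this sketch (synthetic data again) forms the sums $S$, $S_2$, $U$, $SU$ by hand, evaluates the two rows divided by the determinant, and checks that the intercept satisfies $o = \bar{u} - \gamma_0\bar{s}$.

```python
# Explicit version of the 2x2 computation in 3.3-3.6 using the sums S, S2, SU, U.
# This mirrors the hand derivation rather than any library routine.
import numpy as np

rng = np.random.default_rng(4)
s = rng.uniform(0.0, 10.0, size=100)
u = 0.5 * s + 1.2 + rng.normal(scale=0.4, size=100)
n = len(s)

S, S2 = s.sum(), np.sum(s ** 2)
U, SU = u.sum(), np.sum(s * u)
det = n * S2 - S ** 2                       # determinant of X^T X

gamma0 = (n * SU - S * U) / det             # first row of (X^T X)^{-1} X^T u
offset = (-S * SU + S2 * U) / det           # second row

assert np.isclose(offset, u.mean() - gamma0 * s.mean())
print(gamma0, offset)
```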

Why the Normal Equations in 3.2 of Section D Give the Optimal Solution

Let’s prove that the normal equations

$$X^\top X\theta = X^\top u$$

give us the optimal solution in least squares regression.

Goal of Least Squares

We want to minimize the sum of squared errors (residuals):

$$L(\theta) = \|u-X\theta\|^2 = (u-X\theta)^\top(u-X\theta)$$

This is a quadratic loss function. To minimize it, we take its gradient with respect to $\theta$ , set it to zero, and solve.

Step-by-step Derivation

Start with:

$$L(\theta) = (u-X\theta)^\top(u-X\theta)$$

Expand the product:

$$L(\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$

Now take the gradient with respect to $\theta$ :

$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta \tag{gradient} $$

Set the gradient to zero:

$$-2X^\top u + 2X^\top X\theta = 0$$

Divide both sides by 2:

$$X^\top X\theta = X^\top u$$

This is the normal equation. When $X^\top X$ is invertible (which holds here as long as the spliced counts $s$ are not constant, so that $X$ has full column rank), we can solve for $\theta$ :

$$\theta = (X^\top X)^{-1}X^\top u$$

This solution minimizes the squared error because:

  1. The loss function is convex (it’s quadratic and $X^\top X$ is positive definite)
  2. We found where its gradient is zero
  3. The second derivative (Hessian) $2X^\top X$ is positive definite

Therefore, this solution gives us the global minimum of the least squares problem.
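
A quick numerical check (on made-up data) that the normal-equation solution indeed zeroes the gradient, or equivalently leaves the residual $u - X\theta$ orthogonal to the columns of $X$:

```python
# Verify that the normal-equation solution makes X^T (u - X theta) = 0
# up to floating-point error. Data are synthetic.
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(0.0, 10.0, size=150)
u = 0.5 * s + 1.2 + rng.normal(scale=0.4, size=150)

X = np.column_stack([s, np.ones_like(s)])
theta = np.linalg.solve(X.T @ X, X.T @ u)

grad = -2 * X.T @ u + 2 * X.T @ X @ theta   # gradient of the loss at theta
print(grad)                                 # ~ [0, 0] up to rounding error
assert np.allclose(grad, 0.0, atol=1e-6)
```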

Derive the Gradient Equation above for the Least Squares Loss Function

Let’s walk through how we get:

$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta$$
Step 1: Write the Loss Function

The least squares loss is:

$$L(\theta) = \|u-X\theta\|^2 = (u-X\theta)^\top(u-X\theta)$$
Step 2: Expand the Quadratic Form

Let’s expand this expression:

$$L(\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$

Here’s how we get this expansion:

$$(u-X\theta)^\top(u-X\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$

This follows from two key matrix properties:

  1. $(AB)^T = B^T A^T$
  2. $u^\top X\theta = \theta^\top X^\top u$ (scalar equality)
Step 3: Take the Gradient with Respect to $\theta$

We differentiate term by term:

  1. First term: $$\nabla_\theta(u^\top u) = 0$$ because it doesn’t depend on $\theta$

  2. Second term: $$\nabla_\theta(-2\theta^\top X^\top u) = -2X^\top u$$ because it’s linear in $\theta$

  3. Third term: $$\nabla_\theta(\theta^\top X^\top X\theta) = 2X^\top X\theta \tag{quadratic grad.}$$ This is a quadratic form, and its derivative is a standard result from matrix calculus.

Adding all parts together:

$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta$$

This gradient expression gives us the direction of steepest ascent of the loss function at any point $\theta$ . Setting it to zero leads to the normal equations, which give us the optimal solution.
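
To convince ourselves that this gradient expression is correct, here is a finite-difference check at an arbitrary (hypothetical) $\theta$; since the loss is quadratic, the central difference should agree with the formula up to floating-point error.

```python
# Finite-difference check of grad L(theta) = -2 X^T u + 2 X^T X theta
# at an arbitrary theta, on random synthetic data.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
u = rng.normal(size=50)
theta = np.array([0.7, -0.3])

def loss(t):
    r = u - X @ t
    return r @ r                               # ||u - X t||^2

analytic = -2 * X.T @ u + 2 * X.T @ X @ theta  # derived gradient

eps = 1e-6
numeric = np.array([
    (loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)  # central difference
    for e in np.eye(2)
])
assert np.allclose(analytic, numeric, atol=1e-4)
print(analytic, numeric)
```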

Note: Vector Dot Product Commutativity

Please note the identity

$$a^\top b = b^\top a$$

holds true. This is a foundational property in linear algebra, and the key point is:

Both $a^\top b$ and $b^\top a$ are scalars (i.e., single numbers), and they are equal because they compute the same dot product. Let’s break it down:

Let $a = [a_1, a_2, \ldots, a_n]^\top$ , and $b = [b_1, b_2, \ldots, b_n]^\top$ .

Then:

$$a^\top b = \sum_{i=1}^n a_i b_i$$ $$b^\top a = \sum_{i=1}^n b_i a_i$$

But since real number multiplication is commutative (i.e., $a_i b_i = b_i a_i$ ), the two sums are identical:

$$a^\top b = b^\top a$$
Note on shapes and intuition:
  • $a^\top b$ is a $1\times1$ scalar (row vector × column vector)
  • $b^\top a$ is also a $1\times1$ scalar (same logic)
  • This is just the dot product: the order does not matter, since the dot product is symmetric. This property is crucial in many derivations in linear algebra and optimization, including our least squares derivation, where we used $u^\top X\theta = \theta^\top X^\top u$ .

Since they’re both just numbers, and equal by commutativity, the identity holds.

Derive Quadratic Form Gradient in Equation (quadratic grad.)

Let’s prove the key gradient identity with Matrix Calculus:

$$\nabla_\theta(\theta^\top A\theta) = (A + A^\top)\theta$$

And its special case when $A$ is symmetric:

$$\nabla_\theta(\theta^\top A\theta) = 2A\theta$$

Setup

Given:

  • $\theta \in \mathbb{R}^p$ is a column vector
  • $A \in \mathbb{R}^{p \times p}$ is a constant matrix
  • The scalar function is $f(\theta) = \theta^\top A\theta = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$

Overview of Steps

Step 1: Write in Summation Form

$$f(\theta) = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$

Step 2: Compute Partial Derivatives

The gradient’s k-th component is:

$$\frac{\partial f}{\partial \theta_k} = \frac{\partial}{\partial \theta_k} \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$

Step 3: Use Product Rule

When differentiating:

$$\frac{\partial}{\partial \theta_k}(\theta_i A_{ij} \theta_j) = A_{ij}(\delta_{ik}\theta_j + \theta_i\delta_{jk})$$

where $\delta_{ik}$ is the Kronecker delta:

$$\delta_{ik} = \begin{cases} 1 & \text{if } i=k \\ 0 & \text{otherwise} \end{cases}$$

Step 4: Sum Terms

$$\frac{\partial f}{\partial \theta_k} = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p \theta_i A_{ik}$$

Step 5: Express in Vector Form

This gives us:

$$\frac{\partial f}{\partial \theta_k} = (A\theta)_k + (A^\top\theta)_k$$

Therefore:

$$\nabla_\theta f = A\theta + A^\top\theta = (A + A^\top)\theta$$

Symmetric Matrix: a special case

When $A = A^\top$ , we get:

$$\nabla_\theta(\theta^\top A\theta) = 2A\theta$$

This is the form we use in least squares optimization where $A = X^\top X$ is symmetric.
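
The identity is easy to verify numerically; the sketch below checks $\nabla_\theta(\theta^\top A\theta) = (A + A^\top)\theta$ by finite differences for a random, generally non-symmetric $A$, and then the $2A\theta$ special case after symmetrizing. The matrix and vector are arbitrary placeholders.

```python
# Finite-difference check of the quadratic-form gradient identity.
import numpy as np

rng = np.random.default_rng(7)
p = 4
A = rng.normal(size=(p, p))        # generally non-symmetric
theta = rng.normal(size=p)

def f(t, M):
    return t @ M @ t               # theta^T M theta

analytic = (A + A.T) @ theta       # derived gradient

eps = 1e-6
numeric = np.array([
    (f(theta + eps * e, A) - f(theta - eps * e, A)) / (2 * eps)
    for e in np.eye(p)
])
assert np.allclose(analytic, numeric, atol=1e-6)

A_sym = A + A.T                    # symmetric case: gradient reduces to 2 A_sym theta
assert np.allclose((A_sym + A_sym.T) @ theta, 2 * A_sym @ theta)
print("gradient identities verified")
```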

Detailed Proof of the Key Steps Above

Let’s prove that:

$$\frac{\partial}{\partial \theta_k} \left(\sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j\right) = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p \theta_i A_{ik}$$
Step-by-step Derivation

Start with the function:

$$f(\theta) = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$

Take partial derivative with respect to $\theta_k$ , using product rule for all $\theta_i$ and $\theta_j$ terms:

$$\frac{\partial f}{\partial \theta_k} = \sum_{i=1}^p \sum_{j=1}^p A_{ij} \cdot \frac{\partial}{\partial \theta_k}(\theta_i \theta_j)$$

Now:

$$\frac{\partial}{\partial \theta_k}(\theta_i \theta_j) = \delta_{ik}\theta_j + \theta_i\delta_{jk}$$

where $\delta_{ik}$ is the Kronecker delta:

$$\delta_{ik} = \begin{cases} 1 & \text{if } i=k \\ 0 & \text{otherwise} \end{cases}$$

Therefore:

$$\frac{\partial f}{\partial \theta_k} = \sum_{i=1}^p \sum_{j=1}^p A_{ij}(\delta_{ik}\theta_j + \theta_i\delta_{jk})$$

Split into two sums:

First term: $$\sum_{i=1}^p \sum_{j=1}^p A_{ij}\delta_{ik}\theta_j = \sum_{j=1}^p A_{kj}\theta_j$$

(since $\delta_{ik}=1$ only when $i=k$ )

Second term: $$\sum_{i=1}^p \sum_{j=1}^p A_{ij}\theta_i\delta_{jk} = \sum_{i=1}^p A_{ik}\theta_i$$

(since $\delta_{jk}=1$ only when $j=k$ )

Interim Result

Combining both terms:

$$\frac{\partial f}{\partial \theta_k} = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p A_{ik}\theta_i$$

In vector form, this means:

$$\nabla_\theta(\theta^\top A\theta) = A\theta + A^\top\theta = (A + A^\top)\theta$$
From Component-wise to Vector Form of Gradient

For each coordinate $k \in \{1,\ldots,p\}$ , we showed:

$$\frac{\partial f}{\partial \theta_k} = \underbrace{\sum_{j=1}^p A_{kj}\theta_j}_{(A\theta)_k} + \underbrace{\sum_{i=1}^p A_{ik}\theta_i}_{(A^\top\theta)_k}$$

So each element of the gradient vector is:

$$(\nabla_\theta f)_k = (A\theta)_k + (A^\top\theta)_k$$
Stacking into Vector Form

The full gradient vector is:

$$\nabla_\theta f = \begin{bmatrix} (A\theta)_1 + (A^\top\theta)_1 \\ (A\theta)_2 + (A^\top\theta)_2 \\ \vdots \\ (A\theta)_p + (A^\top\theta)_p \end{bmatrix} = A\theta + A^\top\theta$$

Here the notation $(A\theta)_k$ refers to the k-th component of the matrix-vector product $A\theta$. This follows because matrix-vector multiplication simply stacks these components:

$$(A\theta)_k = \sum_j A_{kj}\theta_j$$

$$A\theta = \begin{bmatrix} \sum_{j=1}^p A_{1j}\theta_j \\ \sum_{j=1}^p A_{2j}\theta_j \\ \vdots \\ \sum_{j=1}^p A_{pj}\theta_j \end{bmatrix}$$

Similarly, for the transpose:

$$(A^\top\theta)_k = \sum_i A_{ik}\theta_i$$

Final Result

Therefore:

$$\nabla_\theta(\theta^\top A\theta) = A\theta + A^\top\theta = (A + A^\top)\theta$$

From Normal Equations to Standard Regression Form (3.6 in Section D)

Let’s derive how the normal equations solution:

$$\theta = \frac{1}{nS_2-S_1^2}\begin{bmatrix} n & -S_1 \\ -S_1 & S_2 \end{bmatrix}\begin{bmatrix} SU \\ U \end{bmatrix}$$

relates to the standard regression form:

$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} = \frac{n\sum s_i u_i - \sum s_i \sum u_i}{n\sum s_i^2 - (\sum s_i)^2}$$

with offset:

$$o = \bar{u} - \gamma_0\bar{s}$$

Given Matrices and Vectors

Let $s=[s_1,\ldots,s_n]^T$ , $u=[u_1,\ldots,u_n]^T$

Model: $$u_i = \gamma_0 s_i + o + \varepsilon_i \text{ or } u = \gamma_0 s + o\cdot 1 + \varepsilon$$

Matrix Form

Design matrix: $$X = [s \quad 1] \in \mathbb{R}^{n\times 2}$$

Parameters: $$\theta = \begin{bmatrix} \gamma_0 \\ o \end{bmatrix}$$

Computing Products

$$X^T X = \begin{bmatrix} \sum s_i^2 & \sum s_i \\ \sum s_i & n \end{bmatrix} = \begin{bmatrix} S_2 & S_1 \\ S_1 & n \end{bmatrix}$$ $$X^T u = \begin{bmatrix} \sum s_i u_i \\ \sum u_i \end{bmatrix} = \begin{bmatrix} SU \\ U \end{bmatrix}$$

Solution Components

First row (slope): $$\gamma_0 = \frac{1}{nS_2-S_1^2}(n\cdot SU - S_1\cdot U)$$

Second row (intercept): $$o = \frac{1}{nS_2-S_1^2}(-S_1\cdot SU + S_2\cdot U)$$

Statistical Interpretation

Define: $$\bar{s} = \frac{1}{n}\sum s_i \quad \bar{u} = \frac{1}{n}\sum u_i$$

$$\text{Var}(s) = \frac{1}{n}\sum(s_i-\bar{s})^2 = \frac{1}{n}S_2 - \bar{s}^2$$ $$\text{Cov}(u,s) = \frac{1}{n}\sum(u_i-\bar{u})(s_i-\bar{s}) = \frac{1}{n}SU - \bar{u}\bar{s}$$

Final Result

Dividing the numerator and denominator of the slope in the first row by $n^2$ gives $\frac{\frac{1}{n}SU - \bar{u}\bar{s}}{\frac{1}{n}S_2 - \bar{s}^2}$, which is exactly the covariance over the variance. This gives us: $$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} \text{ and } o = \bar{u} - \gamma_0\bar{s}$$
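
Finally, a small numerical check (synthetic data) of the identities $\text{Var}(s) = \frac{1}{n}S_2 - \bar{s}^2$ and $\text{Cov}(u,s) = \frac{1}{n}SU - \bar{u}\bar{s}$, and of the fact that the normal-equation slope equals $\text{Cov}(u,s)/\text{Var}(s)$:

```python
# Verify the variance/covariance identities and the equivalence of the
# normal-equation slope with Cov(u,s)/Var(s). Synthetic data throughout.
import numpy as np

rng = np.random.default_rng(8)
s = rng.uniform(0.0, 10.0, size=250)
u = 0.5 * s + 1.2 + rng.normal(scale=0.4, size=250)
n = len(s)

S1, S2 = s.sum(), np.sum(s ** 2)
U, SU = u.sum(), np.sum(s * u)
s_bar, u_bar = S1 / n, U / n

var_s = S2 / n - s_bar ** 2                        # Var(s) = S2/n - s_bar^2
cov_us = SU / n - u_bar * s_bar                    # Cov(u,s) = SU/n - u_bar*s_bar
assert np.isclose(var_s, np.var(s))
assert np.isclose(cov_us, np.cov(u, s, bias=True)[0, 1])

gamma0 = (n * SU - S1 * U) / (n * S2 - S1 ** 2)    # normal-equation slope
assert np.isclose(gamma0, cov_us / var_s)
print(gamma0, u_bar - gamma0 * s_bar)
```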