🧬 Math derivation for steady-state RNA velocity model
Image credit: Logan Voss on Unsplash

The steady-state model was the first to enable a mathematical estimation of RNA velocity, and most subsequent methods are modified versions of it or of its generalization (the dynamical model in scVelo; see our other blogs). It has its limitations, and a solid understanding of its underlying mathematics is needed to apply the model effectively. Here, we derive the steady-state model as used in scVelo and velocyto.
A. Estimating RNA Velocity Using the Steady-State Ratio in scVelo and velocyto
For the steady-state models in scVelo and velocyto, RNA velocities are computed as deviations from the steady-state ratio:
$$
v = u - \gamma_0\, s
$$
where
$$
\gamma_0 = \frac{\gamma}{\beta}
$$
(i.e. the expected ratio of unspliced to spliced mRNA at steady state, with $\beta$ the splicing rate and $\gamma$ the degradation rate) is estimated analytically using least squares regression. Cells in the lower and upper quantiles of phase space, that is, where mRNA levels reach their minimum and maximum expression, respectively, were assumed to be in steady state and were used for the estimation. Hence, the ratio can be approximated by a linear regression on these extreme quantiles.
Specifically, $\gamma_0$ is estimated as:
$$ \gamma_0 = \frac{\mathbf{u}^\top \mathbf{s}}{\lVert \mathbf{s} \rVert^2} $$This is a least-squares fit of the form:
$$ \mathbf{u} \approx \gamma_0 \cdot \mathbf{s} $$This means they model unspliced mRNA ($\mathbf{u}$ ) as being linearly dependent on spliced mRNA ($\mathbf{s}$ ), and solve for the slope $\gamma_0$ .
Symbol definitions:
- $\mathbf{u} = (u_1, \ldots, u_n)$ : a vector of size-normalized unspliced mRNA counts for a single gene across a subset of cells.
- $\mathbf{s} = (s_1, \ldots, s_n)$ : a vector of corresponding spliced mRNA counts for that gene in the same subset of cells.
- $\mathbf{u}^\top \mathbf{s}$ : the dot product between unspliced and spliced vectors.
- $\lVert \mathbf{s} \rVert^2 = \mathbf{s}^\top \mathbf{s}$ : the squared norm of the spliced vector.
- $\gamma_0$ : the slope of the best-fit line through the origin relating $\mathbf{u}$ to $\mathbf{s}$ , i.e., the steady-state ratio.
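To make this concrete, here is a minimal NumPy sketch (toy data, illustrative variable names, not the scVelo/velocyto implementation) that computes $\gamma_0$ from the dot-product formula above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for one gene across 500 cells: spliced counts s, and unspliced
# counts u generated around a "true" steady-state ratio of 0.3.
s = rng.gamma(shape=2.0, scale=5.0, size=500)
u = 0.3 * s + rng.normal(scale=0.5, size=500)

# Steady-state ratio: least-squares slope of u on s through the origin,
# gamma_0 = (u . s) / ||s||^2
gamma0 = (u @ s) / (s @ s)
print(gamma0)  # close to 0.3
```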
B. Derive Least Squares Solution for Estimating the Steady-State Ratio $\gamma_0$
Let’s walk through the full derivation of the least squares solution for estimating the steady-state ratio $\gamma_0$, which minimizes the squared difference between the unspliced counts and the spliced counts scaled by $\gamma_0$.
🔧 Problem Setup
Given:
- $u \in \mathbb{R}^n$ : vector of unspliced counts across $n$ cells
- $s \in \mathbb{R}^n$ : vector of spliced counts across the same $n$ cells
We model the unspliced counts as linearly proportional to spliced counts:
$$ u \approx \gamma_0 s $$We want to find $\gamma_0$ that minimizes the squared error:
$$ \min_{\gamma_0} \| u - \gamma_0 s \|^2 $$🧮 Expand the Norm
$$ \| u - \gamma_0 s \|^2 = (u - \gamma_0 s)^\top (u - \gamma_0 s) $$Expanding this expression:
$$ = u^\top u - 2\gamma_0 u^\top s + \gamma_0^2 s^\top s $$📉 Minimize with Respect to $\gamma_0$
Take the derivative with respect to $\gamma_0$ and set it to zero:
$$ \frac{d}{d\gamma_0} \left( \| u - \gamma_0 s \|^2 \right) = -2 u^\top s + 2\gamma_0 s^\top s = 0 $$Solve for $\gamma_0$ :
$$ \gamma_0 = \frac{u^\top s}{\|s\|^2} $$Where:
- $u^\top s$ is the dot product
- $\|s\|^2 = s^\top s$ is the squared norm
✅ Interpretation
This is the slope of the best-fit line (through the origin) predicting unspliced from spliced counts. It’s equivalent to projecting $u$ onto $s$ in vector space.
In RNA velocity, this slope $\gamma_0$ serves as a reference ratio under the steady-state assumption:
Expected:
$$ u = \gamma_0 s $$Any deviation from this in other cells suggests that the gene is being upregulated or downregulated.
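As a toy illustration (again, not the library code; the data and the 0.3 ratio are made up), the velocity of each cell is simply its residual from this reference line:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.gamma(shape=2.0, scale=5.0, size=500)   # toy spliced counts
u = 0.3 * s + rng.normal(scale=0.5, size=500)   # toy unspliced counts

gamma0 = (u @ s) / (s @ s)                      # steady-state ratio

# Velocity as the deviation from the steady-state line:
#   v_i > 0 -> more unspliced than expected (consistent with upregulation)
#   v_i < 0 -> less unspliced than expected (consistent with downregulation)
velocity = u - gamma0 * s
```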
C. Extending the Steady-State Model with Offset
🧩 Original Model (No Offset)
Previously, we assumed:
$$u_i \approx \gamma_0 \cdot s_i$$But this forces the regression line to go through the origin, which isn’t always biologically realistic.
✅ Generalized Model (With Offset)
Now we model:
$$u_i \approx \gamma_0 \cdot s_i + o$$Where:
- $\gamma_0$ is still the slope (steady-state ratio)
- $o$ is a constant offset (intercept), modeling basal transcription
🔍 Least Squares Solution with Offset
To solve for both $\gamma_0$ and $o$ , we use ordinary least squares (OLS) for linear regression.
Given data $u=(u_1,\ldots,u_n)$ , $s=(s_1,\ldots,s_n)$ , we estimate:
$$\hat{u} = \gamma_0 s + o$$OLS gives:
Slope (Steady-State Ratio):
$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)}$$Where:
$$\text{Cov}(u,s) = \frac{1}{n}\sum_{i=1}^n (u_i-\bar{u})(s_i-\bar{s})$$ $$\text{Var}(s) = \frac{1}{n}\sum_{i=1}^n (s_i-\bar{s})^2$$Offset:
$$o = \bar{u} - \gamma_0\bar{s}$$This centers the regression line at the mean of the data points.
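A minimal sketch of these two formulas on made-up data (not the scVelo implementation), cross-checked against NumPy's straight-line fit:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.gamma(shape=2.0, scale=5.0, size=500)          # toy spliced counts
u = 0.3 * s + 1.5 + rng.normal(scale=0.5, size=500)    # slope 0.3, offset 1.5, plus noise

# Slope and offset from the covariance/variance formulas above
s_bar, u_bar = s.mean(), u.mean()
gamma0 = np.sum((u - u_bar) * (s - s_bar)) / np.sum((s - s_bar) ** 2)
offset = u_bar - gamma0 * s_bar

# Cross-check against NumPy's degree-1 polynomial fit
slope_np, intercept_np = np.polyfit(s, u, deg=1)
print(gamma0, offset)            # ~0.3, ~1.5
print(slope_np, intercept_np)    # same values, up to floating point
```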
D. Derive Least Squares Solution with Intercept
Let’s derive the least squares solution with an intercept (offset) step by step. Our goal is to estimate both:
- $\gamma_0$ : the slope (steady-state ratio), and
- $o$ : the offset (basal transcription level)
given that we model:
$$u_i = \gamma_0 s_i + o + \varepsilon_i$$for each cell $i$ , where $\varepsilon_i$ is the residual.
🧩 1. Problem Setup
Let $u = [u_1, u_2, ..., u_n]^T$ , and $s = [s_1, s_2, ..., s_n]^T$
We want to solve for $\gamma_0$ and $o$ that minimize the squared residuals:
$$\min_{\gamma_0,o} \sum_{i=1}^n (u_i - \gamma_0 s_i - o)^2$$This is a linear least squares regression with intercept.
🧮 2. Matrix Form
Rewriting the model:
$$u = \gamma_0 s + o \cdot 1 + \varepsilon$$Let’s construct a design matrix $X$ :
$$X = \begin{bmatrix} s_1 & 1 \\ s_2 & 1 \\ \vdots & \vdots \\ s_n & 1 \end{bmatrix} \in \mathbb{R}^{n \times 2}$$Then the model becomes:
$$u = X \cdot \begin{bmatrix} \gamma_0 \\ o \end{bmatrix} + \varepsilon$$The least squares solution is given by:
$$\begin{bmatrix} \gamma_0 \\ o \end{bmatrix} = (X^T X)^{-1} X^T u$$We can compute this explicitly to get formulas for $\gamma_0$ and $o$ as below.
Slope:
$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} = \frac{\sum_{i=1}^n (u_i-\bar{u})(s_i-\bar{s})}{\sum_{i=1}^n (s_i-\bar{s})^2}$$Offset:
$$o = \bar{u} - \gamma_0\bar{s}$$These are the well-known results from simple linear regression that center the regression line at the mean of the data points.
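Here is a small sketch (toy data; `np.linalg.lstsq` stands in for any OLS solver) showing that the matrix solution $(X^\top X)^{-1}X^\top u$ and the covariance/variance formulas agree:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
s = rng.gamma(shape=2.0, scale=5.0, size=n)            # toy spliced counts
u = 0.3 * s + 1.5 + rng.normal(scale=0.5, size=n)      # toy unspliced counts

# Design matrix X = [s  1] and least-squares solve for theta = [gamma_0, o]
X = np.column_stack([s, np.ones(n)])
theta, *_ = np.linalg.lstsq(X, u, rcond=None)
gamma0, offset = theta

# The same numbers from the covariance/variance formulas
gamma0_cov = np.sum((u - u.mean()) * (s - s.mean())) / np.sum((s - s.mean()) ** 2)
offset_cov = u.mean() - gamma0_cov * s.mean()
print(np.allclose([gamma0, offset], [gamma0_cov, offset_cov]))  # True
```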
🎯 3. Matrix Derivation of the Least Squares Solution
Let’s explicitly derive the solution:
$$\begin{bmatrix} \gamma_0 \\ o \end{bmatrix} = (X^\top X)^{-1} X^\top u$$for the model:
$$u = \gamma_0 s + o \cdot 1 + \varepsilon$$3.1 Define the Design Matrix and Vectors
Let’s say we have $n$ samples (cells), with:
- $s \in \mathbb{R}^n$ : spliced counts
- $u \in \mathbb{R}^n$ : unspliced counts
- $1 \in \mathbb{R}^n$ : vector of ones
We define the design matrix $X \in \mathbb{R}^{n \times 2}$ :
$$X = \begin{bmatrix} s_1 & 1 \\ s_2 & 1 \\ \vdots & \vdots \\ s_n & 1 \end{bmatrix} = [s \quad 1]$$We write the model:
$$u = X\theta + \varepsilon, \text{ with } \theta = \begin{bmatrix} \gamma_0 \\ o \end{bmatrix}$$We want to minimize the residual sum of squares:
$$\min_\theta \|u - X\theta\|^2$$3.2 Normal Equations
We derive the optimal $\theta$ by solving the normal equations:
$$X^\top X\theta = X^\top u$$So,
$$\theta = (X^\top X)^{-1} X^\top u$$Let’s compute each term step by step.
3.3 Compute $X^\top X$
$$X^\top X = \begin{bmatrix} \sum s_i^2 & \sum s_i \\ \sum s_i & n \end{bmatrix}$$Let’s denote:
- $S = \sum_{i=1}^n s_i$
- $S_2 = \sum_{i=1}^n s_i^2$
- $n$ : number of cells
So:
$$X^\top X = \begin{bmatrix} S_2 & S \\ S & n \end{bmatrix}$$3.4 Compute $X^\top u$
$$X^\top u = \begin{bmatrix} \sum s_i u_i \\ \sum u_i \end{bmatrix}$$Let’s denote:
- $U = \sum_{i=1}^n u_i$
- $SU = \sum_{i=1}^n s_i u_i$
So:
$$X^\top u = \begin{bmatrix} SU \\ U \end{bmatrix}$$3.5 Solve $\theta = (X^\top X)^{-1} X^\top u$
The inverse of a $2\times2$ matrix:
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$So:
$$(X^\top X)^{-1} = \frac{1}{S_2n - S^2}\begin{bmatrix} n & -S \\ -S & S_2 \end{bmatrix}$$Now compute:
$$\theta = \frac{1}{S_2n - S^2}\begin{bmatrix} n & -S \\ -S & S_2 \end{bmatrix}\begin{bmatrix} SU \\ U \end{bmatrix}$$The matrix-vector product gives, before dividing by the determinant $S_2 n - S^2$:
First row: $$n\cdot SU - S\cdot U$$
Second row: $$-S\cdot SU + S_2\cdot U$$
3.6 Final Expression
So we get:
$$\gamma_0 = \frac{n\cdot \sum s_i u_i - \sum s_i \cdot \sum u_i}{n\cdot \sum s_i^2 - (\sum s_i)^2} = \frac{\text{Cov}(u,s)}{\text{Var}(s)}$$ $$o = \bar{u} - \gamma_0\bar{s}$$This completes the derivation of the least squares solution with intercept, showing how to arrive at the covariance and variance expressions.
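The closed-form expressions can be checked numerically; the sketch below (toy data, illustrative only) builds the scalar sums $S$, $S_2$, $U$, $SU$ and evaluates the two rows directly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
s = rng.gamma(shape=2.0, scale=5.0, size=n)            # toy spliced counts
u = 0.3 * s + 1.5 + rng.normal(scale=0.5, size=n)      # toy unspliced counts

# Scalar sums used in the 2x2 solution
S, S2 = s.sum(), np.sum(s**2)
U, SU = u.sum(), np.sum(s * u)

det = n * S2 - S**2                      # determinant of X^T X
gamma0 = (n * SU - S * U) / det          # first row of (X^T X)^{-1} X^T u
offset = (-S * SU + S2 * U) / det        # second row

# Agrees with NumPy's straight-line fit
print(np.allclose([gamma0, offset], np.polyfit(s, u, 1)))  # True
```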
Why the Normal Equations in 3.2 of Section D Give the Optimal Solution
Let’s prove that the normal equations
$$X^\top X\theta = X^\top u$$give us the optimal solution in least squares regression.
Goal of Least Squares
We want to minimize the sum of squared errors (residuals):
$$L(\theta) = \|u-X\theta\|^2 = (u-X\theta)^\top(u-X\theta)$$This is a quadratic loss function. To minimize it, we take its gradient with respect to $\theta$ , set it to zero, and solve.
Step-by-step Derivation
Start with:
$$L(\theta) = (u-X\theta)^\top(u-X\theta)$$Expand the product:
$$L(\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$Now take the gradient with respect to $\theta$ :
$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta \tag{gradient} $$Set the gradient to zero:
$$-2X^\top u + 2X^\top X\theta = 0$$Divide both sides by 2:
$$X^\top X\theta = X^\top u$$This is the normal equation. When $X^\top X$ is invertible (which it is in our case since $X$ has full column rank), we can solve for $\theta$ :
$$\theta = (X^\top X)^{-1}X^\top u$$This solution minimizes the squared error because:
- The loss function is convex (it’s quadratic and $X^\top X$ is positive definite)
- We found where its gradient is zero
- The second derivative (Hessian) $2X^\top X$ is positive definite
Therefore, this solution gives us the global minimum of the least squares problem.
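In practice one rarely inverts $X^\top X$ explicitly; solving the normal equations with a linear solver is the standard route. A hedged sketch on toy data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
s = rng.gamma(shape=2.0, scale=5.0, size=n)            # toy spliced counts
u = 0.3 * s + 1.5 + rng.normal(scale=0.5, size=n)      # toy unspliced counts

X = np.column_stack([s, np.ones(n)])     # design matrix [s  1]

# Solve the normal equations  X^T X theta = X^T u  directly
theta = np.linalg.solve(X.T @ X, X.T @ u)
gamma0, offset = theta
print(gamma0, offset)                    # ~0.3, ~1.5
```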
Derive the Gradient Equation above for the Least Squares Loss Function
Let’s walk through how we get:
$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta$$Step 1: Write the Loss Function
The least squares loss is:
$$L(\theta) = \|u-X\theta\|^2 = (u-X\theta)^\top(u-X\theta)$$Step 2: Expand the Quadratic Form
Let’s expand this expression:
$$L(\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$Here’s how we get this expansion:
$$(u-X\theta)^\top(u-X\theta) = u^\top u - 2\theta^\top X^\top u + \theta^\top X^\top X\theta$$This follows from two key matrix properties:
- $(AB)^T = B^T A^T$
- $u^\top X\theta = \theta^\top X^\top u$ (scalar equality)
Step 3: Take the Gradient with Respect to $\theta$
We differentiate term by term:
First term: $$\nabla_\theta(u^\top u) = 0$$ because it doesn’t depend on $\theta$
Second term: $$\nabla_\theta(-2\theta^\top X^\top u) = -2X^\top u$$ because it’s linear in $\theta$
Third term: $$\nabla_\theta(\theta^\top X^\top X\theta) = 2X^\top X\theta \tag{quadratic grad.}$$ This is a quadratic form, and its derivative is a standard result from matrix calculus.
Adding all parts together:
$$\nabla_\theta L = -2X^\top u + 2X^\top X\theta$$This gradient expression gives us the direction of steepest ascent of the loss function at any point $\theta$ . Setting it to zero leads to the normal equations, which give us the optimal solution.
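One way to convince yourself of this gradient expression is a finite-difference check; the sketch below (toy data, tolerances chosen for illustration) compares the analytic gradient $-2X^\top u + 2X^\top X\theta$ with central differences of the loss:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 2
X = np.column_stack([rng.gamma(shape=2.0, scale=5.0, size=n), np.ones(n)])
u = X @ np.array([0.3, 1.5]) + rng.normal(scale=0.5, size=n)

def loss(theta):
    # L(theta) = ||u - X theta||^2
    r = u - X @ theta
    return r @ r

theta = np.array([0.1, 0.2])
analytic = -2 * X.T @ u + 2 * X.T @ X @ theta

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros(p)
for k in range(p):
    e = np.zeros(p)
    e[k] = eps
    numeric[k] = (loss(theta + e) - loss(theta - e)) / (2 * eps)

print(np.allclose(analytic, numeric, rtol=1e-4))  # True
```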
Note: Vector Dot Product Commutativity
Please note the identity
$$a^\top b = b^\top a$$holds true. This is a foundational property in linear algebra, and the key point is:
Both $a^\top b$ and $b^\top a$ are scalars (i.e., single numbers), and they are equal because they compute the same dot product. Let’s break it down:
Let $a = [a_1, a_2, \ldots, a_n]^\top$ , and $b = [b_1, b_2, \ldots, b_n]^\top$ .
Then:
$$a^\top b = \sum_{i=1}^n a_i b_i$$ $$b^\top a = \sum_{i=1}^n b_i a_i$$But since real number multiplication is commutative (i.e., $a_i b_i = b_i a_i$ ), the two sums are identical:
$$a^\top b = b^\top a$$Note on shapes and intuition:
- $a^\top b$ is a $1\times1$ scalar (row vector × column vector)
- $b^\top a$ is also a $1\times1$ scalar (same logic)
- This is just the dot product: it doesn’t matter which order you take the dot product in. It’s symmetric. This property is crucial in many derivations in linear algebra and optimization, including our least squares derivation where we used $u^\top X\theta = \theta^\top X^\top u$ .
Since they’re both just numbers, and equal by commutativity, the identity holds.
Derive Quadratic Form Gradient in Equation (quadratic grad.)
Let’s prove the key gradient identity with Matrix Calculus:
$$\nabla_\theta(\theta^\top A\theta) = (A + A^\top)\theta$$And its special case when $A$ is symmetric:
$$\nabla_\theta(\theta^\top A\theta) = 2A\theta$$Setup
Given:
- $\theta \in \mathbb{R}^p$ is a column vector
- $A \in \mathbb{R}^{p \times p}$ is a constant matrix
- The scalar function is $f(\theta) = \theta^\top A\theta = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$
Overview of Steps
Step 1: Write in Summation Form
$$f(\theta) = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$Step 2: Compute Partial Derivatives
The gradient’s k-th component is:
$$\frac{\partial f}{\partial \theta_k} = \frac{\partial}{\partial \theta_k} \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$Step 3: Use Product Rule
When differentiating:
$$\frac{\partial}{\partial \theta_k}(\theta_i A_{ij} \theta_j) = A_{ij}(\delta_{ik}\theta_j + \theta_i\delta_{jk})$$where $\delta_{ik}$ is the Kronecker delta:
$$\delta_{ik} = \begin{cases} 1 & \text{if } i=k \\ 0 & \text{otherwise} \end{cases}$$Step 4: Sum Terms
$$\frac{\partial f}{\partial \theta_k} = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p \theta_i A_{ik}$$Step 5: Express in Vector Form
This gives us:
$$\frac{\partial f}{\partial \theta_k} = (A\theta)_k + (A^\top\theta)_k$$Therefore:
$$\nabla_\theta f = A\theta + A^\top\theta = (A + A^\top)\theta$$Symmetric Matrix: a special case
When $A = A^\top$ , we get:
$$\nabla_\theta(\theta^\top A\theta) = 2A\theta$$This is the form we use in least squares optimization where $A = X^\top X$ is symmetric.
Detailed Proof of the Key Steps Above
Let’s prove that:
$$\frac{\partial}{\partial \theta_k} \left(\sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j\right) = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p \theta_i A_{ik}$$Step-by-step Derivation
Start with the function:
$$f(\theta) = \sum_{i=1}^p \sum_{j=1}^p \theta_i A_{ij} \theta_j$$Take partial derivative with respect to $\theta_k$ , using product rule for all $\theta_i$ and $\theta_j$ terms:
$$\frac{\partial f}{\partial \theta_k} = \sum_{i=1}^p \sum_{j=1}^p A_{ij} \cdot \frac{\partial}{\partial \theta_k}(\theta_i \theta_j)$$Now:
$$\frac{\partial}{\partial \theta_k}(\theta_i \theta_j) = \delta_{ik}\theta_j + \theta_i\delta_{jk}$$where $\delta_{ik}$ is the Kronecker delta:
$$\delta_{ik} = \begin{cases} 1 & \text{if } i=k \\ 0 & \text{otherwise} \end{cases}$$Therefore:
$$\frac{\partial f}{\partial \theta_k} = \sum_{i=1}^p \sum_{j=1}^p A_{ij}(\delta_{ik}\theta_j + \theta_i\delta_{jk})$$Split into two sums:
First term: $$\sum_{i=1}^p \sum_{j=1}^p A_{ij}\delta_{ik}\theta_j = \sum_{j=1}^p A_{kj}\theta_j$$
(since $\delta_{ik}=1$ only when $i=k$ )
Second term: $$\sum_{i=1}^p \sum_{j=1}^p A_{ij}\theta_i\delta_{jk} = \sum_{i=1}^p A_{ik}\theta_i$$
(since $\delta_{jk}=1$ only when $j=k$ )
Interim Result
Combining both terms:
$$\frac{\partial f}{\partial \theta_k} = \sum_{j=1}^p A_{kj}\theta_j + \sum_{i=1}^p A_{ik}\theta_i$$In vector form, this means:
$$\nabla_\theta(\theta^\top A\theta) = A\theta + A^\top\theta = (A + A^\top)\theta$$From Component-wise to Vector Form of Gradient
For each coordinate $k \in \{1,\ldots,p\}$ , we showed:
$$\frac{\partial f}{\partial \theta_k} = \underbrace{\sum_{j=1}^p A_{kj}\theta_j}_{(A\theta)_k} + \underbrace{\sum_{i=1}^p A_{ik}\theta_i}_{(A^\top\theta)_k}$$So each element of the gradient vector is:
$$(\nabla_\theta f)_k = (A\theta)_k + (A^\top\theta)_k$$Stacking into Vector Form
The full gradient vector is:
$$\nabla_\theta f = \begin{bmatrix} (A\theta)_1 + (A^\top\theta)_1 \\ (A\theta)_2 + (A^\top\theta)_2 \\ \vdots \\ (A\theta)_p + (A^\top\theta)_p \end{bmatrix} = A\theta + A^\top\theta$$ The notation $(A\theta)_k$ refers to the k-th component of the matrix-vector product $A\theta$. This follows because matrix-vector multiplication simply stacks the components:
$$(A\theta)_k = \sum_j A_{kj}\theta_j$$$$A\theta = \begin{bmatrix} \sum_{j=1}^p A_{1j}\theta_j \\ \sum_{j=1}^p A_{2j}\theta_j \\ \vdots \\ \sum_{j=1}^p A_{pj}\theta_j \end{bmatrix}$$ Similarly, for the transpose: $$(A^\top\theta)_k = \sum_i A_{ik}\theta_i$$
Final Result
Therefore:
$$\nabla_\theta(\theta^\top A\theta) = A\theta + A^\top\theta = (A + A^\top)\theta$$From Normal Equations to Standard Regression Form (3.6 in Section D)
Let’s derive how the normal equations solution:
$$\theta = \frac{1}{nS_2-S_1^2}\begin{bmatrix} n & -S_1 \\ -S_1 & S_2 \end{bmatrix}\begin{bmatrix} SU \\ U \end{bmatrix}$$relates to the standard regression form:
$$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} = \frac{n\sum s_i u_i - \sum s_i \sum u_i}{n\sum s_i^2 - (\sum s_i)^2}$$with offset:
$$o = \bar{u} - \gamma_0\bar{s}$$Given Matrices and Vectors
Let $s=[s_1,\ldots,s_n]^T$ , $u=[u_1,\ldots,u_n]^T$
Model: $$u_i = \gamma_0 s_i + o + \varepsilon_i \text{ or } u = \gamma_0 s + o\cdot 1 + \varepsilon$$
Matrix Form
Design matrix: $$X = [s \quad 1] \in \mathbb{R}^{n\times 2}$$
Parameters: $$\theta = \begin{bmatrix} \gamma_0 \\ o \end{bmatrix}$$
Computing Products
$$X^T X = \begin{bmatrix} \sum s_i^2 & \sum s_i \\ \sum s_i & n \end{bmatrix} = \begin{bmatrix} S_2 & S_1 \\ S_1 & n \end{bmatrix}$$ $$X^T u = \begin{bmatrix} \sum s_i u_i \\ \sum u_i \end{bmatrix} = \begin{bmatrix} SU \\ U \end{bmatrix}$$Solution Components
First row (slope): $$\gamma_0 = \frac{1}{nS_2-S_1^2}(n\cdot SU - S_1\cdot U)$$
Second row (intercept): $$o = \frac{1}{nS_2-S_1^2}(-S_1\cdot SU + S_2\cdot U)$$
Statistical Interpretation
Define: $$\bar{s} = \frac{1}{n}\sum s_i \quad \bar{u} = \frac{1}{n}\sum u_i$$
$$\text{Var}(s) = \frac{1}{n}\sum(s_i-\bar{s})^2 = \frac{1}{n}S_2 - \bar{s}^2$$ $$\text{Cov}(u,s) = \frac{1}{n}\sum(u_i-\bar{u})(s_i-\bar{s}) = \frac{1}{n}SU - \bar{u}\bar{s}$$Final Result
This gives us: $$\gamma_0 = \frac{\text{Cov}(u,s)}{\text{Var}(s)} \text{ and } o = \bar{u} - \gamma_0\bar{s}$$
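To close the loop, the sketch below (toy data, illustrative names) verifies the variance and covariance decompositions above and confirms that they reproduce the normal-equations solution:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 250
s = rng.gamma(shape=2.0, scale=5.0, size=n)            # toy spliced counts
u = 0.3 * s + 1.5 + rng.normal(scale=0.5, size=n)      # toy unspliced counts

S1, S2 = s.sum(), np.sum(s**2)
U, SU = u.sum(), np.sum(s * u)
s_bar, u_bar = S1 / n, U / n

# Decompositions used above
var_s = S2 / n - s_bar**2                # Var(s)   = S2/n - s_bar^2
cov_us = SU / n - u_bar * s_bar          # Cov(u,s) = SU/n - u_bar * s_bar

gamma0 = cov_us / var_s
offset = u_bar - gamma0 * s_bar

# Same answer as solving the normal equations with X = [s  1]
X = np.column_stack([s, np.ones(n)])
print(np.allclose([gamma0, offset], np.linalg.solve(X.T @ X, X.T @ u)))  # True
```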