🧬 Dynamic RNA velocity model-- (5) global time normalization

May 28, 2025·

Jiyuan (Jay) Liu

· 3 min read

Introduction

In scVelo paper, the first step to calculate the Gene-shared latent time, after inferring parameters of kinetic rates, is that gene-specific time points of well-fitted genes (with a likelihood of at least 0.1) are normalized to a common global (overall) time scale. Global time normalization is to address the challenge that different genes may have different intrinsic timescales for their kinetic processes. Without normalization, genes with faster kinetics would dominate the velocity field, while slower genes would contribute less to trajectory inference.

However, the method of Global Time Normalization is not described in the original scVelo paper. Its source codes latent_time() and compute_shared_time() shows a multi-step ad-hoc voting method to calculate a global latent time for the cells by considering the fitted latent times from multiple high-likelihood genes. Newer papers identify Global Time Normalization as a key weakness of the original scVelo implementation. Below are the alternative methods to calculate the global time normalization, which I want to try some time later to potentially address the relative scale of different genes and avoid the assumption of equal full cycle time for all genes.

Problem Statement

Each gene $g$ has its own characteristic timescale determined by its kinetic parameters:

Transcription rate: $\alpha_g$
Splicing rate: $\beta_g$
Degradation rate: $\gamma_g$

The raw latent time $t_{c,g}$ for cell $c$ and gene $g$ exists in gene-specific units, making cross-gene comparisons problematic.

Alternative 1

Step 1: Gene-Specific Characteristic Timescales

First, we need to identify the characteristic timescale for each gene. This is typically determined by the dominant kinetic process:

Splicing Timescale $$\tau_{\text{splice},g} = \frac{1}{\beta_g}$$

Degradation Timescale
$$\tau_{\text{degrade},g} = \frac{1}{\gamma_g}$$

Combined Characteristic Timescale

The effective timescale combines both processes:

$$\tau_g = \frac{1}{\beta_g + \gamma_g}$$

Alternatively, some implementations use:

$$\tau_g = \frac{1}{\max(\beta_g, \gamma_g)}$$

Step 2: Raw Latent Time Normalization

The raw latent time $t_{c,g}$ is normalized by the gene’s characteristic timescale:

$$\tilde{t}_{c,g} = \frac{t_{c,g}}{\tau_g}$$

This gives dimensionless, normalized latent times where:

$\tilde{t} = 0$ : Beginning of kinetic process
$\tilde{t} = 1$ : One characteristic time unit has passed

Step 3: Global Time Coordinate

To obtain a single global time coordinate for each cell, we aggregate across genes. Several approaches are used:

Method 1: Weighted Average

$$T_c = \frac{\sum_{g} w_g \cdot \tilde{t}_{c,g}}{\sum_{g} w_g}$$

where weights $w_g$ are typically based on gene reliability:

$$w_g = \text{fit\_likelihood}_g \cdot \mathbb{I}(\text{fit\_likelihood}_g > \theta)$$

with $\mathbb{I}$ being the indicator function and $\theta$ a threshold (e.g., 0.1).

Method 2: Median-Based Approach

$$T_c = \text{median}_g(\tilde{t}_{c,g})$$

This is more robust to outlier genes with extreme kinetic parameters.

Method 3: Principal Component Approach

For genes with high likelihood scores, compute the first principal component:

$$T_c = \text{PC1}(\{\tilde{t}_{c,g}\}_{g \in G_{\text{high}}})$$

where $G_{\text{high}} = \{g : \text{fit\_likelihood}_g > \theta\}$

Step 4: Final Normalization

The global time is often rescaled to $[0,1]$ interval:

$$T_c^{\text{norm}} = \frac{T_c - \min_c(T_c)}{\max_c(T_c) - \min_c(T_c)}$$

Alternative 2: Velocity-Based Normalization

We may normalize based on velocity magnitudes:

$$\sigma_g = \sqrt{\text{Var}(v_{c,g})}$$ $$\tilde{v}_{c,g} = \frac{v_{c,g}}{\sigma_g}$$

Then use normalized velocities to infer global time through:

$$T_c = \arg\max_{t} \sum_g \mathcal{L}(v_{c,g}(t) | \tilde{v}_{c,g})$$

Practical Considerations

Robust Estimation

To handle outliers in kinetic parameters:

$$\tau_g^{\text{robust}} = \text{median}_{g'} \left(\frac{1}{\beta_{g'} + \gamma_{g'}}\right) \cdot \frac{\beta_g + \gamma_g}{\text{median}_{g'}(\beta_{g'} + \gamma_{g'})}$$

Confidence Weighting

Incorporate parameter uncertainty:

$$w_g = \frac{\text{fit\_likelihood}_g}{\text{std}(\alpha_g, \beta_g, \gamma_g)} \cdot \mathbb{I}(\text{fit\_likelihood}_g > \theta)$$

Last updated on Jul 19, 2025