🧮 Math Derivation of a Top-Ranked scRNA-seq Imputation Method (SAVER)

May 27, 2025·
Jiyuan (Jay) Liu
· 13 min read

Introduction

Single-cell RNA-seq data contain a large fraction of zero counts, representing either missing data (dropout) or true absence of expression. Imputation approaches that address this issue risk generating false-positive or irreproducible results. Model-based imputation methods generate fewer false positives than data-smoothing methods (e.g., MAGIC and knn-smooth), but their performance varies greatly depending on how well the model describes the dataset.

SAVER was the least likely to generate false or irreproducible results in a benchmark of common imputation methods, suggesting that the mathematical model underlying SAVER captures the structure inherent to scRNA-seq data well. Here we derive the Poisson-Gamma mixture model (equivalently, the negative binomial model) and the Bayesian framework used in SAVER, which leverages a conjugate prior to estimate the posterior distribution of gene expression levels.


Model Structure and Fitting

1. Observation Model

$$Y_{gc} \sim \text{Poisson}(s_c \lambda_{gc})$$
  • $Y_{gc}$ : Observed count for gene $g$ in cell $c$
  • $s_c$ : Size factor for cell $c$
  • $\lambda_{gc}$ : True (latent) expression level

2. Prior on Expression

$$\lambda_{gc} \sim \text{Gamma}(\alpha_{gc}, \beta_{gc})$$

This introduces a Gamma prior on the Poisson rate, forming a Poisson-Gamma compound model, which is equivalent to a Negative Binomial model for marginal counts.
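As a quick numerical illustration (not part of SAVER itself), the following minimal Python sketch, with arbitrary values for $\alpha$, $\beta$, and the size factor, shows that simulated Poisson-Gamma counts reproduce the Negative Binomial mean and variance derived later in Section B.2:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, s = 2.0, 0.5, 1.3   # illustrative Gamma shape/rate and size factor

# Poisson-Gamma compound: draw lambda ~ Gamma(alpha, rate=beta), then Y ~ Poisson(s * lambda)
lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)
y = rng.poisson(s * lam)

# Negative Binomial moments derived in Section B.2
nb_mean = s * alpha / beta
nb_var = s * alpha / beta + s**2 * alpha / beta**2

print(y.mean(), nb_mean)   # ~5.2 in both cases
print(y.var(), nb_var)     # variance exceeds the mean: overdispersion
```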


A. Prior Mean Estimation

To estimate the prior mean $\mu_{gc}$ , the model uses a Poisson generalized linear model (GLM) with a log link and LASSO penalty:

$$\log E\left(\frac{Y_{gc}}{s_c}\right) = \log \mu_{gc} = \gamma_{g0} + \sum_{g' \neq g} \gamma_{gg'} \log\left(\frac{Y_{g'c} + 1}{s_c}\right)$$

🔍 Explanation of Terms

  • $Y_{gc}$ : Observed expression count for gene $g$ in cell $c$ .
  • $s_c$ : Size factor for cell $c$ , used for normalization.
  • $\mu_{gc}$ : Prior mean expression for gene $g$ in cell $c$ , estimated from the model.
  • $\gamma_{g0}$ : Intercept term for gene $g$ .
  • $\gamma_{gg'}$ : Regression coefficient for the predictor gene $g'$ when modeling gene $g$ .
  • $\log\left(\frac{Y_{g'c} + 1}{s_c}\right)$ : Log-normalized expression of gene $g'$ in cell $c$ , with a +1 pseudocount to avoid $\log(0)$.

📉 Loss Function

The penalized loss function minimized during fitting, derived step by step in Section A.1 below, is:

$$L(\gamma) = -\sum_c \left[Y_{gc} \log(\mu_{gc}) - \mu_{gc}\right] + \lambda \sum_{g' \neq g} |\gamma_{gg'}|$$

Where:

  • The first term is the Poisson negative log-likelihood.
  • The second term is the L1 penalty on the regression coefficients (excluding the intercept $\gamma_{g0}$ ).

⚙️ Fitting Procedure

  • Tool Used: glmnet R package (version 2.0-5)
  • Penalty: LASSO (L1 regularization)
  • Model Selection: The penalty parameter $\lambda$ is chosen via fivefold cross-validation, selecting the model with the lowest cross-validation error (see later for details).

🧠 Relation to the Bayesian Framework

  • This model is used to predict the prior mean $\mu_{gc}$ for each gene in each cell.
  • The Poisson GLM with LASSO penalty selects a sparse set of predictor genes $g'$ that best explain the expression of gene $g$ , reflecting biological sparsity (i.e., gene interactions are limited).
  • The LASSO penalty (via the glmnet package) ensures that only a subset of genes with strong predictive power have nonzero coefficients $\gamma_{gg'}$ .
  • The predicted $\mu_{gc}$ from this model is then treated as the prior mean in a Bayesian framework, enabling shrinkage and denoising of gene expression estimates.
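SAVER performs this fit with glmnet in R; as a language-agnostic illustration, below is a minimal Python sketch of the same penalized objective, fitted by proximal gradient descent (ISTA) on toy data. The learning rate, penalty value, and data are placeholders, not SAVER defaults:

```python
import numpy as np

def fit_poisson_lasso(X, y, lam, lr=5e-4, n_iter=10_000):
    """Minimize sum_c [exp(eta_c) - Y_c * eta_c] + lam * ||gamma||_1,
    where eta = gamma0 + X @ gamma; the intercept gamma0 is not penalized."""
    gamma0, gamma = 0.0, np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(gamma0 + X @ gamma)
        gamma0 -= lr * np.sum(mu - y)          # gradient step for the intercept
        gamma -= lr * (X.T @ (mu - y))         # gradient step for the Poisson NLL
        # Proximal step: soft-thresholding enforces sparsity (the L1 penalty)
        gamma = np.sign(gamma) * np.maximum(np.abs(gamma) - lr * lam, 0.0)
    return gamma0, gamma

# Toy data: columns of X play the role of the log[(Y_g'c + 1)/s_c] predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
true_coef = np.zeros(20)
true_coef[:3] = [0.5, -0.4, 0.3]               # sparse ground truth
y = rng.poisson(np.exp(0.2 + X @ true_coef))

g0, g = fit_poisson_lasso(X, y, lam=40.0)
print(np.round(g, 2))                           # most coefficients are exactly zero
```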

A.1 Deriving the Loss Function of the Poisson LASSO Model

Step 1: Poisson Probability Mass Function

For a Poisson-distributed random variable $Y_c \sim \text{Poisson}(\lambda_c)$ , the probability of observing $Y_c = y_c$ is:

$$P(Y_c = y_c) = \frac{e^{-\lambda_c} \lambda_c^{y_c}}{y_c!}$$

Step 2: Log-Likelihood for All Observations

Assuming we have $C$ independent observations $\{Y_1, Y_2, \ldots, Y_C\}$ , the likelihood is the product of individual probabilities:

$$L(\lambda) = \prod_{c=1}^C \frac{e^{-\lambda_c} \lambda_c^{Y_c}}{Y_c!}$$

Taking the log of the likelihood:

$$\log L(\lambda) = \sum_{c=1}^C \left[-\lambda_c + Y_c \log(\lambda_c) - \log(Y_c!)\right]$$

Step 3: Negative Log-Likelihood (NLL)

The negative log-likelihood (which we minimize during model fitting) is:

$$\mathcal{L}(\lambda) = -\log L(\lambda) = \sum_{c=1}^C [\lambda_c - Y_c \log(\lambda_c) + \log(Y_c!)]$$

In Poisson regression, we model $\lambda_c = \exp(\eta_c)$ , where $\eta_c$ is a linear predictor (e.g., from LASSO regression). So:

$$\lambda_c = \exp\left(\gamma_0 + \sum_j \gamma_j x_{jc}\right)$$

Substituting into the NLL:

$$\mathcal{L}(\gamma) = \sum_{c=1}^C [\exp(\eta_c) - Y_c \eta_c + \log(Y_c!)]$$

The term $\log(Y_c!)$ is constant with respect to the parameters $\gamma$ , so it’s often omitted during optimization.

Adding LASSO Penalty

To enforce sparsity in the coefficients $\gamma$ , we add an L1 penalty:

$$\mathcal{L}_{\text{penalized}}(\gamma) = \sum_{c=1}^C [\exp(\eta_c) - Y_c \eta_c] + \lambda \sum_j |\gamma_j|$$

This is the Poisson LASSO loss function used in SAVER's prior-mean regression.
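As a quick sanity check (not part of SAVER), the hand-derived NLL, with the constant $\log(Y_c!)$ term retained, matches `scipy.stats.poisson.logpmf` term by term; the counts and rates below are arbitrary:

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

y = np.array([0, 1, 3, 7])
lam = np.array([0.5, 1.0, 2.5, 6.0])  # arbitrary Poisson rates

# Hand-derived NLL terms: lambda_c - Y_c log(lambda_c) + log(Y_c!)
nll_manual = lam - y * np.log(lam) + gammaln(y + 1)
nll_scipy = -poisson.logpmf(y, lam)

print(np.allclose(nll_manual, nll_scipy))  # True
```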

Key Components Summary

  • Observation Model: $Y_{gc} \sim \text{Poisson}(s_c \lambda_{gc})$
  • Size Factor: $s_c$ normalizes for library size differences
  • Expression Prior: $\lambda_{gc} \sim \text{Gamma}(\alpha_{gc}, \beta_{gc})$
  • GLM Framework: Uses log-link function with LASSO regularization
  • Compound Distribution: Poisson-Gamma yields Negative Binomial for overdispersed counts

A.2 LASSO Cross-Validation

What is the Penalty Parameter $\lambda$?

In LASSO regression, the penalty parameter $\lambda$ controls the strength of the L1 regularization:

  • Large $\lambda$ : More shrinkage → more coefficients set to zero → simpler model.
  • Small $\lambda$ : Less shrinkage → more coefficients retained → more complex model.

What is Fivefold Cross-Validation?

Cross-validation is a technique to evaluate how well a model generalizes to unseen data. In fivefold cross-validation:

  1. The dataset is split into 5 equal parts (folds).
  2. The model is trained on 4 folds and tested on the remaining fold.
  3. This process is repeated 5 times, each time using a different fold as the test set.
  4. The average error across the 5 test sets is computed.

How is $\lambda$ Chosen?

  1. A range of $\lambda$ values is tested.
  2. For each $\lambda$ , fivefold cross-validation is performed.
  3. The cross-validation error (e.g., deviance or mean squared error) is computed for each $\lambda$ .
  4. The $\lambda$ with the lowest average cross-validation error is selected.

This ensures that the chosen $\lambda$ balances model complexity and predictive accuracy, avoiding both underfitting and overfitting.

Why It Matters

In the Poisson LASSO regression above, this process ensures that the selected genes (nonzero coefficients) are reliable predictors of the target gene's expression, based on how well they generalize across different subsets of the data.

Mathematical Framework

In the familiar Gaussian case, the LASSO objective function can be written as:

$$\min_{\beta} \left\{ \frac{1}{2n} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}$$

Where:

  • $y$ is the response vector
  • $X$ is the design matrix
  • $\beta$ is the coefficient vector
  • $\lambda$ is the penalty parameter
  • $\|\beta\|_1 = \sum_{j=1}^p |\beta_j|$ is the L1 norm

In the Poisson GLM used here, the squared-error term is replaced by the Poisson negative log-likelihood of Section A.1; the role of $\lambda$ is unchanged.

The cross-validation error for a given $\lambda$ is:

$$CV(\lambda) = \frac{1}{5} \sum_{k=1}^{5} L(y_k, \hat{y}_k(\lambda))$$

Where $L(y_k, \hat{y}_k(\lambda))$ is the loss function evaluated on the $k$ -th fold.
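In glmnet this is handled by `cv.glmnet`; the sketch below illustrates the same grid search in Python, reusing the hypothetical `fit_poisson_lasso` helper and the toy `X`, `y` from the earlier sketch, with the Poisson deviance as the cross-validation loss and an arbitrary $\lambda$ grid:

```python
import numpy as np

def poisson_deviance(y, mu):
    # Mean Poisson deviance; the y = 0 terms contribute y*log(y/mu) = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * np.mean(term - (y - mu))

def cv_poisson_lasso(X, y, lambdas, k=5, seed=0):
    # Assign each cell to one of k roughly equal folds
    folds = np.random.default_rng(seed).integers(0, k, size=len(y))
    errors = []
    for lam in lambdas:
        fold_err = []
        for fold in range(k):
            train, test = folds != fold, folds == fold
            g0, g = fit_poisson_lasso(X[train], y[train], lam)  # from the earlier sketch
            mu_test = np.exp(g0 + X[test] @ g)
            fold_err.append(poisson_deviance(y[test], mu_test))
        errors.append(np.mean(fold_err))
    return lambdas[int(np.argmin(errors))], errors

best_lam, cv_err = cv_poisson_lasso(X, y, lambdas=np.array([5.0, 20.0, 80.0]))
print(best_lam)   # the lambda with the lowest average CV deviance
```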


B. Prior Variance Estimation

🧠 Goal: Estimate Prior Variance for Gene $g$

We already estimated the prior mean $\mu_{gc}$ using Poisson LASSO regression. Now, we want to estimate the prior variance of the latent expression $\lambda_{gc}$ , which is modeled using a Gamma distribution:

$$\lambda_{gc} \sim \text{Gamma}(\alpha_{gc}, \beta_{gc})$$

The variance of a Gamma distribution is:

$$\text{Var}(\lambda_{gc}) = \frac{\alpha_{gc}}{\beta_{gc}^2}$$

🔧 Noise Models for Prior Variance

To estimate this variance, we assume a constant noise model across cells for each gene $g$ , controlled by a dispersion parameter $\phi_g$ . Three models are considered:

  1. Constant coefficient of variation (CV), $\phi_g^{cv}$ : implies a constant shape parameter $\alpha_{gc} = \alpha_g$.

$$\text{CV} = \frac{\sqrt{\text{Var}(\lambda_{gc})}}{\text{E}[\lambda_{gc}]}$$

  2. Constant Fano factor, $\phi_g^{F}$ : implies a constant rate parameter $\beta_{gc} = \beta_g$.

$$\text{Fano} = \frac{\text{Var}(\lambda_{gc})}{\text{E}[\lambda_{gc}]}$$

  3. Constant variance, $\phi_g^{v}$ : directly assumes $\text{Var}(\lambda_{gc}) = \phi_g$.

📈 Model Selection via Marginal Likelihood

For each model:

  1. Compute the marginal likelihood of the observed data across cells.
  2. Choose the model with the highest maximized marginal likelihood.
  3. The corresponding $\phi_g$ is set to the maximum likelihood estimate $\hat{\phi}_g$ .

📌 Final Step: Compute Prior Variance $\hat{v}_{gc}$

Once the best model and $\hat{\phi}_g$ are selected, you can derive the prior variance $\hat{v}_{gc}$ for each gene-cell pair using the Gamma distribution formula, depending on the chosen model.


B.1 Noise Models

If $\lambda_{gc} \sim \text{Gamma}(\alpha_{gc}, \beta_{gc})$ , then:

Mean: $E[\lambda_{gc}] = \frac{\alpha_{gc}}{\beta_{gc}}$

Variance: $\text{Var}(\lambda_{gc}) = \frac{\alpha_{gc}}{\beta_{gc}^2}$

1. Constant Coefficient of Variation (CV)

Assumption: The coefficient of variation (CV) is constant across cells for gene $g$ . This implies a constant shape parameter: $\alpha_{gc} = \alpha_g$

Derivation: $$\text{CV} = \frac{\sqrt{\text{Var}(\lambda_{gc})}}{E[\lambda_{gc}]} = \frac{\sqrt{\frac{\alpha_g}{\beta_{gc}^2}}}{\frac{\alpha_g}{\beta_{gc}}} = \frac{1}{\sqrt{\alpha_g}}$$

So, under this model:

$\phi_{g}^{cv} = \text{CV}^2 = \frac{1}{\alpha_g}$

The rate parameter $\beta_{gc}$ varies with $\mu_{gc}$ , since $\mu_{gc} = \frac{\alpha_g}{\beta_{gc}} \Rightarrow \beta_{gc} = \frac{\alpha_g}{\mu_{gc}}$

2. Constant Fano Factor

Assumption: The Fano factor is constant across cells for gene $g$ . This implies a constant rate parameter: $\beta_{gc} = \beta_g$

Derivation: $$\text{Fano} = \frac{\text{Var}(\lambda_{gc})}{E[\lambda_{gc}]} = \frac{\frac{\alpha_{gc}}{\beta_g^2}}{\frac{\alpha_{gc}}{\beta_g}} = \frac{1}{\beta_g}$$

So, under this model:

$\phi_g^F = \text{Fano} = \frac{1}{\beta_g}$

The shape parameter $\alpha_{gc}$ varies with $\mu_{gc}$ , since $\mu_{gc} = \frac{\alpha_{gc}}{\beta_g} \Rightarrow \alpha_{gc} = \mu_{gc} \beta_g$

3. Constant Variance

Assumption: The variance of $\lambda_{gc}$ is constant across cells for gene $g$ .

$\text{Var}(\lambda_{gc}) = \phi_g^v$

Derivation: From the Gamma variance formula:

$\frac{\alpha_{gc}}{\beta_{gc}^2} = \phi_g^v$

This model allows both $\alpha_{gc}$ and $\beta_{gc}$ to vary with $\mu_{gc}$ , but constrains their relationship to maintain constant variance.

Summary Table

| Model | Assumption | Fixed Parameter | Varying Parameter | Derived Relationship |
|---|---|---|---|---|
| Constant CV | $\text{CV} = \frac{1}{\sqrt{\alpha_g}}$ | $\alpha_{gc} = \alpha_g$ | $\beta_{gc} = \frac{\alpha_g}{\mu_{gc}}$ | $\text{Var} = \mu_{gc}^2/\alpha_g$ |
| Constant Fano Factor | $\text{Fano} = \frac{1}{\beta_g}$ | $\beta_{gc} = \beta_g$ | $\alpha_{gc} = \mu_{gc} \beta_g$ | $\text{Var} = \mu_{gc}/\beta_g$ |
| Constant Variance | $\text{Var} = \phi_g^v$ | None | Both vary | $\frac{\alpha_{gc}}{\beta_{gc}^2} = \phi_g^v$ |

How $\hat{v}_{gc}$ is Computed

Depending on the model:

| Model | Assumption | Formula for $\hat{v}_{gc} = \text{Var}(\lambda_{gc})$ |
|---|---|---|
| Constant CV | $\alpha_{gc} = \alpha_g$ | $\hat{v}_{gc} = \hat{\phi}_g \cdot \mu_{gc}^2$ |
| Constant Fano Factor | $\beta_{gc} = \beta_g$ | $\hat{v}_{gc} = \hat{\phi}_g \cdot \mu_{gc}$ |
| Constant Variance | $\text{Var}(\lambda_{gc}) = \phi_g$ | $\hat{v}_{gc} = \hat{\phi}_g$ |

So in all cases, $\hat{v}_{gc}$ is an estimate of the variance of the latent expression $\lambda_{gc}$ , derived from the Gamma prior and the selected noise model.
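A minimal sketch of these three formulas, assuming the prior means $\mu_{gc}$ and the dispersion estimate $\hat{\phi}_g$ are already in hand (all numbers are placeholders):

```python
import numpy as np

def prior_variance(mu, phi, model):
    """Prior variance of lambda_gc under the three noise models."""
    if model == "cv":      # constant CV: Var = phi * mu^2
        return phi * mu**2
    if model == "fano":    # constant Fano factor: Var = phi * mu
        return phi * mu
    if model == "var":     # constant variance: Var = phi
        return np.full_like(mu, phi)
    raise ValueError(model)

mu = np.array([0.5, 2.0, 10.0])   # placeholder prior means across cells
for m in ("cv", "fano", "var"):
    print(m, prior_variance(mu, phi=0.3, model=m))
```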


B.2 Marginal Likelihood for Model Selection

Poisson-Gamma to Negative Binomial: Step-by-Step Derivation

1. Model Recap

Recall the observation model and Gamma prior introduced earlier:

$$Y_{gc} \sim \text{Poisson}(s_c \lambda_{gc}), \qquad \lambda_{gc} \sim \text{Gamma}(\alpha_{gc}, \beta_{gc})$$

with the prior mean $\mu_{gc}$ estimated by the Poisson LASSO regression of Section A.

2. Poisson Likelihood

$$P(Y_{gc} \mid \lambda_{gc}) = \frac{(s_c \lambda_{gc})^{Y_{gc}} e^{-s_c \lambda_{gc}}}{Y_{gc}!}$$

3. Gamma Prior

$$P(\lambda_{gc}) = \frac{\beta_{gc}^{\alpha_{gc}}}{\Gamma(\alpha_{gc})} \lambda_{gc}^{\alpha_{gc}-1} e^{-\beta_{gc} \lambda_{gc}}$$

4. Marginal Distribution

Multiply and integrate:

$$P(Y_{gc}) = \int_0^{\infty} P(Y_{gc} \mid \lambda_{gc}) \cdot P(\lambda_{gc}) \, d\lambda_{gc}$$ $$= \frac{\beta_{gc}^{\alpha_{gc}} s_c^{Y_{gc}}}{\Gamma(\alpha_{gc}) Y_{gc}!} \int_0^{\infty} \lambda_{gc}^{Y_{gc} + \alpha_{gc} - 1} e^{-(\beta_{gc} + s_c) \lambda_{gc}} \, d\lambda_{gc}$$

This integral is the definition of the Gamma function:

$$\int_0^{\infty} x^{k-1} e^{-\theta x} dx = \frac{\Gamma(k)}{\theta^k}$$

So we get:

$$P(Y_{gc}) = \frac{\beta_{gc}^{\alpha_{gc}} s_c^{Y_{gc}}}{\Gamma(\alpha_{gc}) Y_{gc}!} \cdot \frac{\Gamma(Y_{gc} + \alpha_{gc})}{(\beta_{gc} + s_c)^{Y_{gc} + \alpha_{gc}}}$$

5. Final Simplified Form

Rewriting:

$$P(Y_{gc}) = \frac{\Gamma(Y_{gc} + \alpha_{gc})}{\Gamma(\alpha_{gc}) Y_{gc}!} \left(\frac{s_c}{\beta_{gc} + s_c}\right)^{Y_{gc}} \left(\frac{\beta_{gc}}{\beta_{gc} + s_c}\right)^{\alpha_{gc}}$$

This is the Negative Binomial distribution with parameters:

  • Shape parameter (number of failures): $r = \alpha_{gc}$
  • Success probability: $p = \frac{\beta_{gc}}{\beta_{gc} + s_c}$
  • Mean: $\mathbb{E}[Y_{gc}] = \frac{s_c \alpha_{gc}}{\beta_{gc}}$
  • Variance: $\text{Var}[Y_{gc}] = \frac{s_c \alpha_{gc}}{\beta_{gc}} + \frac{s_c^2 \alpha_{gc}}{\beta_{gc}^2}$ (overdispersed relative to Poisson)
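The variance expression follows from the law of total variance:

$$\text{Var}[Y_{gc}] = E\left[\text{Var}(Y_{gc} \mid \lambda_{gc})\right] + \text{Var}\left(E[Y_{gc} \mid \lambda_{gc}]\right) = s_c\, E[\lambda_{gc}] + s_c^2\, \text{Var}(\lambda_{gc}) = \frac{s_c \alpha_{gc}}{\beta_{gc}} + \frac{s_c^2 \alpha_{gc}}{\beta_{gc}^2}$$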


Summary

This proves that:

$$Y_{gc} \sim \text{Negative Binomial}\left(\alpha_{gc}, p = \frac{\beta_{gc}}{\beta_{gc} + s_c}\right)$$

So, the Poisson-Gamma compound model naturally leads to a Negative Binomial marginal distribution, which is why it’s widely used in modeling overdispersed count data like gene expression.
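This closed form can be checked numerically (again with arbitrary parameter values) by integrating the Poisson likelihood against the Gamma prior and comparing with `scipy.stats.nbinom`, whose $(n, p)$ parameterization corresponds to $(\alpha_{gc},\ \beta_{gc}/(\beta_{gc}+s_c))$ here:

```python
import numpy as np
from scipy import integrate, stats

alpha, beta, s = 2.5, 0.8, 1.2   # arbitrary Gamma parameters and size factor

def marginal_pmf(y):
    # Numerically integrate Poisson(y | s*lam) * Gamma(lam | alpha, rate=beta) over lam
    integrand = lambda lam: (stats.poisson.pmf(y, s * lam)
                             * stats.gamma.pdf(lam, a=alpha, scale=1.0 / beta))
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

for y in range(5):
    nb = stats.nbinom.pmf(y, n=alpha, p=beta / (beta + s))
    print(y, marginal_pmf(y), nb)   # the two columns agree
```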


B.3 Fitting to Select the Noise Model and Estimate the Prior Variance

When fitting the maximum marginal likelihood to estimate the dispersion parameter $\phi_g$ for a gene $g$ , here’s what is known and what is being estimated:

Known Quantities

These are available from earlier steps in the model:

  • Observed counts: $Y_{gc}$ for each cell $c$
  • Size factors: $s_c$ for each cell
  • Prior means: $\mu_{gc}$ , estimated from Poisson LASSO regression
  • Noise model: One of the three (constant CV, constant Fano factor, constant variance)

Size Factor

The size factor $s_c$ is a cell-specific scaling factor used to normalize raw gene expression counts. It accounts for differences in:

  • Sequencing depth (total number of reads per cell)
  • Capture efficiency (how well RNA was captured from each cell)

It ensures that expression levels are comparable across cells. Without normalization, cells with more reads would appear to have higher expression for all genes, simply due to technical artifacts, not biology.

There are several methods to estimate $s_c$ , but the general idea is:

1. Total Count Normalization (used in SAVER)

$$s_c = \frac{\text{total counts in cell } c}{\text{median total counts across all cells}}$$

2. Median Ratio Method (used in DESeq2)

  1. Compute the geometric mean of each gene across all cells
  2. For each cell, compute the ratio of each gene’s count to the gene’s geometric mean
  3. The median of these ratios is the size factor $s_c$

3. Scran’s Deconvolution Method

  • Pools cells to estimate size factors more robustly in sparse data
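As an illustration of the first method, here is a minimal sketch of total-count normalization on a placeholder genes-by-cells matrix:

```python
import numpy as np

counts = np.array([[0, 3, 1],
                   [5, 0, 2],
                   [1, 4, 0]])    # placeholder genes-by-cells count matrix

total = counts.sum(axis=0)        # total counts per cell
s = total / np.median(total)      # s_c = total_c / median of totals
print(s)                          # e.g. [1.0, 1.1667, 0.5]
```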

🔍 Parameter to Estimate

The dispersion parameter $\phi_g$ for gene $g$

This is the only free parameter in the marginal likelihood optimization.

How the Fitting Works

  1. For a given value of $\phi_g$ , use the selected noise model to derive the Gamma parameters $\alpha_{gc}$ , $\beta_{gc}$ for each cell.

  2. Use these to compute the marginal likelihood of the observed counts $Y_{gc}$ under the Negative Binomial distribution.

  3. Sum the log-likelihoods across all cells:

$$\log L(\phi_g) = \sum_c \log P(Y_{gc} \mid \mu_{gc}, \phi_g)$$
  4. Optimize $\phi_g$ to maximize this log-likelihood (see the sketch after the summary table below).

Summary

| Category | Items |
|---|---|
| Known | $Y_{gc}$, $s_c$, $\mu_{gc}$, noise model |
| To Estimate | $\phi_g$ (dispersion parameter) |
| Derived | $\alpha_{gc}$, $\beta_{gc}$ (from $\mu_{gc}$ and $\phi_g$) |
| Objective | Maximize marginal likelihood $\log L(\phi_g)$ |
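To make the fitting procedure concrete, here is a minimal sketch assuming the constant Fano noise model (so $\alpha_{gc} = \mu_{gc}/\phi_g$ and $\beta_g = 1/\phi_g$) with simulated placeholder data; SAVER's actual optimizer and parameterization may differ:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
C = 300
s = rng.uniform(0.5, 1.5, size=C)    # size factors (placeholder)
mu = rng.gamma(2.0, 1.0, size=C)     # prior means from the LASSO step (placeholder)
phi_true = 0.4
# Simulate under constant Fano: alpha = mu/phi, rate = 1/phi (so scale = phi)
lam = rng.gamma(shape=mu / phi_true, scale=phi_true)
y = rng.poisson(s * lam)

def neg_log_marginal(phi):
    # Constant Fano model: alpha_gc = mu_gc / phi, beta_g = 1 / phi
    alpha = mu / phi
    p = 1.0 / (1.0 + phi * s)        # NB success probability beta/(beta + s)
    return -np.sum(stats.nbinom.logpmf(y, n=alpha, p=p))

res = optimize.minimize_scalar(neg_log_marginal, bounds=(1e-4, 10.0), method="bounded")
print(res.x)   # close to phi_true = 0.4
```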

C. Full Derivation of the Bayesian Posterior of $\lambda_{gc}$

Model Setup

Likelihood:

$$ Y_{gc} \sim \text{Poisson}(\lambda_{gc} \cdot s_c) $$ $$ p(Y_{gc} \mid \lambda_{gc}, s_c) = \frac{(s_c \lambda_{gc})^{Y_{gc}}}{Y_{gc}!} e^{-s_c \lambda_{gc}} $$

Prior:

$$ \lambda_{gc} \sim \text{Gamma}(\hat{\alpha}_{gc}, \hat{\beta}_{gc}) $$

where the Gamma is in shape–rate form, i.e.,

$$ p(\lambda_{gc}) = \frac{\hat{\beta}_{gc}^{\hat{\alpha}_{gc}}}{\Gamma(\hat{\alpha}_{gc})} \lambda_{gc}^{\hat{\alpha}_{gc}-1} e^{-\hat{\beta}_{gc}\lambda_{gc}} $$

We want to compute the posterior:

$$ p(\lambda_{gc} \mid Y_{gc}, s_c, \hat{\alpha}_{gc}, \hat{\beta}_{gc}) $$

Simplified Notation

If:

Prior: $\lambda \sim \text{Gamma}(\alpha, \beta)$

Likelihood: $Y \sim \text{Poisson}(s\lambda)$

Then we will prove that the posterior is:

$$\lambda | Y \sim \text{Gamma}(Y + \alpha, s + \beta)$$

Define the Distributions

Likelihood: $$p(Y|\lambda) = \frac{(s\lambda)^Y e^{-s\lambda}}{Y!}$$

Prior (Gamma): $$p(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$$

Goal: Derive $p(\lambda|Y)$

We use Bayes’ Rule:

$$p(\lambda|Y) = \frac{p(Y|\lambda) \cdot p(\lambda)}{p(Y)}$$

Step 1: Multiply Numerator

$$p(Y|\lambda) \cdot p(\lambda) = \frac{(s\lambda)^Y e^{-s\lambda}}{Y!} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$$

Group terms:

$$= \frac{s^Y \beta^\alpha}{Y! \Gamma(\alpha)} \lambda^{Y+\alpha-1} e^{-(s+\beta)\lambda}$$

Step 2: Marginal Likelihood (Evidence)

We want to compute the marginal probability of the observed data $Y$ , by integrating out $\lambda$ :

$$p(Y) = \int_0^\infty p(Y|\lambda)p(\lambda) \, d\lambda$$

Substitute the pdfs again:

$$p(Y) = \int_0^\infty \frac{(s\lambda)^Y e^{-s\lambda}}{Y!} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda} d\lambda$$

Simplify constants:

$$p(Y) = \frac{s^Y \beta^\alpha}{Y! \Gamma(\alpha)} \int_0^\infty \lambda^{Y+\alpha-1} e^{-(s+\beta)\lambda} d\lambda$$

Recognize the integral as the definition of the Gamma function:

$$\int_0^\infty x^{k-1} e^{-\theta x} dx = \frac{\Gamma(k)}{\theta^k}$$

So,

$$\int_0^\infty \lambda^{Y+\alpha-1} e^{-(s+\beta)\lambda} d\lambda = \frac{\Gamma(Y+\alpha)}{(s+\beta)^{Y+\alpha}}$$

Thus, the marginal likelihood is:

$$p(Y) = \frac{s^Y \beta^\alpha}{Y! \Gamma(\alpha)} \cdot \frac{\Gamma(Y+\alpha)}{(s+\beta)^{Y+\alpha}}$$

Step 3: Combine Numerator and Denominator

$$p(\lambda|Y) = \frac{\frac{s^Y \beta^\alpha}{Y! \Gamma(\alpha)} \lambda^{Y+\alpha-1} e^{-(s+\beta)\lambda}}{\frac{s^Y \beta^\alpha}{Y! \Gamma(\alpha)} \cdot \frac{\Gamma(Y+\alpha)}{(s+\beta)^{Y+\alpha}}}$$

Cancel out constants:

$$p(\lambda|Y) = \frac{(s+\beta)^{Y+\alpha}}{\Gamma(Y+\alpha)} \cdot \lambda^{Y+\alpha-1} e^{-(s+\beta)\lambda}$$

Final Result: Posterior is Gamma

$$p(\lambda|Y) = \text{Gamma}(\lambda | Y+\alpha, \, s+\beta)$$

Where the general Gamma pdf is:

$$\text{Gamma}(\lambda|a,b) = \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}$$
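Putting the pieces together, here is a minimal sketch of the resulting posterior update with placeholder prior parameters; note that the posterior mean is exactly a weighted average of the normalized observed count $Y/s$ and the prior mean $\alpha/\beta$:

```python
import numpy as np

def posterior(y, s, alpha, beta):
    """Posterior Gamma(y + alpha, s + beta): return (mean, variance)."""
    a_post, b_post = y + alpha, s + beta
    return a_post / b_post, a_post / b_post**2

y, s = 3, 1.2             # observed count and size factor (placeholders)
alpha, beta = 2.0, 1.0    # estimated prior parameters (placeholders)

mean, var = posterior(y, s, alpha, beta)

# Equivalent shrinkage form: weight * (Y/s) + (1 - weight) * prior mean
w = s / (s + beta)
print(mean, w * (y / s) + (1 - w) * (alpha / beta))   # identical values
print(var)   # posterior variance: uncertainty of the SAVER estimate
```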

Conclusion

The derivation above achieves the goal of SAVER:

  • derive the posterior Gamma distribution of $\lambda_{gc}$ given the observed counts $Y_{gc}$;
  • use the posterior mean $\hat{\lambda}_{gc} = \frac{Y_{gc} + \hat{\alpha}_{gc}}{s_c + \hat{\beta}_{gc}}$, a weighted average of the normalized observed count and the prior mean, as the normalized SAVER estimate;
  • use the posterior variance as a measure of uncertainty in the SAVER estimate.