📊 Bypassing Marginals Using Conjugate Priors in Bayesian Inference: Math Derivation

May 27, 2025·
Jiyuan (Jay) Liu
· 4 min read

Introduction

Conjugate priors are a powerful concept in Bayesian statistics that simplify the process of updating our beliefs about a parameter given new data. They let us bypass computing the marginal distribution because the posterior has the same functional form as the prior, which makes the normalization constant analytically tractable.

Standard Bayes’ Theorem

$$P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$$

where the marginal likelihood is:

$$P(D) = \int P(D|\theta)P(\theta)d\theta$$

The Problem

Computing $P(D)$ often requires intractable integrals, especially in high dimensions.

Conjugate Prior Solution

When the prior $P(\theta)$ is conjugate to the likelihood $P(D|\theta)$, we have:

  1. Prior: $P(\theta) \propto f(\theta; \alpha_0)$ (some parametric form)
  2. Likelihood: $P(D|\theta) \propto g(\theta; D)$
  3. Posterior: $P(\theta|D) \propto f(\theta; \alpha_n)$ (same form as prior)

where $\alpha_n = h(\alpha_0, D)$ is a simple update function. Specifically, since we know the posterior belongs to the same family as the prior, we can write:

$$P(\theta|D) = \frac{f(\theta; \alpha_n)}{Z(\alpha_n)}$$

where $Z(\alpha_n)$ is the normalization constant for the known distribution family, which has a closed form.
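
To make this structure concrete, here is a minimal sketch (the function names `kernel`, `Z`, and `update` and all numeric values are chosen purely for illustration) that instantiates $f$, $Z$, and $h$ for the Beta-Binomial pair discussed in the next section and checks that $f(\theta; \alpha_n)/Z(\alpha_n)$ integrates to 1:

```python
# Sketch of the conjugate structure: a kernel f(theta; params), a closed-form
# normalizer Z(params), and a parameter update h(params, data).
# Instantiated for the Beta-Binomial pair; all numbers are illustrative.
from scipy.integrate import quad
from scipy.special import beta as beta_fn

def kernel(theta, params):        # f(theta; (a, b)) = theta^(a-1) * (1-theta)^(b-1)
    a, b = params
    return theta**(a - 1) * (1 - theta)**(b - 1)

def Z(params):                    # normalization constant: the Beta function, closed form
    return beta_fn(*params)

def update(params, n, s):         # h(alpha_0, D): simple posterior parameter update
    a, b = params
    return (a + s, b + n - s)

posterior_params = update((2.0, 2.0), n=10, s=7)
total, _ = quad(lambda t: kernel(t, posterior_params) / Z(posterior_params), 0.0, 1.0)
print(total)  # ~1.0: f(theta; alpha_n) / Z(alpha_n) is already a proper density
```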

Example: Beta-Binomial Conjugacy

  • Prior: $P(\theta) = \text{Beta}(\alpha, \beta)$
  • Likelihood: $P(D|\theta) = \text{Binomial}(n, \theta)$ with $s$ successes
  • Posterior: $P(\theta|D) = \text{Beta}(\alpha + s, \beta + n - s)$

We never need to compute:

$$P(D) = \int_0^1 \binom{n}{s}\theta^s(1-\theta)^{n-s} \cdot \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} d\theta$$

Instead, we directly get the posterior parameters and use the known Beta normalization. This is why conjugate priors are so powerful: they let us skip the hardest part of Bayesian computation. Let’s break down what’s happening.
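
A quick numerical sanity check (with hypothetical counts and hyperparameters) shows that the parameter update really does produce the same density we would get by normalizing likelihood × prior by brute force:

```python
# Conjugate update vs. brute-force normalization of likelihood * prior on a grid.
# The counts and hyperparameters below are illustrative, not from the article.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta

alpha0, beta0 = 2.0, 2.0          # Beta(alpha, beta) prior
n, s = 10, 7                      # data: s successes in n trials

# The conjugate shortcut: a pure parameter update, no integral anywhere.
alpha_n, beta_n = alpha0 + s, beta0 + (n - s)

# The step conjugacy lets us skip: normalize likelihood * prior numerically.
theta = np.linspace(1e-6, 1 - 1e-6, 20_000)
unnorm = theta**s * (1 - theta)**(n - s) * beta.pdf(theta, alpha0, beta0)
grid_posterior = unnorm / trapezoid(unnorm, theta)

# The two posteriors agree up to grid error.
print(np.max(np.abs(grid_posterior - beta.pdf(theta, alpha_n, beta_n))))  # ≈ 0
```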

The Hard Way (Without Conjugate Priors)

To get the posterior, we’d normally need to compute this integral:

$$P(D) = \int_0^1 \binom{n}{s}\theta^s(1-\theta)^{n-s} \cdot \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} d\theta$$

This integral combines:

  • $\binom{n}{s}\theta^s(1-\theta)^{n-s}$ (binomial likelihood)
  • $\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}$ (Beta prior)

Why This Integral is Nasty

Even though this particular integral has a closed form, in general such integrals:

  • May not have analytical solutions
  • Require numerical integration (expensive, approximate; see the sketch after this list)
  • Get exponentially harder in higher dimensions
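
For this one-dimensional case the brute-force route is still feasible, and a small sketch (again with made-up numbers) shows what it involves, which is exactly the machinery the conjugate shortcut removes:

```python
# Brute-force evaluation of the marginal likelihood P(D) by numerical quadrature.
# n, s, alpha, beta are illustrative values only.
from math import comb
from scipy.integrate import quad
from scipy.stats import beta

n, s = 10, 7
alpha0, beta0 = 2.0, 2.0

def integrand(theta):
    likelihood = comb(n, s) * theta**s * (1 - theta)**(n - s)   # Binomial likelihood
    prior = beta.pdf(theta, alpha0, beta0)                      # Beta prior
    return likelihood * prior

p_data, abserr = quad(integrand, 0.0, 1.0)
print(f"P(D) ≈ {p_data:.6f} (estimated quadrature error {abserr:.1e})")
```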

The Conjugate Prior Shortcut

Instead of computing that integral, we use the conjugate relationship:

  1. Recognize the pattern: $$\theta^s(1-\theta)^{n-s} \times \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{(s+\alpha)-1}(1-\theta)^{(n-s+\beta)-1} \quad (1)$$

    This is the kernel (unnormalized density) of a Beta distribution with parameters $(\alpha + s, \beta + n - s)$.

    We derive in the next section how this leads to the marginal likelihood:

    $$P(D) = \frac{B(\alpha+s, \beta+n-s)}{B(\alpha, \beta)} \binom{n}{s} \quad (2)$$
  2. Write the posterior directly:

    We substitute equations (1) and (2) into the Standard Bayes’ Theorem to get the posterior distribution:

    $$P(\theta|D) = \text{Beta}(\alpha + s, \beta + n - s) = \frac{\theta^{\alpha+s-1}(1-\theta)^{\beta+n-s-1}}{B(\alpha+s, \beta+n-s)}$$

But to get the posterior distribution, we don’t even need $P(D)$: we just need the updated parameters!
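
As a sanity check on this substitution (using the same hypothetical numbers as in the earlier sketches), dividing likelihood × prior by the closed-form $P(D)$ from equation (2) does reproduce the $\text{Beta}(\alpha + s, \beta + n - s)$ density pointwise:

```python
# Verify the substitution: likelihood * prior / P(D) equals the Beta posterior density.
# Hyperparameters and counts are illustrative.
from math import comb
from scipy.special import beta as B
from scipy.stats import beta, binom

alpha0, beta0, n, s = 2.0, 2.0, 10, 7
p_data = comb(n, s) * B(alpha0 + s, beta0 + n - s) / B(alpha0, beta0)   # equation (2)

for theta in (0.3, 0.5, 0.7):
    via_bayes = binom.pmf(s, n, theta) * beta.pdf(theta, alpha0, beta0) / p_data
    direct = beta.pdf(theta, alpha0 + s, beta0 + n - s)
    print(f"theta={theta}: Bayes' theorem {via_bayes:.6f}  vs  Beta posterior {direct:.6f}")
```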

Interim Summary

  • Without conjugacy: Solve a potentially intractable integral
  • With conjugacy: Simple parameter update: $(\alpha, \beta) \rightarrow (\alpha + s, \beta + n - s)$

The normalization “just works” because we’re staying within the same distributional family where normalization constants are known.

Math Derivation

Below is a step-by-step derivation of the marginal likelihood integral for the Beta-Binomial conjugate prior relationship. This shows how we can transform a potentially difficult integral into a simple ratio of Beta functions, which is the essence of why conjugate priors are so powerful in Bayesian inference.

Starting Point: The Marginal Likelihood Integral

$$P(D) = \int_0^1 \binom{n}{s}\theta^s(1-\theta)^{n-s} \cdot \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)} d\theta \quad (1)$$

Step 1: Factor Out Constants

The binomial coefficient $\binom{n}{s}$ and $\frac{1}{B(\alpha,\beta)}$ don’t depend on $\theta$, so we can pull them outside the integral:

$$P(D) = \binom{n}{s} \cdot \frac{1}{B(\alpha,\beta)} \int_0^1 \theta^s(1-\theta)^{n-s} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} d\theta \quad (2)$$

Step 2: Combine the Powers

Using the exponent rule $x^a \cdot x^b = x^{a+b}$:

$$P(D) = \binom{n}{s} \cdot \frac{1}{B(\alpha,\beta)} \int_0^1 \theta^{s+\alpha-1}(1-\theta)^{n-s+\beta-1} d\theta \quad (3)$$

Step 3: Recognize the Beta Function Integral

The integral $\int_0^1 \theta^{a-1}(1-\theta)^{b-1} d\theta$ is exactly the definition of the Beta function $B(a,b)$.

In our case, we have:

  • $a = s + \alpha$
  • $b = n - s + \beta$

So: $$\int_0^1 \theta^{s+\alpha-1}(1-\theta)^{n-s+\beta-1} d\theta = B(s+\alpha, n-s+\beta) = B(\alpha+s, \beta+n-s) \quad (4)$$
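
If you want to confirm this identity numerically (with, say, $a = 9$ and $b = 5$, the values the earlier hypothetical numbers would give), a two-line check against `scipy.special.beta` suffices:

```python
# Numerical check of the identity in Step 3:
# the integral over [0, 1] of theta^(a-1) * (1-theta)^(b-1) equals B(a, b).
from scipy.integrate import quad
from scipy.special import beta as beta_fn

a, b = 9, 5   # e.g. a = s + alpha, b = n - s + beta
integral, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1), 0.0, 1.0)
print(integral, beta_fn(a, b))   # both ≈ 1.554e-04
```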

Step 4: Substitute Back

$$P(D) = \binom{n}{s} \cdot \frac{1}{B(\alpha,\beta)} \cdot B(\alpha+s, \beta+n-s) \quad (5)$$

Final Result

Rearranging: $$P(D) = \frac{B(\alpha+s, \beta+n-s)}{B(\alpha,\beta)} \binom{n}{s} \quad (6)$$
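
This closed form is exactly the Beta-Binomial pmf evaluated at $s$, so (for the same hypothetical numbers) it can be cross-checked against `scipy.stats.betabinom`:

```python
# Cross-check the closed-form marginal likelihood against scipy's Beta-Binomial pmf.
# Hyperparameters and data are the illustrative values used in the earlier sketches.
from math import comb
from scipy.special import beta as B
from scipy.stats import betabinom

alpha0, beta0, n, s = 2.0, 2.0, 10, 7
p_closed = B(alpha0 + s, beta0 + n - s) / B(alpha0, beta0) * comb(n, s)   # equation (6)
print(p_closed, betabinom.pmf(s, n, alpha0, beta0))   # both ≈ 0.1119
```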

Conclusion

This is why conjugate priors are so computationally elegant: they transform intractable integrals into simple ratios of known functions. Specifically for the Beta-Binomial case:

  1. We avoided numerical integration: instead of computing a potentially difficult integral, we used the known relationship between integrals and Beta functions
  2. Both Beta functions have closed forms: $B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$ (see the quick check below)
  3. This gives us the exact marginal likelihood, with no approximation needed
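
The closed form in point 2 is easy to confirm as well:

```python
# Quick check of B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b),
# using the same illustrative arguments as above.
from math import gamma
from scipy.special import beta as beta_fn

a, b = 9, 5
print(beta_fn(a, b), gamma(a) * gamma(b) / gamma(a + b))   # both ≈ 1.554e-04
```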