Why Likelihood Ratio Statistics Follow Chi-Square Distributions: A Deep Dive

Jan 15, 2025 · Jiyuan (Jay) Liu · 5 min read

The likelihood ratio test is one of the most fundamental tools in statistical inference, and understanding why its test statistic follows a chi-square distribution is crucial for anyone working with hypothesis testing. In this post, we’ll explore this connection through rigorous mathematical foundations and intuitive explanations.

Foundation: The Chi-Square Distribution

Before diving into likelihood ratio statistics, let’s establish what makes a random variable chi-square distributed.

Definition

A random variable $Y$ has a chi-square distribution with $\nu$ degrees of freedom (denoted $Y \sim \chi^2_\nu$) if:

$$Y = \sum_{i=1}^{\nu} Z_i^2$$

where $Z_1, Z_2, \ldots, Z_\nu$ are independent and identically distributed standard normal random variables $N(0,1)$.
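As a quick numerical check of this definition (a minimal sketch; the degrees of freedom, random seed, and Monte Carlo sample size are arbitrary choices), we can simulate sums of squared standard normals and compare them with the $\chi^2_\nu$ reference distribution from SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 5                     # degrees of freedom
n_draws = 100_000          # Monte Carlo sample size

# Sum of nu squared independent standard normals
z = rng.standard_normal((n_draws, nu))
y = (z ** 2).sum(axis=1)

# A chi-square(nu) variable has mean nu and variance 2 * nu
print(y.mean(), y.var())                         # roughly 5.0 and 10.0

# Kolmogorov-Smirnov comparison against the chi2(nu) reference CDF
print(stats.kstest(y, stats.chi2(df=nu).cdf))    # typically a large p-value
```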

Fundamental Theorem

If $Z \sim N(0,1)$, then $X = Z^2$ has the chi-square distribution with 1 degree of freedom: $X \sim \chi^2_1$.

Proof: By definition, when $\nu = 1$:

$$Y = \sum_{i=1}^{1} Z_i^2 = Z_1^2$$

Let $Z = Z_1 \sim N(0,1)$. Then $Y = Z^2$, which by definition is $\chi^2_1$ distributed. □

For explicit verification, we can derive the probability density function by the CDF method. If $Z \sim N(0,1)$ and $X = Z^2$, then $P(X \le x) = P(-\sqrt{x} \le Z \le \sqrt{x}) = 2\Phi(\sqrt{x}) - 1$, and differentiating with respect to $x$ gives:

$$f_X(x) = \frac{1}{\sqrt{2\pi}} x^{-1/2} e^{-x/2}, \quad x > 0$$

This matches the standard chi-square PDF with 1 degree of freedom.
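As a further sanity check (a small sketch assuming NumPy and SciPy are available), the formula above can be compared against SciPy's reference implementation of the $\chi^2_1$ density:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 10, 500)

# The density of Z^2 derived above: (2*pi)^(-1/2) * x^(-1/2) * exp(-x/2)
pdf_derived = x ** (-0.5) * np.exp(-x / 2) / np.sqrt(2 * np.pi)

# Reference chi-square density with 1 degree of freedom
pdf_chi2 = stats.chi2(df=1).pdf(x)

print(np.max(np.abs(pdf_derived - pdf_chi2)))    # essentially zero
```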

The Likelihood Ratio Statistic

Setup

Consider data $Y$ with likelihood function $L(\theta) = f(Y;\theta)$, where:

  • Parameter $\theta \in \Theta$ (parameter space)
  • Null hypothesis $H_0: \theta \in \Theta_0 \subset \Theta$

Definition

The likelihood ratio statistic is:

$$\Lambda = \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)}$$

More commonly, we work with the log likelihood ratio:

$$\text{LR} = -2 \log \Lambda = -2 \left[ \log L(\hat{\theta}_0) - \log L(\hat{\theta}) \right]$$

where:

  • $\hat{\theta}_0$ maximizes $L(\theta)$ under $H_0$ (constrained MLE)
  • $\hat{\theta}$ maximizes $L(\theta)$ over $\Theta$ (unconstrained MLE)
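To make the definition concrete, here is a minimal sketch computing $-2 \log \Lambda$ for a binomial model with $H_0: p = 0.5$ (the model, null value, and counts are hypothetical choices for illustration):

```python
from scipy import stats

# Hypothetical data: 34 successes in 50 Bernoulli trials
n, y = 50, 34

def log_lik(p):
    # Binomial log-likelihood of the observed count at success probability p
    return stats.binom(n, p).logpmf(y)

p_hat = y / n      # unconstrained MLE
p_0 = 0.5          # value fixed by the null hypothesis (constrained MLE)

lr = -2 * (log_lik(p_0) - log_lik(p_hat))
print(lr)                          # likelihood ratio statistic
print(stats.chi2(df=1).sf(lr))     # asymptotic p-value (see Wilks' theorem below)
```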

Wilks’ Theorem: The Key Connection

The fundamental result connecting likelihood ratio statistics to chi-square distributions is Wilks’ theorem.

Theorem (Wilks, 1938)

Under regularity conditions, as the sample size $n \to \infty$:

$$-2 \log \Lambda \xrightarrow{d} \chi^2_k$$

where $k = \dim(\Theta) - \dim(\Theta_0)$ is the difference in the number of free parameters between the full and null models.
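The convergence can be illustrated with a quick Monte Carlo sketch (not part of the theorem itself; it uses an exponential model with unknown rate $\lambda$ and $H_0: \lambda = 1$, where the chi-square result is genuinely asymptotic, and the simulation settings are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, n_rep = 200, 20_000        # sample size per replication, number of replications

# Simulate many data sets under H0: exponential with rate lambda = 1
y = rng.exponential(scale=1.0, size=(n_rep, n))
ybar = y.mean(axis=1)

# For this model the MLE is 1 / ybar, and -2 log Lambda simplifies to
# 2 * n * (ybar - 1 - log(ybar))
lr = 2 * n * (ybar - 1 - np.log(ybar))

# The simulated quantiles should be close to those of chi2(1)
print(np.quantile(lr, [0.5, 0.9, 0.95]))
print(stats.chi2(df=1).ppf([0.5, 0.9, 0.95]))
```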

Regularity Conditions

The theorem requires several technical conditions:

  1. The true parameter lies in the interior of the parameter space
  2. The log-likelihood is twice continuously differentiable in $\theta$ near the true parameter
  3. Standard regularity conditions for MLE consistency and asymptotic normality hold
  4. The null hypothesis does not place the true parameter on the boundary of the parameter space

Intuitive Understanding

Why does this work? The key insight lies in the quadratic approximation of the log-likelihood function.

Taylor Expansion Argument

Near the maximum likelihood estimate, the log-likelihood function can be approximated as:

$$\log L(\theta) \approx \log L(\hat{\theta}) - \frac{1}{2}(\theta - \hat{\theta})^T I(\hat{\theta}) (\theta - \hat{\theta})$$

where $I(\hat{\theta})$ is the observed Fisher information matrix.
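For a simple null hypothesis $H_0: \theta = \theta_0$ (so that $\hat{\theta}_0 = \theta_0$), substituting $\theta_0$ into this approximation gives a heuristic version of Wilks' result:

$$-2 \log \Lambda = -2\left[\log L(\theta_0) - \log L(\hat{\theta})\right] \approx (\hat{\theta} - \theta_0)^T I(\hat{\theta}) (\hat{\theta} - \theta_0)$$

Since $\hat{\theta}$ is asymptotically normal with covariance approximately $I(\hat{\theta})^{-1}$ under $H_0$, the right-hand side is a quadratic form in roughly standardized normal variables, which is exactly the structure in the chi-square definition at the start of this post.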

The Connection

  1. The difference $\log L(\hat{\theta}) - \log L(\hat{\theta}_0)$ behaves like a quadratic form
  2. This quadratic form involves approximately normal variables (by the asymptotic normality of MLEs)
  3. A quadratic form in normal variables follows a chi-square distribution (checked numerically just after this list)
  4. The degrees of freedom equal the dimensionality difference between the models
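As a check of step 3 (a minimal numerical sketch; the dimension, covariance matrix, and sample size are arbitrary): if $X \sim N(0, \Sigma)$ in $k$ dimensions, the quadratic form $X^T \Sigma^{-1} X$ follows $\chi^2_k$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
k, n_draws = 3, 100_000

# An arbitrary positive-definite covariance matrix
a = rng.standard_normal((k, k))
sigma = a @ a.T + k * np.eye(k)

# Draw X ~ N(0, Sigma) and form the quadratic form Q = X^T Sigma^{-1} X
x = rng.multivariate_normal(np.zeros(k), sigma, size=n_draws)
q = np.einsum('ij,jk,ik->i', x, np.linalg.inv(sigma), x)

# Q should behave like a chi-square variable with k degrees of freedom
print(np.quantile(q, [0.5, 0.95]))
print(stats.chi2(df=k).ppf([0.5, 0.95]))
```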

Practical Example

Let’s consider a concrete example to illustrate the theory.

One-Sample Mean Test

Suppose $Y_1, \ldots, Y_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, and we want to test $H_0: \mu = \mu_0$.

Setup:

  • Full model: $\theta = \mu \in \mathbb{R}$ (1 parameter)
  • Null model: $\theta = \mu_0$ (0 free parameters)
  • Degrees of freedom: $k = 1 - 0 = 1$

Likelihood Ratio: The log-likelihood ratio statistic is:

$$-2 \log \Lambda = \frac{n(\bar{Y} - \mu_0)^2}{\sigma^2}$$

Distribution: Under $H_0$, this statistic follows $\chi^2_1$. In this Gaussian case with known variance the result is exact for every $n$, not merely asymptotic.

Connection to Familiar Tests: Note that $\frac{\sqrt{n}(\bar{Y} - \mu_0)}{\sigma} \sim N(0,1)$ under $H_0$, so:

$$\left(\frac{\sqrt{n}(\bar{Y} - \mu_0)}{\sigma}\right)^2 = \frac{n(\bar{Y} - \mu_0)^2}{\sigma^2} \sim \chi^2_1$$

This is exactly the square of the z-statistic, confirming our chi-square result!
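A small simulation sketch (with arbitrary choices of $n$, $\mu_0$, $\sigma$, and the number of replications) confirms this numerically:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, mu0, sigma, n_rep = 30, 0.0, 1.0, 50_000

# Simulate many data sets under H0: mu = mu0 with known sigma
y = rng.normal(loc=mu0, scale=sigma, size=(n_rep, n))
ybar = y.mean(axis=1)

# -2 log Lambda = n * (ybar - mu0)^2 / sigma^2
lr = n * (ybar - mu0) ** 2 / sigma ** 2

# In this Gaussian case the chi2(1) quantiles match exactly, even for small n
print(np.quantile(lr, [0.9, 0.95, 0.99]))
print(stats.chi2(df=1).ppf([0.9, 0.95, 0.99]))
```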

Practical Implications

Hypothesis Testing

For testing $H_0$ against $H_1$ at significance level $\alpha$:

  • Compute the likelihood ratio statistic: $\text{LR} = -2 \log \Lambda$
  • Reject $H_0$ if $\text{LR} > \chi^2_{k,1-\alpha}$, the $(1-\alpha)$ quantile of the $\chi^2_k$ distribution
  • The $p$-value is $P(\chi^2_k > \text{LR})$; both quantities are computed in the short sketch below
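A minimal sketch of this recipe, assuming SciPy is available (the observed statistic and degrees of freedom below are hypothetical numbers):

```python
from scipy import stats

lr = 5.8        # hypothetical observed likelihood ratio statistic
k = 1           # difference in free parameters between the models
alpha = 0.05

critical_value = stats.chi2(df=k).ppf(1 - alpha)   # chi2_{k, 1 - alpha}
p_value = stats.chi2(df=k).sf(lr)                  # P(chi2_k > LR)

print(critical_value, p_value)
print("reject H0" if lr > critical_value else "fail to reject H0")
```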

Model Selection

Likelihood ratio tests are fundamental in:

  • Nested model comparisons
  • Goodness-of-fit testing
  • Parameter significance testing

Limitations

Remember that Wilks’ theorem is asymptotic:

  • For small samples, the chi-square approximation may be poor
  • Boundary cases require special treatment
  • Complex parameter spaces may violate regularity conditions

Understanding Degrees of Freedom: Why k = 1?

A common question arises: Why do we often see chi-square distributions with 1 degree of freedom in likelihood ratio tests?

The degrees of freedom of the resulting chi-square distribution equal the difference in the number of free parameters between the two models being compared:

  • Null Model ($H_0$): The simpler, more constrained model
  • Alternative Model ($H_1$): The more complex, less constrained model

If the alternative model has exactly one more free parameter than the null model, the chi-square distribution has 1 degree of freedom. This is the most common kind of likelihood ratio test: testing a single parameter, as in the regression example below.

Regression Example

Suppose you are comparing two regression models:

  • $H_0$: $Y = \beta_0 + \beta_1 X_1$
  • $H_1$: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

In this scenario:

  • The null model has 2 parameters: $\beta_0, \beta_1$
  • The alternative model has 3 parameters: $\beta_0, \beta_1, \beta_2$
  • The null hypothesis is that $\beta_2 = 0$

When you perform a likelihood ratio test to compare these two models, the difference in the number of parameters is $3 - 2 = 1$. Consequently, the test statistic is asymptotically distributed as $\chi^2(1)$.
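Here is a minimal sketch of this comparison in plain NumPy (the data are simulated so that $H_0$ actually holds; with Gaussian errors and the error variance profiled out, $-2 \log \Lambda$ reduces to $n \log(\text{RSS}_0 / \text{RSS}_1)$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200

# Simulated data in which beta_2 = 0, so H0 holds
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + rng.standard_normal(n)

def rss(design, response):
    # Residual sum of squares from an ordinary least-squares fit
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ beta
    return resid @ resid

X0 = np.column_stack([np.ones(n), x1])        # null model:        beta_0, beta_1
X1 = np.column_stack([np.ones(n), x1, x2])    # alternative model: adds beta_2

# With Gaussian errors, -2 log Lambda = n * log(RSS_0 / RSS_1)
lr = n * np.log(rss(X0, y) / rss(X1, y))
print(lr, stats.chi2(df=1).sf(lr))            # statistic and asymptotic p-value
```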

General Formula

$$\text{df} = \dim(\Theta_1) - \dim(\Theta_0)$$

where $\dim(\Theta_i)$ represents the number of free parameters in model $i$.

Key Takeaways

  1. Chi-square distributions arise naturally from sums of squared standard normal variables
  2. Likelihood ratio statistics are asymptotically chi-square due to the quadratic nature of log-likelihood surfaces
  3. Degrees of freedom equal the parameter difference between nested models
  4. Single parameter tests yield $\chi^2(1)$ distributions - the most common case
  5. Wilks’ theorem provides the theoretical foundation for many hypothesis tests
  6. The connection extends beyond fully parametric tests, underlying chi-square goodness-of-fit procedures that are often treated as non-parametric

Understanding this relationship deepens our appreciation of why so many statistical tests ultimately rely on chi-square distributions, from goodness-of-fit tests to analysis of variance and beyond.

Further Reading

  • Wilks, S.S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics
  • Lehmann, E.L. & Casella, G. (1998). Theory of Point Estimation
  • Van der Vaart, A.W. (1998). Asymptotic Statistics

Have questions about likelihood ratio tests or chi-square distributions? Feel free to reach out in the comments below!