Deriving Kendall's Tau from the Sample-Based Perspective (Discrete)

Mar 22, 2025·
Jiyuan (Jay) Liu
· 3 min read

Kendall’s Tau ($\tau$) is a robust non-parametric measure of correlation that captures the strength of monotonic relationships between two variables. Unlike Pearson’s correlation, Kendall’s Tau is based on the relative ordering of data points rather than their exact values, making it particularly useful for ordinal data and resistant to outliers.

In this post, we’ll derive the complete formula for Kendall’s Tau step by step, starting from basic definitions and arriving at the elegant mathematical expression using the sign function.

Step 1: Foundation - Concordant and Discordant Pairs

For a dataset with $n$ paired observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we need to examine all possible pairs of observations.

Definitions

Consider any two distinct observations $(x_i, y_i)$ and $(x_j, y_j)$ where $i < j$. This pair can be classified as:

Concordant Pair: A pair where both variables change in the same direction

$$ (x_i - x_j)(y_i - y_j) > 0 $$

This occurs when:

  • Both $x_i > x_j$ and $y_i > y_j$ (both increase together), or
  • Both $x_i < x_j$ and $y_i < y_j$ (both decrease together)

Discordant Pair: A pair where the variables change in opposite directions

$$ (x_i - x_j)(y_i - y_j) < 0 $$

This occurs when:

  • $x_i > x_j$ but $y_i < y_j$ (one increases, other decreases), or
  • $x_i < x_j$ but $y_i > y_j$ (one decreases, other increases)

Counting the Pairs

Let:

  • $N_C$ = number of concordant pairs
  • $N_D$ = number of discordant pairs

The total number of possible pairs from $n$ observations is:

$$ \text{Total pairs} = \binom{n}{2} = \frac{n(n-1)}{2} $$

Initial Definition of Kendall’s Tau

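Kendall’s Tau measures the excess of concordant over discordant pairs, normalized by the total number of pairs. Pairs tied in $x$ or $y$ count toward the total but toward neither $N_C$ nor $N_D$ (this version is often called $\tau_a$):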

$$ \tau = \frac{N_C - N_D}{\text{total pairs}} = \frac{N_C - N_D}{\binom{n}{2}} = \frac{2(N_C - N_D)}{n(n-1)} $$
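
The counting definition can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function names `count_pairs` and `kendall_tau_from_counts` and the sample data are assumptions made for this example, and tied pairs are simply left out of both counts.

```python
from itertools import combinations


def count_pairs(x, y):
    """Return (N_C, N_D) for paired samples x and y (illustrative helper)."""
    n_c = n_d = 0
    # Visit every unordered pair of observations exactly once (i < j).
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            n_c += 1  # concordant: both differences share a sign
        elif product < 0:
            n_d += 1  # discordant: the differences have opposite signs
        # product == 0 means a tie in x or y; it is counted in neither total
    return n_c, n_d


def kendall_tau_from_counts(x, y):
    """tau = 2 (N_C - N_D) / (n (n - 1)), following the definition above."""
    n = len(x)
    n_c, n_d = count_pairs(x, y)
    return 2 * (n_c - n_d) / (n * (n - 1))


# Hypothetical sample data: one swapped pair among four observations.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(count_pairs(x, y))              # (5, 1)
print(kendall_tau_from_counts(x, y))  # approximately 0.667
```

Of the $\binom{4}{2} = 6$ pairs, only the pair formed by the second and third observations is discordant, so $\tau = (5 - 1)/6 = 2/3$.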

Step 2: The Sign Function Representation

We can express the classification of pairs more elegantly using the sign function. The sign function is defined as:

$$ \text{sgn}(x) = \begin{cases} +1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases} $$

Key Insight

For any pair $(i,j)$ with $i < j$, the product of signs captures the pair’s nature:

$$ \text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j) = \begin{cases} +1 & \text{if concordant} \\ -1 & \text{if discordant} \\ 0 & \text{if tie in } x \text{ or } y \end{cases} $$

Why this works:

  • Concordant: $(x_i - x_j)$ and $(y_i - y_j)$ have the same sign → $\text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j) = (+1) \cdot (+1) = +1$ or $(-1) \cdot (-1) = +1$
  • Discordant: $(x_i - x_j)$ and $(y_i - y_j)$ have opposite signs → $\text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j) = (+1) \cdot (-1) = -1$ or $(-1) \cdot (+1) = -1$
  • Tied: At least one difference is zero → product equals zero
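
The three cases can be checked with a small sketch. The helper names `sgn` and `classify_pair` are chosen for this example only; the sign function uses a common Python idiom that relies on booleans behaving as integers.

```python
def sgn(v):
    """Sign function: +1 for v > 0, -1 for v < 0, 0 for v == 0."""
    return (v > 0) - (v < 0)


def classify_pair(p, q):
    """Classify two observations p = (x_i, y_i), q = (x_j, y_j) by the sign product."""
    (xi, yi), (xj, yj) = p, q
    s = sgn(xi - xj) * sgn(yi - yj)
    return {1: "concordant", -1: "discordant", 0: "tied"}[s]


print(classify_pair((1, 2), (3, 5)))  # concordant: x and y both larger in the second point
print(classify_pair((1, 5), (3, 2)))  # discordant: x larger, y smaller in the second point
print(classify_pair((1, 2), (1, 7)))  # tied: the x values are equal
```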

Summation Formula

By summing over all pairs, we get:

$$ \sum_{i<j} \text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j) = N_C - N_D $$

This sum counts:

  • $+1$ for each concordant pair
  • $-1$ for each discordant pair
  • $0$ for each tied pair

Step 3: The Final Formula

Combining our results from Steps 1 and 2:

$$ \boxed{\tau = \frac{2}{n(n-1)} \sum_{i<j} \text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j)} $$

Understanding the Components

Normalization Factor: $\frac{2}{n(n-1)}$

  • Scales the result to the range $[-1, +1]$
  • Accounts for the total number of possible pairs

Sum: $\sum_{i<j} \text{sgn}(x_i - x_j) \cdot \text{sgn}(y_i - y_j)$

  • Computes the net difference between concordant and discordant pairs
  • Each pair contributes $+1$, $-1$, or $0$ to the sum
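
The Step 3 formula translates almost line for line into code. The sketch below is one possible direct implementation, assuming plain Python sequences as input; the name `kendall_tau` and the example data are illustrative. It loops over all pairs, so its cost grows as $O(n^2)$.

```python
def sgn(v):
    """Sign function: +1, -1, or 0."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Normalization factor 2 / (n (n - 1)) times the sum of sign products over i < j."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("at least two observations are required")
    # The double loop with j > i visits each unordered pair exactly once.
    total = sum(
        sgn(x[i] - x[j]) * sgn(y[i] - y[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2 * total / (n * (n - 1))


# Hypothetical data: 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau(x, y))  # 0.6
```

Library routines may differ from this sketch when ties are present. For instance, `scipy.stats.kendalltau` defaults to the tie-adjusted $\tau_b$ variant, which uses a different normalization.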

Interpretation

The value of Kendall’s Tau ranges from $-1$ to $+1$:

  • $\tau = +1$: Perfect positive association (all pairs concordant)
  • $\tau = -1$: Perfect negative association (all pairs discordant)
  • $\tau = 0$: No monotonic association (equal concordant and discordant pairs)
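
To make the three reference values concrete, the following sketch evaluates the formula on small hand-picked datasets. The function name `tau` and all data are assumptions for illustration. The last case is an added observation: with a tied pair, the $\binom{n}{2}$ normalization used here cannot reach $+1$ even though $y$ never decreases as $x$ increases.

```python
from itertools import combinations


def tau(x, y):
    """Kendall's Tau via the sign-product sum over all pairs (illustrative)."""
    def sgn(v):
        return (v > 0) - (v < 0)

    n = len(x)
    total = sum(
        sgn(xi - xj) * sgn(yi - yj)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
    )
    return 2 * total / (n * (n - 1))


x = [1, 2, 3, 4]
print(tau(x, [10, 20, 30, 40]))  # 1.0: every pair is concordant
print(tau(x, [40, 30, 20, 10]))  # -1.0: every pair is discordant
print(tau(x, [2, 4, 1, 3]))      # 0.0: three concordant and three discordant pairs

# One tied pair in x: 5 concordant pairs out of 6, so tau is about 0.833.
print(tau([1, 1, 2, 3], [1, 2, 3, 4]))
```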

Advantages of This Formulation

  1. Computational Clarity: The sign function makes the algorithm explicit
  2. Theoretical Elegance: Connects to the fundamental concept of rank correlation
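  3. Robustness: Tied pairs contribute zero to the sum, and the rank-based comparison is resistant to outliers (with ties present, this normalization keeps $|\tau|$ strictly below 1)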
  4. Generalizability: Extends easily to partial correlations and other variants

Conclusion

The derivation of Kendall’s Tau shows how a simple intuitive idea, comparing the relative ordering of pairs, leads to a compact formula that can be computed directly from the data as a sum over all $\binom{n}{2}$ pairs. The sign function representation handles concordant, discordant, and tied pairs in a single expression, which makes the structure of rank-based correlation measures explicit.

This formula serves as the foundation for many applications in non-parametric statistics, from hypothesis testing to robust correlation analysis in the presence of outliers or non-linear relationships.