In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions, or two samples; the distance can also be measured between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions. Statistical distance measures are typically not metrics: they may lack some of the properties of a classical metric (for example, they need not be symmetric).
Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.
For more info: https://en.wikipedia.org/wiki/Statistical_distance
A similarity measure (also called a similarity function or similarity metric) is a real-valued function that quantifies the similarity between two objects.
Overlap can be defined as the area shared by two or more probability density functions, and it offers a simple way to quantify the similarity (or difference) among samples or populations that are described in terms of distributions. Intuitively, two populations (or samples) are similar when their distribution functions overlap.
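As a rough illustration of this idea, the overlap can be approximated numerically as the area under the pointwise minimum of two densities. The sketch below uses two example normal densities and a simple Riemann sum; the function name, distributions, and grid are assumptions for illustration, not taken from the text.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out so no extra library is needed."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 200_001)   # dense grid covering both densities
dx = x[1] - x[0]
f = normal_pdf(x, 0.0, 1.0)             # density of population 1 (assumed example)
g = normal_pdf(x, 1.0, 1.5)             # density of population 2 (assumed example)

# Overlapping coefficient: area under the pointwise minimum of the two densities.
# It equals 1 for identical distributions and approaches 0 as they separate.
overlap = np.sum(np.minimum(f, g)) * dx
print(f"overlap coefficient ≈ {overlap:.3f}")
```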
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.
Some important statistical distances include the following:
In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence) (Kullback and Leibler, 1951), denoted $D_{\mathrm{KL}}(P \parallel Q)$, is a type of statistical distance: a measure of how one probability distribution $P$ differs from a second, reference probability distribution $Q$. While it is a statistical distance, it is not a metric: it is not symmetric in $P$ and $Q$ and does not satisfy the triangle inequality.
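A minimal sketch of the discrete form, $D_{\mathrm{KL}}(P \parallel Q) = \sum_x P(x)\,\ln\!\big(P(x)/Q(x)\big)$, follows; the function name and the example distributions are assumptions chosen only to illustrate the definition.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                                # terms with P(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
# Note the asymmetry: D_KL(P || Q) generally differs from D_KL(Q || P),
# which is one reason the KL divergence is not a metric.
print(kl_divergence(p, q), kl_divergence(q, p))
```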
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance, see section below) is used to quantify the similarity between two probability distributions.
To define the Hellinger distance in terms of elementary probability theory, if we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral:
$$H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx = 1 - \int \sqrt{f(x)\,g(x)}\, dx,$$
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance satisfies the property (derivable from the Cauchy–Schwarz inequality)
$$0 \le H(P,Q) \le 1.$$
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when $P$ assigns probability zero to every set to which $Q$ assigns a positive probability, and vice versa.
The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q)$ (see section below), as it can be defined as
$$H(P,Q) = \sqrt{1 - BC(P,Q)}.$$
The Hellinger distance can be specialized to several types of probability distributions.
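For discrete distributions, the integral above becomes a sum, $H(P,Q) = \sqrt{1 - \sum_x \sqrt{P(x)\,Q(x)}}$. The sketch below is an illustrative implementation of that discrete form; the function name and example distributions are assumptions, not from the source.

```python
import numpy as np

def hellinger(p, q):
    """H(P, Q) = sqrt(1 - sum_x sqrt(P(x) Q(x))); always lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))                  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))    # clip tiny negative rounding error

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
print(hellinger(p, q))            # some value strictly between 0 and 1
print(hellinger(p, p))            # 0.0: identical distributions
print(hellinger([1, 0], [0, 1]))  # 1.0: disjoint supports give the maximum distance
```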
In statistics, the Bhattacharyya distance (Bhattacharyya, 1943) measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations.
It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.
Definition. For probability distributions $P$ and $Q$ over the same domain $X$, the Bhattacharyya distance is defined as
$$D_B(P,Q) = -\ln\big(BC(P,Q)\big),$$
where
$$BC(P,Q) = \sum_{x \in X} \sqrt{P(x)\,Q(x)}$$
is the Bhattacharyya coefficient for discrete probability distributions. For continuous probability distributions, with probability density functions $p$ and $q$, the Bhattacharyya coefficient is defined as
$$BC(p,q) = \int \sqrt{p(x)\,q(x)}\, dx.$$
For more info: https://en.wikipedia.org/wiki/Bhattacharyya_distance
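An illustrative sketch of the discrete definitions above; the function name and the example distributions are assumptions made for the purpose of the example.

```python
import numpy as np

def bhattacharyya(p, q):
    """Return (coefficient BC, distance D_B = -ln BC) for discrete P and Q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = float(np.sum(np.sqrt(p * q)))
    return bc, float(-np.log(bc))

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
bc, db = bhattacharyya(p, q)
# D_B is symmetric but violates the triangle inequality, so it is not a metric;
# the related Hellinger distance sqrt(1 - BC) does satisfy it.
print(bc, db)
```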