In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, two probability distributions, or two samples; the distance can also be measured between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions. Statistical distance measures are typically not metrics: they may lack some of the properties of a classical metric (for example, they need not be symmetric).
Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.
For more info: https://en.wikipedia.org/wiki/Statistical_distance
A similarity measure (also called a similarity function or similarity metric) is a real-valued function that quantifies the similarity between two objects.
Overlap can be defined as the area shared by two or more probability density functions, and it offers a simple way to quantify the similarity (or difference) among samples or populations that are described in terms of distributions. Intuitively, two populations (or samples) are similar when their distribution functions overlap.
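As a rough illustration of this idea, the overlap can be approximated numerically as the area under the pointwise minimum of two densities. The sketch below uses two example normal densities and a simple Riemann sum; the function name, distributions, and grid are assumptions for illustration, not taken from the text.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out so no extra library is needed."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 10.0, 200_001)   # dense grid covering both densities
dx = x[1] - x[0]
f = normal_pdf(x, 0.0, 1.0)             # density of population 1 (assumed example)
g = normal_pdf(x, 1.0, 1.5)             # density of population 2 (assumed example)

# Overlapping coefficient: area under the pointwise minimum of the two densities.
# It equals 1 for identical distributions and approaches 0 as they separate.
overlap = np.sum(np.minimum(f, g)) * dx
print(f"overlap coefficient ≈ {overlap:.3f}")
```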
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.
Some important statistical distances include the following:
In mathematical statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence) (Kullback and Leibler, 1951), denoted $D_{\mathrm{KL}}(P \parallel Q)$, is a type of statistical distance: a measure of how one probability distribution $P$ differs from a second, reference probability distribution $Q$. While it is a statistical distance, it is not a metric: it is not symmetric in $P$ and $Q$ and does not satisfy the triangle inequality.
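A minimal sketch of the discrete form, $D_{\mathrm{KL}}(P \parallel Q) = \sum_x P(x)\,\ln\!\big(P(x)/Q(x)\big)$, follows; the function name and the example distributions are assumptions chosen only to illustrate the definition.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                                # terms with P(x) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
# Note the asymmetry: D_KL(P || Q) generally differs from D_KL(Q || P),
# which is one reason the KL divergence is not a metric.
print(kl_divergence(p, q), kl_divergence(q, p))
```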
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance, see section below) is used to quantify the similarity between two probability distributions.
To define the Hellinger distance in terms of elementary probability theory, if we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral:
$$H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx = 1 - \int \sqrt{f(x)\,g(x)}\, dx,$$
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance satisfies the property (derivable from the Cauchy–Schwarz inequality)
$$0 \le H(P,Q) \le 1.$$
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when $P$ assigns probability zero to every set to which $Q$ assigns a positive probability, and vice versa.
The Hellinger distance is related to the Bhattacharyya coefficient $BC(P,Q)$ (see section below), as it can be defined as
$$H(P,Q) = \sqrt{1 - BC(P,Q)}.$$
The Hellinger distance can be specialized to several types of probability distributions.
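For discrete distributions, the integral above becomes a sum, $H(P,Q) = \sqrt{1 - \sum_x \sqrt{P(x)\,Q(x)}}$. The sketch below is an illustrative implementation of that discrete form; the function name and example distributions are assumptions, not from the source.

```python
import numpy as np

def hellinger(p, q):
    """H(P, Q) = sqrt(1 - sum_x sqrt(P(x) Q(x))); always lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))                  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))    # clip tiny negative rounding error

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
print(hellinger(p, q))            # some value strictly between 0 and 1
print(hellinger(p, p))            # 0.0: identical distributions
print(hellinger([1, 0], [0, 1]))  # 1.0: disjoint supports give the maximum distance
```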
In statistics, the Bhattacharyya distance (Bhattacharyya, 1943) measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations.
It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.
Definition. For probability distributions $P$ and $Q$ over the same domain $X$, the Bhattacharyya distance is defined as
$$D_B(P,Q) = -\ln\big(BC(P,Q)\big),$$
where
$$BC(P,Q) = \sum_{x \in X} \sqrt{P(x)\,Q(x)}$$
is the Bhattacharyya coefficient for discrete probability distributions. For continuous probability distributions, with probability density functions $p$ and $q$, the Bhattacharyya coefficient is defined as
$$BC(p,q) = \int \sqrt{p(x)\,q(x)}\, dx.$$
For more info: https://en.wikipedia.org/wiki/Bhattacharyya_distance
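An illustrative sketch of the discrete definitions above; the function name and the example distributions are assumptions made for the purpose of the example.

```python
import numpy as np

def bhattacharyya(p, q):
    """Return (coefficient BC, distance D_B = -ln BC) for discrete P and Q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = float(np.sum(np.sqrt(p * q)))
    return bc, float(-np.log(bc))

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
bc, db = bhattacharyya(p, q)
# D_B is symmetric but violates the triangle inequality, so it is not a metric;
# the related Hellinger distance sqrt(1 - BC) does satisfy it.
print(bc, db)
```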