In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.
One also has the following inequality, due to Bretagnolle and Huber[2] (see also [3]), which has the advantage of providing a non-vacuous bound even when $D_{\mathrm{KL}}(P \parallel Q) > 2$:

$\delta(P, Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.$
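As a concrete illustration (a minimal sketch, not from the cited sources; the two Bernoulli distributions below are arbitrary choices), the following Python snippet checks the bound in a regime where $D_{\mathrm{KL}}(P \parallel Q) > 2$, so that Pinsker's bound $\sqrt{D_{\mathrm{KL}}(P \parallel Q)/2}$ exceeds 1 and is vacuous:

```python
import numpy as np

# Illustrative sketch: verify the Bretagnolle–Huber bound
#   delta(P, Q) <= sqrt(1 - exp(-KL(P || Q)))
# for two Bernoulli distributions chosen so that KL > 2,
# where Pinsker's bound sqrt(KL / 2) exceeds 1 and says nothing.

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between PMFs."""
    return 0.5 * np.abs(p - q).sum()

def kl_divergence(p, q):
    """KL divergence D(P || Q), assuming q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.99, 0.01])  # Bernoulli(0.01), an arbitrary example
q = np.array([0.10, 0.90])  # Bernoulli(0.90)

tv = tv_distance(p, q)                 # 0.89
kl = kl_divergence(p, q)               # about 2.22 (> 2)
bh_bound = np.sqrt(1 - np.exp(-kl))    # about 0.94 -- non-vacuous
pinsker_bound = np.sqrt(kl / 2)        # about 1.05 -- vacuous

print(f"TV = {tv:.3f}, BH bound = {bh_bound:.3f}, Pinsker = {pinsker_bound:.3f}")
assert tv <= bh_bound
```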
The total variation distance is half of the $L^1$ distance between the probability functions: on discrete domains, this is the distance between the probability mass functions[4]

$\delta(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|$

(or the analogous distance between Radon–Nikodym derivatives with any common dominating measure). This result can be shown by noticing that the supremum in the definition is achieved exactly at the set where one distribution dominates the other.[6]
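The identity can be checked by brute force on a small support; the following sketch (illustrative only, with arbitrary example PMFs) confirms that the supremum over events equals half the $L^1$ distance and is attained on the set $\{x : P(x) > Q(x)\}$:

```python
import numpy as np
from itertools import chain, combinations

# Illustrative check on a small support: sup_A |P(A) - Q(A)| equals
# half the L1 distance, and the supremum is attained on {x : P(x) > Q(x)}.
# The PMFs are arbitrary examples.

p = np.array([0.5, 0.3, 0.1, 0.1])
q = np.array([0.2, 0.2, 0.4, 0.2])

half_l1 = 0.5 * np.abs(p - q).sum()

# Event where P dominates Q: the claimed maximizer of |P(A) - Q(A)|.
A = p > q
dominating_set_value = p[A].sum() - q[A].sum()

# Brute force over all 2^4 events of the four-point support.
points = range(len(p))
events = chain.from_iterable(combinations(points, r) for r in range(len(p) + 1))
sup_value = max(abs(p[list(e)].sum() - q[list(e)].sum()) for e in events)

print(half_l1, dominating_set_value, sup_value)  # all three equal 0.4
```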
The total variation distance (or half the $L^1$ norm) arises as the optimal transportation cost when the cost function is $c(x, y) = \mathbf{1}_{x \neq y}$, that is,

$\frac{1}{2} \|P - Q\|_1 = \delta(P, Q) = \inf_{\pi} \Pr_{(X, Y) \sim \pi}(X \neq Y) = \inf_{\pi} \operatorname{E}_{\pi}\!\left[\mathbf{1}_{X \neq Y}\right],$

where the expectation is taken with respect to the probability measure $\pi$ on the space where $(X, Y)$ lives, and the infimum is taken over all such $\pi$ with marginals $P$ and $Q$, respectively.[8]
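A minimal sketch of this coupling characterization, assuming discrete marginals (the example PMFs are arbitrary): the classical "maximal coupling" keeps the overlapping mass $\min(P(x), Q(x))$ on the diagonal and attains $\Pr(X \neq Y) = \delta(P, Q)$.

```python
import numpy as np

# Illustrative sketch of a maximal coupling: a joint distribution pi with
# marginals P and Q that attains Pr(X != Y) = delta(P, Q). It keeps the
# overlap min(P(x), Q(x)) on the diagonal and couples the excess mass of
# P against the excess mass of Q off the diagonal.

def maximal_coupling(p, q):
    n = len(p)
    pi = np.zeros((n, n))
    overlap = np.minimum(p, q)
    np.fill_diagonal(pi, overlap)       # X = Y on the shared mass
    r = p - overlap                     # excess of P (zero where Q >= P)
    c = q - overlap                     # excess of Q (zero where P >= Q)
    excess = r.sum()                    # equals the TV distance
    if excess > 0:
        pi += np.outer(r, c) / excess   # any coupling of the excesses works
    return pi

p = np.array([0.5, 0.3, 0.2])  # arbitrary illustrative marginals
q = np.array([0.1, 0.3, 0.6])

pi = maximal_coupling(p, q)
assert np.allclose(pi.sum(axis=1), p) and np.allclose(pi.sum(axis=0), q)

tv = 0.5 * np.abs(p - q).sum()
prob_not_equal = pi.sum() - np.trace(pi)  # off-diagonal mass = Pr(X != Y)
print(tv, prob_not_equal)                 # both 0.4
```

Because the excess masses of $P$ and $Q$ have disjoint supports, the off-diagonal term adds nothing to the diagonal, so any coupling of the excesses achieves the infimum.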
^ Bretagnolle, J.; Huber, C. (1978). "Estimation des densités: risque minimax" [Density estimation: minimax risk]. Séminaire de Probabilités XII (Univ. Strasbourg, 1976/1977). Lecture Notes in Mathematics. Vol. 649. Berlin: Springer. pp. 342–363. Lemma 2.1 (in French).
^ Tsybakov, Alexandre B. (2009). Introduction to Nonparametric Estimation. Revised and extended from the 2004 French original; translated by Vladimir Zaiats. Springer Series in Statistics. New York: Springer. xii+214 pp. ISBN 978-0-387-79051-0. Equation 2.25.
^ Levin, David A.; Peres, Yuval; Wilmer, Elizabeth L. (2017). Markov Chains and Mixing Times (2nd rev. ed.). AMS. Proposition 4.2, p. 48.
^ Tsybakov, Alexandre B. (2009). Introduction to Nonparametric Estimation (revised and extended from the 2004 French original). New York: Springer. Lemma 2.1. ISBN 978-0-387-79051-0.