If is a stationary ergodic process, then converges almost surely to The Glivenko–Cantelli theorem gives a stronger mode of convergence than this in the iid case.
For simplicity, consider a case of continuous random variable . Fix such that for . Now for all there exists such that .
Therefore,
Since by strong law of large numbers, we can guarantee that for any positive and any integer such that , we can find such that for all , we have . Combined with the above result, this further implies that , which is the definition of almost sure convergence.
Empirical measures
One can generalize the empirical distribution function by replacing the set by an arbitrary set C from a class of sets to obtain an empirical measure indexed by sets
where is the empirical measure, is the corresponding map, and
assuming that it exists.
Definitions
A class is called a Glivenko–Cantelli class (or GC class, or sometimes strong GC class) with respect to a probability measure P if
almost surely as
A class is a weak Glivenko-Cantelli class with respect to P if it instead satisfies the weaker condition
in probability as
A class is called a universal Glivenko–Cantelli class if it is a GC class with respect to any probability measure on .
A class is a weak uniform Glivenko–Cantelli class if the convergence occurs uniformly over all probability measures on : For every ,
as
A class is a (strong) uniform Glivenko-Cantelli class if it satisfies the stronger condition that for every ,
as
Glivenko–Cantelli classes of functions (as well as their uniform and universal forms) are defined similarly, replacing all instances of with .
The weak and strong versions of the various Glivenko-Cantelli properties often coincide under certain regularity conditions. The following definition commonly appears in such regularity conditions:
A class of functions is image-admissible Suslin if there exists a Suslin space and a surjection such that the map is measurable .
A class of measurable sets is image-admissible Suslin if the class of functions is image-admissible Suslin, where denotes the indicator function for the set .
Theorems
The following two theorems give sufficient conditions for the weak and strong versions of the Glivenko-Cantelli property to be equivalent.
Suppose that a function class is bounded. Also suppose that the set is image-admissible Suslin. Then is a weak uniform Glivenko-Cantelli class if and only if it is a strong uniform Glivenko-Cantelli class.
The following theorem is central to statistical learning of binary classification tasks.
Under certain consistency conditions, a universally measurable class of sets is a uniform Glivenko-Cantelli class if and only if it is a Vapnik–Chervonenkis class.
There exist a variety of consistency conditions for the equivalence of uniform Glivenko-Cantelli and Vapnik-Chervonenkis classes. In particular, either of the following conditions for a class suffice:[9]
is image-admissible Suslin.
is universally separable: There exists a countable subset of such that each set can be written as the pointwise limit of sets in .
Examples
Let and . The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by Kolmogorov's theorem,
, that is is uniformly Glivenko–Cantelli class.
Let P be a nonatomic probability measure on S and be a class of all finite subsets in S. Because , , , we have that and so is not a GC class with respect to P.
^Dudley, Richard M.; Giné, Eva; Zinn, Joel C. (1991). "Uniform and universal Glivenko-Cantelli classes". Journal of Theoretical Probability. 4: 485–510. doi:10.1007/BF01210321.
^Vapnik, V.N.; Chervonenkis, A.Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264–280. doi:10.1137/1116025.
^Pestov, Vladimir (2011). "PAC learnability versus VC dimension: A footnote to a basic result of statistical learning". The 2011 International Joint Conference on Neural Networks. pp. 1141–1145. arXiv:1104.2097. doi:10.1109/IJCNN.2011.6033352.
Pitman, E.J.G. (1979). "The Sample Distribution Function". Some Basic Theory for Statistical Inference. London, UK: Chapman and Hall. p. 79–97. ISBN0-470-26554-X.