Centerpoint (geometry)In statistics and computational geometry, the notion of centerpoint is a generalization of the median to data in higher-dimensional Euclidean space. Given a set of points in d-dimensional space, a centerpoint of the set is a point such that any hyperplane that goes through that point divides the set of points in two roughly equal subsets: the smaller part should have at least a 1/(d + 1) fraction of the points. Like the median, a centerpoint need not be one of the data points. Every non-empty set of points (with no duplicates) has at least one centerpoint. Related conceptsClosely related concepts are the Tukey depth of a point (the minimum number of sample points on one side of a hyperplane through the point) and a Tukey median of a point set (a point maximizing the Tukey depth). A centerpoint is a point of depth at least n/(d + 1), and a Tukey median must be a centerpoint, but not every centerpoint is a Tukey median. Both terms are named after John Tukey. For a different generalization of the median to higher dimensions, see geometric median. ExistenceA simple proof of the existence of a centerpoint may be obtained using Helly's theorem. Suppose there are n points, and consider the family of closed half-spaces that contain more than dn/(d + 1) of the points. Fewer than n/(d + 1) points are excluded from any one of these halfspaces, so the intersection of any subset of d + 1 of these halfspaces must be nonempty. By Helly's theorem, it follows that the intersection of all of these halfspaces must also be nonempty. Any point in this intersection is necessarily a centerpoint. AlgorithmsFor points in the Euclidean plane, a centerpoint may be constructed in linear time.[1] In any dimension d, a Tukey median (and therefore also a centerpoint) may be constructed in time O(nd − 1 + n log n).[2] A randomized algorithm that repeatedly replaces sets of d + 2 points by their Radon point can be used to compute an approximation to a centerpoint of any point set, in the sense that its Tukey depth is linear in the sample set size, in an amount of time that is polynomial in the dimension.[3][4] ReferencesCitationsSources
|