Geometry in Action: Data Mining

Data Mining and Multidimensional Analysis

Data mining is the process of querying large databases (such as point-of-sale records) with the aim of distilling from them broad patterns and smaller collections of useful information. There seems to be little work in this area from the computational geometry perspective, but there are likely good geometric problems to be found in it. One such problem is coping with high-dimensional data by condensing the information down to a small number of relevant dimensions and then applying geometric clustering techniques. Any algorithm used in this context must be fast, but it is perhaps more important that it handle data sets too large to fit in memory and keep the total number of I/O operations to a minimum, as has been considered in recent work of Goodrich et al. ("External-memory computational geometry", 34th FOCS, 1993, 714-723). There are also interesting connections with geographic information systems, which face similar problems of querying large databases with more explicitly geometric content.
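As a rough illustration of the "reduce the dimension, then cluster" idea described above, here is a minimal Python sketch. It is not taken from any of the works cited on this page. It assumes a Gaussian random projection as the dimension-reduction step and Lloyd's k-means iteration (with a farthest-first choice of starting centers) as the clustering step. The function names and parameters are hypothetical, and the sketch keeps all points in memory, so it does not address the external-memory concerns raised above.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Map each point onto target_dim random Gaussian directions.

    Assumption: a plain Gaussian random projection stands in for
    "condensing to a small number of relevant dimensions".
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(d * x for d, x in zip(direction, p)) for direction in directions]
        for p in points
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))


def k_means(points, k, iterations=20):
    """Lloyd's iteration with deterministic farthest-first initial centers."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    for _ in range(iterations):
        labels = [nearest_center(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a center in place if its cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the last set of centers.
    labels = [nearest_center(p, centers) for p in points]
    return labels, centers
```

A typical call would be `labels, centers = k_means(random_projection(data, target_dim=3), k=2)`, where `data` is a list of equal-length lists of floats. For data that does not fit in memory, both steps would need to be restructured to read the points in blocks, which this sketch does not attempt.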

Case Studies in Biometry. This book by N. Lange and others mentions Voronoi diagrams as a method for detecting clusters of disease incidence.
Data mining research at Los Alamos.
Efficient data reduction with EASE and Deterministic sampling beyond EASE. Hervé Brönnimann and colleagues apply deterministic sampling techniques from computational geometry to data mining in large-scale data streams.
Near Neighbor Search in Large Metric Spaces, S. Brin, Stanford U.
Other Sites Relevant To Data Mining, Andy Pryke, Birmingham.
UCLA Data Mining Laboratory.