Streaming algorithms and compressed representation of data
- Deterministic sampling and range counting in geometric data streams.
A. Bagchi, A. Chaudhary, D. Eppstein, and M. T. Goodrich.
arXiv:cs.CG/0307027.
20th ACM Symp. Comp. Geom., Brooklyn, 2004, pp. 144–151.
ACM Trans. Algorithms 3(2):A16, 2007.We describe an efficient streaming-model construction of epsilon-nets and epsilon-approximations, and use it to find deterministic streaming-model approximation algorithms for iceberg range queries and for various robust statistics problems.
- Space-efficient straggler identification in round-trip data
streams via Newton's identities and invertible Bloom filters.
D. Eppstein, and M. T. Goodrich.
10th Worksh. Algorithms and Data Structures, Halifax, Nova Scotia, 2007.
Springer, Lecture Notes in Comp. Sci. 4619, 2007, pp. 637–648.
arXiv:0704.3313.
IEEE Trans. Knowledge and Data Engineering 23 (2): 297–306, 2011.We consider data structures for handling streams of check-in and check-out requests, and that must identify the set of check-ins that are not matched by a corresponding check-out. We show that, if this set has size k, it is possible to handle this problem in space O(k log n) by a deterministic data structure. However, if check-outs may occur that do not match any check-in, then no low-space deterministic solution is possible; instead, we provide a randomized solution with near-optimal space and high probability of answering queries correctly.
- What's the difference? Efficient set reconciliation without prior
context.
D. Eppstein, M. T. Goodrich, F. Uyeda, and G. Varghese.
Proc. ACM SIGCOMM, Toronto, Canada, 2011.We determine the symmetric difference between two similar sets of items, held by different machines on the internet, using an amount of communication bandwidth that is proportional only to the difference between the sets and with low computational overhead. Our solution technique combines the invertible Bloom filter data structure from our previous work on streaming straggler detection with a randomized sampling scheme that allows us to accurately and efficiently estimate the size of the difference.
- Set-difference range queries.
D. Eppstein, M. T. Goodrich, and J. Simons.
arXiv:1306.3482.
25th Canadian Conference on Computational Geometry, Waterloo, Canada, 2013.We show how to use invertible Bloom filters as part of range searching data structures that determine the differences between the members of two sets that lie in a given query range.
(Slides)