Streaming in a Connected World: Querying and Tracking Distributed Data Streams
Authors
- Graham Cormode (AT&T Labs-Research, USA)
- Minos Garofalakis (Yahoo! Inc, USA)
Abstract
Large-scale event-monitoring systems require fast or continuous query answering in a world where the data is streaming and inherently distributed. The key challenge is to minimize both communication and processing burden while ensuring accuracy and timeliness of answers. We discuss example application domains, including sensor networks, network monitoring, and P2P networks. We also cover basic (centralized) data-streaming models and results, and outline the key dimensions of distributed data-streaming problems: (1) Querying Model: One-shot vs. continuous, exact vs. approximate, deterministic vs. randomized; (2) Communication Model: Single-level, hierarchical, or fully distributed (e.g., DHT-based P2P systems), along with other communication constraints (e.g., network loss, intermittent connectivity); and (3) Class of Queries: Holistic vs. non-holistic aggregates, duplicate-sensitive vs. duplicate-insensitive aggregates, and more complex queries (e.g., inference models, set-valued results).
About the Authors
Graham Cormode (AT&T Labs-Research, USA)

Graham Cormode is a Principal Member of Technical Staff in the Database Management Group at AT&T Shannon Laboratories in New Jersey. Previously, he was a researcher at Bell Labs, after postdoctoral study at the DIMACS center at Rutgers University from 2002 to 2004. His PhD was granted by the University of Warwick in 2002. He works on data stream algorithms, large-scale data mining, and applied algorithms, with applications to databases, networks, and fundamentals of communications and computation.
Minos Garofalakis (Yahoo! Inc, USA)

Minos Garofalakis is a Principal Research Scientist with the Community Systems group at Yahoo! Research in Santa Clara, California, and an Adjunct Associate Professor of Computer Science at the University of California, Berkeley. Previously, he was a Senior Researcher at Intel Research Berkeley (2005-2007), and a Member of Technical Staff at Bell Laboratories (1998-2005). He obtained his PhD from the University of Wisconsin-Madison in 1998. His current research interests include data streaming, approximate query processing, probabilistic databases, network-data management, and XML databases.
Session
Tutorial: Streaming in a Connected World: Querying and Tracking Distributed Data Streams
Electronic Conference Proceedings