Querying the Deep Web
Authors
- Andrea Calì (University of Oxford, UK)
- Davide Martinenghi (Politecnico di Milano, Italy)
Abstract
Data stored outside Web pages and accessible from the Web, typically through HTML forms, constitute the so-called Deep Web. Such data are of great value, but difficult to query and search. We survey techniques to optimize query processing on the Deep Web, in a setting where data are represented in the relational model. We illustrate optimizations both at query plan generation time and at runtime, highlighting the role of integrity constraints. We discuss several prototype systems that address the query processing problem.
About the Authors
Andrea Calì (University of Oxford, UK)

Andrea Calì is a Research Fellow at the Oxford-Man Institute of Quantitative Finance, University of Oxford; he also holds affiliations with the Oxford University Computing Laboratory and Keble College. Previously, he was an Assistant Professor at the Department of Computer Science of the Free University of Bolzano, Italy. He holds an MSc in Electronic Engineering and a PhD in Computer Engineering, both from the University of Rome "La Sapienza". His research interests include database theory, information integration, web data extraction for economics, description logics, conceptual modelling, matchmaking in e-commerce, and mobile information systems. He has worked on several research projects funded by the European Union and by the British and Italian governments, and has contributed to the design and development of several prototype information systems, including IBIS (data integration), DIS@DIS (data integration), IM3 (recommendation system for tourism), and SmartDate (AI-based dating system). He has also worked as a consultant for several IT companies.
Davide Martinenghi (Politecnico di Milano, Italy)

Davide Martinenghi received his MSc in Computer Engineering from Politecnico di Milano, Italy, in 1998, and his PhD in Computer Science from Roskilde University, Denmark, in 2005. He is currently an Assistant Professor at Politecnico di Milano. Previously, he was with the Department of Computer Science of the Free University of Bolzano, Italy. His main interests are data integrity maintenance, data integration, Web data access, and, more broadly, applications of logic to data management. He is also interested in research issues related to Web search, including the development of visual paradigms for representing queries over Web sources. He has participated in numerous research projects funded by the European Union and the Italian Government.
Session
EDBT Tutorial: Querying the Deep Web (Friday, March 26, 14:00–17:30)

