Project Award Number: IIS-0307072
Principal Investigator: Rada Y. Chirkova Main collaborator: Foto
Afrati
Department
of Computer Science
Computer Science Division
North
Carolina State University
National Technical University of Athens
Campus
Box 7535
9, Heroon Politechniou, Zographou
Raleigh,
NC 27695-7535 157 73 Athens, Greece
Phone:
(919) 513-3506 Phone: +30-1-7722498
Fax:
(919) 513-4357 Fax: +30-1-7722459
Email: chirkova@csc.ncsu.edu Email: afrati@cs.ece.ntua.gr
http://www4.ncsu.edu/~rychirko/ http://www.softlab.ntua.gr/facilities/public/AD/foto/
Keywords:
query
optimization, materialized views, query processing.
The
goal of this project is to develop new effective methods to improve the
performance of sets of frequent and important queries on large relational
databases, which could improve the efficiency of user interactions with
data-management systems. Solving the problem will have the most effect in query
optimization, data warehousing, and information integration. The project
focuses on the methodology of evaluating queries using views; views are
relations that are defined by auxiliary queries and can be used to rewrite and
answer user queries. One way to improve query performance is precompute and
store (i.e., materialize) views.
To truly optimize query
performance, it is critical to materialize the "right" views. The
project will demonstrate that, by designing and materializing views, it is
possible to ensure optimal or near-optimal performance of frequent and
important queries, for common and important query types. The focus of this
effort is on developing efficient and scalable heuristic algorithms that design
(near-) optimal sets of views for the given queries. The project has two parts:
(1) theoretical analysis and design of algorithms and heuristics for view
design, and (2) implementation and experiments on large databases, to evaluate
the performance improvements caused by using the views.
Publications and Products:
·
Foto
Afrati, Rada Chirkova, Shalu Gupta, and Charles Loftis. Designing and Using
Views to Improve Performance of Aggregate Queries. In review, submitted September 2004.
·
Rada
Chirkova, Shalu Gupta, Kyoung-Hwa Kim, and Simran Sandhu. Extensible Framework
for Query-Perofrmance Enhancement by Tuning. Code downloads and documentation available from http://research.csc.ncsu.edu/selftune/,
September 2004.
·
Foto
Afrati, Rada Chirkova, Shalu Gupta, and Charles Loftis. Designing and Using
Views to Improve Performance of Aggregate Queries, Technical Report NCSU CSC TR-2004-26, September 2004.
·
Foto
Afrati, Rada Chirkova, Manolis Gergatsoulis, and Vassia Pavlaki. Designing
Views to Answer Queries Under Set, Bag, and Bag-Set Semantics. In review, submitted September 2004.
·
Foto
Afrati and Rada Chirkova. Selecting and Using Views to Compute Aggregate
Queries. In review, submitted June
2004.
·
Rada
Chirkova, Chen Li, and Jia Li. Materializing Views with Minimum Size to Answer
Queries. Technical Report NCSU-UC Irvine,
May 2004.
·
Rada
Chirkova, Chen Li, and Jia Li. Answering Queries Using Materialized Views with
Minimum Size. In review, submitted May
2004.
·
Rada
Chirkova and Chen Li. Materializing Views with Minimal Size to Answer Queries. Proceedings
of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database
Systems (PODS-2003).
This project
offers a unique test case for a fundamental understanding of optimality in
answering queries and, more generally, in database performance costs (GPRA Performance Plan
Outcome Goal 1). The project has been providing educational and research
experience opportunities for graduate and undegraduate students (GPRA
Performance Plan Outcome Goals 3 and 4). We are planning to collaborate with
owners and users of large databases, to improve the efficiency of the queries
the users ask on their data, while at the same time testing the techniques
proposed in the project (GPRA Performance Plan Outcome Goal 2). The techniques
resulting from this project could have application in commercial and
experimental database systems, where they will provide new ways to lower
query-processing costs (GPRA Performance Plan Outcome Goal 2). The research
results are accessible via the project web site http://research.csc.ncsu.edu/selftune/,
publications, and freely disseminated software.
The project is
envisioned as a three-year effort and will consist of two parts: (1)
theoretical analysis and design of algorithms and heuristics for view design,
and (2) implementation and experiments on large databases, to evaluate the
performance improvements caused by using the views.
The research plan for the project is as
follows:
1. By August 2004 (done): to
characterize constructively optimal viewsets for the most common and important
types of queries. The main characteristic is the cost of computing the input
query workload using an optimal viewset on the input database.
2. By February 2005: to
delineate infeasibility areas, that is, those query types that present uncommon
challenges in designing optimal viewsets; and to characterize constructively
optimal viewsets for those common and important types of queries, in the
feasibility area, that have not been covered in Goal 1.
3. By August 2005: to design
algorithms, with one algorithm for each of the most common and important query
types in the feasibility area. Given appropriate problem inputs, each algorithm
will produce a viewset that is optimal, in the expected sense, for the given
input.
4. By February 2006: to design
efficient heuristic algorithms to approximate optimal viewsets for common and
important query types in the feasibility area; to devise recommendations on
designing potentially useful views for the areas of infeasibility.
5. By August 2006: testing and
evaluation: to obtain performance characterizations of the viewsets produced by
the approximation algorithms, see Goal 4, and to compare the characterizations
with (1) the performance characterizations of the optimal viewsets, and (2) the
performance improvements given, on the same inputs, by the well-known
view-selection algorithms described in the literature. The expected outcome of
this stage will comprise accuracy, performance, and scalability characteristics
of the algorithms. We expect our algorithms to give significant improvements
over the performance gains given by the algorithms described in the literature.
In a number of applications
of modern databases, many users pose complex queries on the data at the same
time. Processing numerous complex queries simultaneously and efficiently is a
nontrivial task for a database-management system; as a result, some or most
users may experience slower-than-desired response to their queries. At the same
time, if the database system receives some queries over and over again, it is
typically possible to significantly improve their response time, by
precomputing and storing in the database auxiliary data, called views,
and by using the views in the computation of the queries. For instance, in a
relational database where all data is stored in tables, a precomputed view
becomes just another stored table, which may be used, alongside the original
stored data, to answer queries on the database.
The best way to improve query performance is, of
course, to precompute and store the answers to all frequent and important
queries. Unfortunately, this trivial and optimal solution is often
unacceptable, because the new stored data has to satisfy pre-existing
constraints on the database system. A common constraint is the amount of disk
space allocated for storing the new data; large answers to complex queries
typically do not satisfy that constraint.
We explore the approach of
finding the “right” (optimal) views, that is, views that satisfy the given
database constraints and reduce, as much as possible, the response time for
most or all frequent and important database queries. This approach has a
potential to lead to dramatic improvements in the efficiency of user
interactions with many types of data-management systems. Solving the problem
will have the most effect in query optimization, data warehousing, and
information integration, which are important research topics with direct
practical applications. Moreover, our research program offers
a unique test case for a fundamental understanding of optimality in answering
queries and, more generally, in database performance.
Area References:
·
Rada
Chirkova, Alon Y. Halevy, and Dan Suciu. A Formal Perspective on the View
Selection Problem. The VLDB Journal (invited paper), 11(3):216-237,
2002.
·
Rada
Chirkova. The View-Selection Problem Has an Exponential-Time Lower Bound for
Conjunctive Queries and Views. Proceedings of the 21st ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2002).
·
Rada
Chirkova and Michael R. Genesereth. Linearly Bounded Reformulations of
Conjunctive Databases. Proceedings of the First International Conference on
Computational Logic, July 2000.
We welcome contacts from other researchers interested in collaboration.
http://research.csc.ncsu.edu/selftune/. This site has up-to-date information on the progress of the project, including an overview of the project (http://research.csc.ncsu.edu/selftune/manifestoFall2003.pdf), NSF annual reports (http://research.csc.ncsu.edu/selftune/reportAugust2003.html), technical reports and software to download.
http://research.csc.ncsu.edu/selftune/. This software a is a database-management system that includes aframework for generation and rewriting of derived data. The software has been built in an open-source database-management system PostgreSQL (http://www.postgresql.org). The purpose of the software is experimental validation of this and other research projects.