Project Award Number: IIS-0307072
Department
of Computer Science
North
Carolina State University
Campus
Box 7535
Raleigh,
NC 27695-7535
Phone:
(919) 513-3506
Fax:
(919) 513-4357
Email:
chirkova@csc.ncsu.edu
URL:
http://www4.ncsu.edu/~rychirko/
Keywords: query optimization,
materialized views, query processing.
The
goal of this project is to develop new effective methods to improve the
performance of sets of frequent and important queries on large relational
databases, which could improve the efficiency of user interactions with
data-management systems. Solving the problem will have the most effect in query
optimization, data warehousing, and information integration. The project
focuses on the methodology of evaluating queries using views; views are
relations that are defined by auxiliary queries and can be used to rewrite and
answer user queries. One way to improve query performance is precompute and
store (i.e., materialize) views.
To truly optimize query performance, it is critical
to materialize the "right" views. The project will demonstrate that,
by designing and materializing views, it is possible to ensure optimal or
near-optimal performance of frequent and important queries, for common and
important query types. The focus of this effort is on developing efficient and
scalable heuristic algorithms that design (near-) optimal sets of views for the
given queries. The project has two parts: (1) theoretical analysis and design
of algorithms and heuristics for view design, and (2) implementation and
experiments on large databases, to evaluate the performance improvements caused
by using the views.
Publications
and Products:
·
Rada
Chirkova and Chen Li. Materializing Views with Minimal Size to Answer Queries. Proceedings
of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database
Systems (PODS-2003).
·
Rada
Chirkova, Alon Y. Halevy, and Dan Suciu. A Formal Perspective on the View
Selection Problem. The VLDB Journal (invited paper), 11(3):216-237,
2002.
·
Rada
Chirkova. The View-Selection Problem Has an Exponential-Time Lower Bound for
Conjunctive Queries and Views. Proceedings of the 21st ACM
SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2002).
·
Rada
Chirkova and Michael R. Genesereth. Linearly Bounded Reformulations of Conjunctive
Databases. Proceedings of the First International Conference on Computational
Logic, July 2000.
This
project offers a unique test case for a fundamental understanding
of optimality in answering queries and, more generally, in database performance
costs (GPRA
Performance Plan Outcome Goal 1). The project will provide
educational and research experience opportunities for graduate and undegraduate
students (GPRA Performance Plan Outcome Goals 3 and 4). We anticipate
collaboration with owners and users of large databases, to improve the
efficiency of the queries the users ask on their data, while at the same time
testing the techniques proposed in the project (GPRA Performance Plan Outcome
Goal 2). For instance, we plan to collaborate with other departments - primarily
in the sciences - at North Carolina State University, with other academic
institutions and with companies in the private sector of the North Carolina
Research Triangle. The techniques resulting from this project could have
application in commercial and experimental database systems, where they will
provide new ways to lower query-processing costs (GPRA Performance Plan Outcome
Goal 2). The research results will be accessible via the project web site http://research.csc.ncsu.edu/selftune/,
publications, and freely disseminated software.
The
project is envisioned as a three-year effort and will consist of two parts: (1)
theoretical analysis and design of algorithms and heuristics for view design,
and (2) implementation and experiments on large databases, to evaluate the
performance improvements caused by using the views.
The research plan for the
project is as follows:
1.
By
August 2004: to characterize constructively optimal viewsets for the most
common and important types of queries. The main characteristic is the cost of
computing the input query workload using an optimal viewset on the input
database.
2.
By
February 2005: to delineate infeasibility areas, that is, those query types
that present uncommon challenges in designing optimal viewsets; and to
characterize constructively optimal viewsets for those common and important
types of queries, in the feasibility area, that have not been covered in Goal
1.
3.
By
August 2005: to design algorithms, with one algorithm for each of the most common
and important query types in the feasibility area. Given appropriate problem
inputs, each algorithm will produce a viewset that is optimal, in the expected
sense, for the given input.
4.
By
February 2006: to design efficient heuristic algorithms to approximate optimal
viewsets for common and important query types in the feasibility area; to
devise recommendations on designing potentially useful views for the areas of
infeasibility.
5.
By
August 2006: testing and evaluation: to obtain performance characterizations of
the viewsets produced by the approximation algorithms, see Goal 4, and to
compare the characterizations with (1) the performance characterizations of the
optimal viewsets, and (2) the performance improvements given, on the same
inputs, by the well-known view-selection algorithms described in the
literature. The expected outcome of this stage will comprise accuracy,
performance, and scalability characteristics of the algorithms. We expect our
algorithms to give significant improvements over the performance gains given by
the algorithms described in the literature.
In a number of applications of modern
databases, many users pose complex queries on the data at the same time.
Processing numerous complex queries simultaneously and efficiently is a
nontrivial task for a database-management system; as a result, some or most
users may experience slower-than-desired response to their queries. At the same
time, if the database system receives some queries over and over again, it is
typically possible to significantly improve their response time, by
precomputing and storing in the database auxiliary data, called views,
and by using the views in the computation of the queries. For instance, in a
relational database where all data is stored in tables, a precomputed view
becomes just another stored table, which may be used, alongside the original
stored data, to answer queries on the database.
The best way to improve query performance is, of course, to precompute
and store the answers to all frequent and important queries. Unfortunately,
this trivial and optimal solution is often unacceptable, because the new stored
data has to satisfy pre-existing constraints on the database system. A common
constraint is the amount of disk space allocated for storing the new data;
large answers to complex queries typically do not satisfy that constraint.
We explore the approach of finding the “right” (optimal) views, that is, views that satisfy the given database constraints and reduce, as much as possible, the response time for most or all frequent and important database queries. This approach has a potential to lead to dramatic improvements in the efficiency of user interactions with many types of data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. Moreover, our research program offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance.
http://research.csc.ncsu.edu/selftune/
(under construction)