Efficient View-Design Algorithms for Near-Optimal Query Performance

Project Award Number: IIS-0307072


Principal Investigator:
Rada Y. Chirkova

Department of Computer Science

North Carolina State University

Campus Box 7535

Raleigh, NC 27695-7535

Phone: (919) 513-3506

Fax: (919) 513-4357

Email: chirkova@csc.ncsu.edu

URL: http://www4.ncsu.edu/~rychirko/


Keywords: query optimization, materialized views, query processing.

 

Project Summary:

The goal of this project is to develop new effective methods to improve the performance of sets of frequent and important queries on large relational databases, which could improve the efficiency of user interactions with data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration. The project focuses on the methodology of evaluating queries using views; views are relations that are defined by auxiliary queries and can be used to rewrite and answer user queries. One way to improve query performance is precompute and store (i.e., materialize) views.

To truly optimize query performance, it is critical to materialize the "right" views. The project will demonstrate that, by designing and materializing views, it is possible to ensure optimal or near-optimal performance of frequent and important queries, for common and important query types. The focus of this effort is on developing efficient and scalable heuristic algorithms that design (near-) optimal sets of views for the given queries. The project has two parts: (1) theoretical analysis and design of algorithms and heuristics for view design, and (2) implementation and experiments on large databases, to evaluate the performance improvements caused by using the views.

 

Publications and Products:

·              Rada Chirkova and Chen Li. Materializing Views with Minimal Size to Answer Queries. Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2003).

·              Rada Chirkova, Alon Y. Halevy, and Dan Suciu. A Formal Perspective on the View Selection Problem. The VLDB Journal (invited paper), 11(3):216-237, 2002.

·              Rada Chirkova. The View-Selection Problem Has an Exponential-Time Lower Bound for Conjunctive Queries and Views. Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2002).

·              Rada Chirkova and Michael R. Genesereth. Linearly Bounded Reformulations of Conjunctive Databases. Proceedings of the First International Conference on Computational Logic, July 2000.

Project Impact:

This project offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance costs (GPRA Performance Plan Outcome Goal 1). The project will provide educational and research experience opportunities for graduate and undegraduate students (GPRA Performance Plan Outcome Goals 3 and 4). We anticipate collaboration with owners and users of large databases, to improve the efficiency of the queries the users ask on their data, while at the same time testing the techniques proposed in the project (GPRA Performance Plan Outcome Goal 2). For instance, we plan to collaborate with other departments - primarily in the sciences - at North Carolina State University, with other academic institutions and with companies in the private sector of the North Carolina Research Triangle. The techniques resulting from this project could have application in commercial and experimental database systems, where they will provide new ways to lower query-processing costs (GPRA Performance Plan Outcome Goal 2). The research results will be accessible via the project web site http://research.csc.ncsu.edu/selftune/, publications, and freely disseminated software.

 

Goals, Objectives and Targeted Activities:

The project is envisioned as a three-year effort and will consist of two parts: (1) theoretical analysis and design of algorithms and heuristics for view design, and (2) implementation and experiments on large databases, to evaluate the performance improvements caused by using the views.

The research plan for the project is as follows:

 

1.           By August 2004: to characterize constructively optimal viewsets for the most common and important types of queries. The main characteristic is the cost of computing the input query workload using an optimal viewset on the input database.

2.           By February 2005: to delineate infeasibility areas, that is, those query types that present uncommon challenges in designing optimal viewsets; and to characterize constructively optimal viewsets for those common and important types of queries, in the feasibility area, that have not been covered in Goal 1.

3.           By August 2005: to design algorithms, with one algorithm for each of the most common and important query types in the feasibility area. Given appropriate problem inputs, each algorithm will produce a viewset that is optimal, in the expected sense, for the given input.

4.           By February 2006: to design efficient heuristic algorithms to approximate optimal viewsets for common and important query types in the feasibility area; to devise recommendations on designing potentially useful views for the areas of infeasibility.

5.           By August 2006: testing and evaluation: to obtain performance characterizations of the viewsets produced by the approximation algorithms, see Goal 4, and to compare the characterizations with (1) the performance characterizations of the optimal viewsets, and (2) the performance improvements given, on the same inputs, by the well-known view-selection algorithms described in the literature. The expected outcome of this stage will comprise accuracy, performance, and scalability characteristics of the algorithms. We expect our algorithms to give significant improvements over the performance gains given by the algorithms described in the literature.

 

Area Background:

In a number of applications of modern databases, many users pose complex queries on the data at the same time. Processing numerous complex queries simultaneously and efficiently is a nontrivial task for a database-management system; as a result, some or most users may experience slower-than-desired response to their queries. At the same time, if the database system receives some queries over and over again, it is typically possible to significantly improve their response time, by precomputing and storing in the database auxiliary data, called views, and by using the views in the computation of the queries. For instance, in a relational database where all data is stored in tables, a precomputed view becomes just another stored table, which may be used, alongside the original stored data, to answer queries on the database.

The best way to improve query performance is, of course, to precompute and store the answers to all frequent and important queries. Unfortunately, this trivial and optimal solution is often unacceptable, because the new stored data has to satisfy pre-existing constraints on the database system. A common constraint is the amount of disk space allocated for storing the new data; large answers to complex queries typically do not satisfy that constraint.

We explore the approach of finding the “right” (optimal) views, that is, views that satisfy the given database constraints and reduce, as much as possible, the response time for most or all frequent and important database queries. This approach has a potential to lead to dramatic improvements in the efficiency of user interactions with many types of data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. Moreover, our research program offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance.

 

Project Website:

http://research.csc.ncsu.edu/selftune/
(under construction)