Efficient View-Design Algorithms for Near-Optimal Query Performance

                                                            Project Award Number: IIS-0307072


                           
Principal Investigator: Rada Y. Chirkova           Main collaborator: Foto Afrati

                                  Department of Computer Science                           Computer Science Division

                                    North Carolina State University                    National Technical University of Athens

                                            Campus Box 7535                                  9, Heroon Politechniou, Zographou

                                      Raleigh, NC 27695-7535                                       157 73 Athens, Greece

                                       Phone: (919) 513-3506                                        Phone: +30-1-7722498

                                         Fax: (919) 513-4357                                            Fax: +30-1-7722459

                                  Email: chirkova@csc.ncsu.edu                               Email: afrati@cs.ece.ntua.gr

                                http://www4.ncsu.edu/~rychirko/             http://www.softlab.ntua.gr/facilities/public/AD/foto/

 

Keywords: query optimization, materialized views, query processing.

 

Project Summary:

The goal of this project is to develop new effective methods to improve the performance of sets of frequent and important queries on large relational databases, which could improve the efficiency of user interactions with data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration. The project focuses on the methodology of evaluating queries using views; views are relations that are defined by auxiliary queries and can be used to rewrite and answer user queries. One way to improve query performance is precompute and store (i.e., materialize) views.

To truly optimize query performance, it is critical to materialize the "right" views. The project will demonstrate that, by designing and materializing views, it is possible to ensure optimal or near-optimal performance of frequent and important queries, for common and important query types. The focus of this effort is on developing efficient and scalable heuristic algorithms that design (near-) optimal sets of views for the given queries. The project has two parts: (1) theoretical analysis and design of algorithms and heuristics for view design, and (2) implementation and experiments on large databases, to evaluate the performance improvements caused by using the views.

 

Publications and Products:

·              Foto Afrati, Rada Chirkova, Shalu Gupta, and Charles Loftis. Designing and Using Views to Improve Performance of Aggregate Queries. In review, submitted September 2004.

·              Rada Chirkova, Shalu Gupta, Kyoung-Hwa Kim, and Simran Sandhu. Extensible Framework for Query-Perofrmance Enhancement by Tuning. Code downloads and documentation available from http://research.csc.ncsu.edu/selftune/, September 2004.

·              Foto Afrati, Rada Chirkova, Shalu Gupta, and Charles Loftis. Designing and Using Views to Improve Performance of Aggregate Queries, Technical Report NCSU CSC TR-2004-26, September 2004.

·              Foto Afrati, Rada Chirkova, Manolis Gergatsoulis, and Vassia Pavlaki. Designing Views to Answer Queries Under Set, Bag, and Bag-Set Semantics. In review, submitted September 2004.

·              Foto Afrati and Rada Chirkova. Selecting and Using Views to Compute Aggregate Queries. In review, submitted June 2004.

·              Rada Chirkova, Chen Li, and Jia Li. Materializing Views with Minimum Size to Answer Queries. Technical Report NCSU-UC Irvine, May 2004.

·              Rada Chirkova, Chen Li, and Jia Li. Answering Queries Using Materialized Views with Minimum Size. In review, submitted May 2004.

·              Rada Chirkova and Chen Li. Materializing Views with Minimal Size to Answer Queries. Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2003).

Project Impact:

This project offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance costs (GPRA Performance Plan Outcome Goal 1). The project has been providing educational and research experience opportunities for graduate and undegraduate students (GPRA Performance Plan Outcome Goals 3 and 4). We are planning to collaborate with owners and users of large databases, to improve the efficiency of the queries the users ask on their data, while at the same time testing the techniques proposed in the project (GPRA Performance Plan Outcome Goal 2). The techniques resulting from this project could have application in commercial and experimental database systems, where they will provide new ways to lower query-processing costs (GPRA Performance Plan Outcome Goal 2). The research results are accessible via the project web site http://research.csc.ncsu.edu/selftune/, publications, and freely disseminated software.

 

Goals, Objectives and Targeted Activities:

The project is envisioned as a three-year effort and will consist of two parts: (1) theoretical analysis and design of algorithms and heuristics for view design, and (2) implementation and experiments on large databases, to evaluate the performance improvements caused by using the views.

The research plan for the project is as follows:

 

1.   By August 2004 (done): to characterize constructively optimal viewsets for the most common and important types of queries. The main characteristic is the cost of computing the input query workload using an optimal viewset on the input database.

2.   By February 2005: to delineate infeasibility areas, that is, those query types that present uncommon challenges in designing optimal viewsets; and to characterize constructively optimal viewsets for those common and important types of queries, in the feasibility area, that have not been covered in Goal 1.

3.   By August 2005: to design algorithms, with one algorithm for each of the most common and important query types in the feasibility area. Given appropriate problem inputs, each algorithm will produce a viewset that is optimal, in the expected sense, for the given input.

4.   By February 2006: to design efficient heuristic algorithms to approximate optimal viewsets for common and important query types in the feasibility area; to devise recommendations on designing potentially useful views for the areas of infeasibility.

5.   By August 2006: testing and evaluation: to obtain performance characterizations of the viewsets produced by the approximation algorithms, see Goal 4, and to compare the characterizations with (1) the performance characterizations of the optimal viewsets, and (2) the performance improvements given, on the same inputs, by the well-known view-selection algorithms described in the literature. The expected outcome of this stage will comprise accuracy, performance, and scalability characteristics of the algorithms. We expect our algorithms to give significant improvements over the performance gains given by the algorithms described in the literature.

 

Area Background:

In a number of applications of modern databases, many users pose complex queries on the data at the same time. Processing numerous complex queries simultaneously and efficiently is a nontrivial task for a database-management system; as a result, some or most users may experience slower-than-desired response to their queries. At the same time, if the database system receives some queries over and over again, it is typically possible to significantly improve their response time, by precomputing and storing in the database auxiliary data, called views, and by using the views in the computation of the queries. For instance, in a relational database where all data is stored in tables, a precomputed view becomes just another stored table, which may be used, alongside the original stored data, to answer queries on the database.

The best way to improve query performance is, of course, to precompute and store the answers to all frequent and important queries. Unfortunately, this trivial and optimal solution is often unacceptable, because the new stored data has to satisfy pre-existing constraints on the database system. A common constraint is the amount of disk space allocated for storing the new data; large answers to complex queries typically do not satisfy that constraint.

We explore the approach of finding the “right” (optimal) views, that is, views that satisfy the given database constraints and reduce, as much as possible, the response time for most or all frequent and important database queries. This approach has a potential to lead to dramatic improvements in the efficiency of user interactions with many types of data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. Moreover, our research program offers a unique test case for a fundamental understanding of optimality in answering queries and, more generally, in database performance.

 

 Area References:

·              Rada Chirkova, Alon Y. Halevy, and Dan Suciu. A Formal Perspective on the View Selection Problem. The VLDB Journal (invited paper), 11(3):216-237, 2002.

·              Rada Chirkova. The View-Selection Problem Has an Exponential-Time Lower Bound for Conjunctive Queries and Views. Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS-2002).

·              Rada Chirkova and Michael R. Genesereth. Linearly Bounded Reformulations of Conjunctive Databases. Proceedings of the First International Conference on Computational Logic, July 2000.

Potential Related Projects:

We welcome contacts from other researchers interested in collaboration.

 

Project Websites:

http://research.csc.ncsu.edu/selftune/. This site has up-to-date information on the progress of the project, including an overview of the project (http://research.csc.ncsu.edu/selftune/manifestoFall2003.pdf), NSF annual reports (http://research.csc.ncsu.edu/selftune/reportAugust2003.html), technical reports and software to download.

 

Online Software:

http://research.csc.ncsu.edu/selftune/. This software a is a database-management system that includes a
framework for generation and rewriting of derived data. The software has been built in an open-source database-management system PostgreSQL (http://www.postgresql.org). The purpose of the software is experimental validation of this and other research projects.