Project Reporting ANNUAL REPORT FOR AWARD # 0307072

Rada Y Chirkova ; North Carolina State U
Efficient View-Design Algorithms to Achieve Near-Optimal Performance of Sets of Relational Queries

Participant Individuals:
Graduate student(s) : Shalu Gupta; Zohreh Asgharzadeh Talebi; Gang Gou; Maxim Kormilitsin

Partner Organizations:

Other collaborators:

1. Professor Foto Afrati at the National Technical University of
Athens, Greece; she has collaborated with the PI on four research
papers for this project. Three of the papers have been accepted to
conferences (International Conference on Database Theory ICDT-05; 10th
International Conference on Database Systems for Advanced Applications
DASFAA-2005; Symposium on Abstraction, Reformulation and Approximation
SARA-2005). The fourth paper, based on the accepted ICDT-05 paper, is
in preparation for submission to a journal.

2. Professor Yahya Fathi in the Operations Research program at at NC
State University; he has collaborated with the PI on a paper accepted
to a conference (Ninth East-European Conference on Advances in
Databases and Information Systems ADBIS-2005).

3. Professor Michael R. Genesereth at Stanford University; he has
collaborated with the PI on a paper accepted to a conference (the
Logic and Computational Complexity Workshop LCC-05, in conjunction
with LICS-05).

4. Professor Anupam Joshi at the University of Maryland Baltimore
County (UMBC). Dr. Joshi has collaborated with the PI on a book
chapter ``Data Management for Mobile Ad-Hoc Networks' in ``Enabling
Technologies for Wireless e-Business Applications' (edited by W. Kou
and Y. Yesha, Springer, July 2005).

5. Kyoung-hwa Kim, the PI's M.Sc. student at NC State University; she
has collaborated with the PI by working on a PostgreSQL-based
(http://www.postgresql.org) implementation for the project. Kyoung-hwa
has also contributed to a paper accepted to a conference
(International Advanced Database Conference IADC-2005).

6. Professor Manolis Gergatsoulis at the Ionian University of Corfu,
Greece; he has collaborated with the PI on a paper accepted to a
conference (Symposium on Abstraction, Reformulation and Approximation
SARA-2005). 

7. Professor Chen Li at the University of California, Irvine; he has
collaborated with the PI on a paper accepted to The VLDB Journal in
2005.

8. Jia Li, doctoral student at the University of California, Irvine;
she has contributed to a paper accepted to The VLDB Journal in 2005.

9. Jingni Li, M.Sc. student at NC State University; she has
contributed to a paper accepted to a conference (Ninth East-European
Conference on Advances in Databases and Information Systems
ADBIS-2005).

10. Charles Loftis is a graduate student at NC State University; he
has contributed to a paper accepted to a conference (10th
International Conference on Database Systems for Advanced Applications
DASFAA-2005).

11. Vassia Pavlaki, doctoral student  at the National Technical
University of Athens, Greece; she has contributed to a paper accepted
to a conference (Symposium on Abstraction, Reformulation and
Approximation SARA-2005). 

12. Filip Perich, Cougar Software, Inc; Filip has a Ph.D. from the
University of Maryland Baltimore County (UMBC). While he was a student
at UMBC, he collaborated with the PI on a book chapter ``Data
Management for Mobile Ad-Hoc Networks' in ``Enabling Technologies for
Wireless e-Business Applications' (edited by W. Kou and Y. Yesha,
Springer, July 2005).

Activities and findings:

Research and Education Activities: 
Activities: [Note: NCSU stands for ``North Carolina State University.'] A1. The PI has been working with her NCSU M.Sc. students Shalu Gupta (female) and Kyoung-hwa Kim (female) on a PostgreSQL-based (http://www.postgresql.org) implementation for the project. The latest version of the code (as of 04/01/2005) and an accompanying technical report are available for the public at http://research.csc.ncsu.edu/selftune/. The goals of the project are to build a framework for generation and rewriting of derived data in an open-source database-management system PostgreSQL (http://www.postgresql.org), and to use the framework for experimental validation of research projects in my group, including the NSF project 0307072. As part of the coding project, the PI and Kyoung-hwa Kim have been studying the precision/efficiency tradeoff in view-size estimation in self-organizing databases. An outcome of the work is their joint paper (titled ``View-size estimation in self-organizing databases') accepted to the International Advanced Database Conference (IADC), San Diego, CA, June 2005. A2. The PI has been working with Dr. Foto Afrati (NTUA Greece) on three research projects within the scope of this NSF project. In the first joint project they study how views with or without aggregation can be (i) used in answering queries with aggregation, or (ii) designed to efficiently evaluate queries with aggregation. This project has resulted in a publication in the Tenth International Conference on Database Theory (ICDT) in January 2005; Dr. Afrati and the PI are currently working on a journal extension of that publication. The second joint project (Dr. Afrati and the PI) concerns efficient design and use of views with aggregation in star-schema data warehouses. An outcome of this project is a publication, jointly with NCSU graduate students Shalu Gupta and Charles Loftis, in the Tenth International Conference on Database Systems for Advanced Applications (DASFAA) in April 2005. Dr. Afrati, Shalu Gupta, and the PI are currently preparing a journal version of the paper. The goal of the third joint project (Dr. Afrati and the PI) is to study view design for efficiently answering queries without aggregation under bag and bag-set semantics; this is a joint project with Dr. Manolis Gergatsoulis (Ionian University of Corfu, Greece) and with Dr. Afrati's doctoral student Vassia Pavlaki (NTUA Greece). This project has resulted in a publication in the Symposium on Abstraction, Reformulation and Approximation (SARA) in July 2005. The PI is currently working with her collaborators on extending the findings to view design under bag and bag-set semantics for queries with aggregation. The PI is also setting up a fourth joint project with Dr. Afrati, on designing indexes for query-evaluation efficiency. This project will be discussed by Dr. Afrati and the PI during the PI's visit to Stanford University in June 2005. A3. The PI has been working with Dr. Yahya Fathi and doctoral student Zohreh Asgharzadeh Talebi (female), both in Operations Research NCSU, on developing LP and IP models for designing views for queries with aggregation. The importance of this work is that its results would provide globally optimal solutions to the view-selection problem. One outcome of this project so far has been a publication (jointly with Jingni Li, female, M.Sc. student in Operations Research NCSU) in the Ninth East-European Conference on Advances in Databases and Information Systems (ADBIS), in September 2005. Dr. Fathi, Zohreh, and the PI are currently working on a journal extension of this project. It is envisioned that Zohreh will be supported in her Ph.D. work by this NSF grant; the focus of the work will be on globally optimal design of indexes alongside views for aggregate queries. A4. The PI has worked with Dr. Michael R. Genesereth (the PI's Ph.D. advisor, Stanford University) on view design under set semantics and integrity constraints. The findings have been accepted to the Logic and Computational Complexity Workshop (LCC, in conjunction with LICS), in June 2005. Dr. Genesereth and the PI are currently working on an extension of these results to the contexts of (a) bag and bag-set semantics and (b) aggregate queries, and are planning to work on a journal version of their publication. A5. The PI has published a journal paper (in The VLDB Journal) with Dr. Chen Li and doctoral student Jia Li, both at the University of California, Irvine. This work is an extension of their previous results published in PODS-2003; please see last year's yearly report of the PI on this NSF project. A6. The PI has published a book chapter with Filip Perich and Dr. Anupam Joshi at the University of Maryland Baltimore County (UMBC). The book chapter has a title ``Data Management for Mobile Ad-Hoc Networks' and will appear in ``Enabling Technologies for Wireless e-Business Applications' (edited by W. Kou and Y. Yesha, Springer) in July 2005. A7. The PI has worked with NCSU Ph.D. students Gang Gou and Maxim Kormilitsin on extensions of this NSF project to the PI's CAREER project. One extension (with Gang Gou) is a project on efficiently answering queries on XML and RDF data using indexes and on efficiently answering queries on XML and RDF data using views. This work is a necessary preliminary step in investigating view and index design for efficient evaluation of queries in these data models. The second extension (with Maxim Kormilitsin) is developing approximation algorithms for designing views under a storage limit for efficient evaluation of select-project-join queries without aggregation. B. In addition to working with three NCSU Ph.D. students (see A3 and A7 above) Gang Gou, Maxim Kormilitsin, and Zohreh Asgharzadeh Talebi, I have been training an NCSU M.Sc. student Andrew Frick to work on the project (see project on designing indexes for query-evaluation efficiency, under A2 above). I am considering working with all of these students using the NSF project funds. (Some of the students do not need financial support all the time, as they are either partially self-supporting or have TA funds that partially support them.) C. This summer semester 2005, I am working with an NCSU undergraduate student Nathaniel Derbinsky on the project on designing indexes for query-evaluation efficiency, see under A2 above. D. In the spring semester 2005, I taught two courses, NCSU CSC (computer science) 440 (intro database systems for undergraduate students) and 742 (advanced topics on database systems for graduate students). Parts of the courses were devoted (especially in CSC 742) to discussing the research directions of this NSF proposal. To increase novelty and excitement in the curriculum and to give students a research experience that could motivate them to go to graduate school and to academia after they graduate, I was acting as an expert-in-residence on research in the project.

Findings:
Findings on the projects A1 through A7, under Activities above (projects B through D are of educational value and thus do not involve research findings): A1. We are using the PostgreSQL-based (http://www.postgresql.org) implementation to validate the findings in the project, including those for the individual projects A2, A7, B, and C. In addition, we have used the implementation to discover and study guidelines on using the precision/efficiency tradeoff in view-size estimation in self-organizing databases. An outcome of the work is a paper ``View-size estimation in self-organizing databases' accepted to the International Advanced Database Conference (IADC), San Diego, CA, June 2005. The latest version of the code (as of 04/01/2005) and an accompanying technical report are available for the public at http://research.csc.ncsu.edu/selftune/. A2. In the context of obtaining equivalent rewritings of aggregate queries using views, we have considered the problem of minimizing the cost of computing a query workload, and looked at query rewriting using existing views and at view selection. In the query-rewriting problem, we have discovered sufficient and necessary conditions for a rewriting to exist. For view selection, we have proved complexity results. We have also developed algorithms for obtaining rewritings and selecting views. See publication ``Selecting and Using Views to Compute Aggregate Queries' in the Tenth International Conference on Database Theory (ICDT) in January 2005. In the context of efficient design and use of views with aggregation in star-schema data warehouses, we have developed an extensible system architecture for Query-Performance Enhancement by Tuning (QPET). QPET combines design and use of derived data in an end-to-end approach to automated query-performance tuning, and selects appropriate data-design algorithms depending on the characteristics of the prevalent queries. Our focus in automated query-performance tuning is on a tradeoff between the amount of system resources spent on designing derived data and the degree of the resulting improvement in query performance. We have developed algorithms and reported experimental results in designing and using materialized views for practically important classes of aggregate queries, including range-aggregate queries on star-schema data warehouses. See publication ``Designing and Using Views to Improve Performance of Aggregate Queries' in the Tenth International Conference on Database Systems for Advanced Applications (DASFAA) in April 2005. In the project on studying view design for efficiently answering queries without aggregation, we have obtained complexity results on selecting views to answer queries in relational databases under set, bag, and bag-set semantics. The results can be used under each of the three assumptions, to find sound and complete algorithms for designing views and rewriting queries efficiently. See publication ``Designing Views to Efficiently Answer Real SQL Queries' in the Symposium on Abstraction, Reformulation and Approximation (SARA) in July 2005. A3. In the project on obtaining optimal solutions for view design, we have developed an integer-programming model to obtain optimal solutions to the problem of view selection for aggregate queries on data warehouses. We have also performed post-optimality analysis to determine/observe the impact of changing certain input characteristics on the optimal solution. The importance of this work is that its results would provide globally optimal solutions to the view-selection problem. See publication ``A Formal Model for the Problem of View Selection for Aggregate Queries' in the Ninth East-European Conference on Advances in Databases and Information Systems (ADBIS), in September 2005. A4. In the project on view design in presence of integrity constraints, we have obtained complexity results and algorithms for database reformulation for conjunctive queries and for several types of constraints, including functional and inclusion dependencies. To obtain better complexity results, we have introduced an unchase technique, which reduces the problem of query equivalence under constraints to equivalence in the absence of constraints without increasing query size. See publication ``Database Reformulation with Integrity Constraints' in the Logic and Computational Complexity Workshop (LCC, in conjunction with the Logic in Computer Science Conference - LICS), in June 2005. A5 through A7: Work in progress

Training and Development:
Research and teaching skills and experience the project helped provide: I have been working with four female students at NCSU in the context of the project: M.Sc. students Shalu Gupta and Kyoung-hwa Kim (see Activities A1 and A2), and M.Sc. student Jingni Li and Ph.D. student Zohreh Asgharzadeh Talebi (see Activity A3). The project has contributed to education of female NCSU students and of other NCSU students (Ph.D. students Gang Gou and Maxim Kormilitsin, M.Sc. student Andrew Frick, graduate student Charles Loftis, undergraduate student Nathaniel Derbinsky, see Activities A7, B, and C), by giving them unique research and implementation skills in order to give them a competitive edge in the workforce. One outcome of this project so far are M.Sc. theses for NCSU CSC students Kyoung-hwa Kim (graduated with a M.Sc. degree in January 2005) and Shalu Gupta (thesis defense planned for July 2005). In addition, this grant provides financial support for NCSU doctoral student Zohreh Asgharzadeh Talebi. In addition, the project has contributed to increasing novelty and excitement in the curriculum and to give students a research experience that could motivate them to go to graduate school and to academia after they graduate, see Activity D.

Outreach Activities:
I have made all project reports and an implementation version and technical report available to the general public on the Internet, at http://research.csc.ncsu.edu/selftune/

Journal Publications:
Rada Chirkova, Chen Li, and Jia Li, "Answering Queries Using Materialized Views with Minimum Size", The VLDB Journal, vol. 14, (2005), p. . Accepted
Filip Perich, Anupam Joshi, and Rada Chirkova, "Data Management for Mobile Ad-Hoc Networks", In book "Enabling Technologies for Wireless e-Business Applications", W. Kou and Y. Yesha, eds., Springer, vol. , (2005), p. . book chapter
Jingni Li, Zohreh Asgharzadeh Talebi, Rada Chirkova, and Yahya Fathi , "A Formal Model for the Problem of View Selection for Aggregate Queries ", Proceedings of the Ninth East-European Conference on Advances in Databases and Information Systems (ADBIS), Tallinn, Estonia, September 2005. , vol. , (2005), p. . Accepted
Rada Chirkova and Michael R. Genesereth , "Database Reformulation with Integrity Constraints ", Proceedings of the Logic and Computational Complexity Workshop, in conjunction with of the Logic in Computer Science Conference (LICS), Chicago, June 2005. , vol. , (2005), p. . workshop
Foto Afrati, Rada Chirkova, Manolis Gergatsoulis, and Vassia Pavlaki , "Designing Views to Efficiently Answer Real SQL Queries ", Proceedings of the Symposium on Abstraction, Reformulation and Approximation, Edinburgh, Scotland, July 2005. , vol. , (2005), p. . symposium
Kyoung-hwa Kim and Rada Chirkova , "View-Size Estimation in Self-Organizing Databases ", Proceedings of an International Advanced Database Conference (IADC), San Diego, CA, June 2005. , vol. , (2005), p. . conference
Foto Afrati, Rada Chirkova, Shalu Gupta, and Charles Loftis , "Designing and Using Views to Improve Performance of Aggregate Queries ", Proceedings of the Tenth International Conference on Database Systems for Advanced Applications, April 2005, Beijing, China. , vol. , (2005), p. . conference
Foto Afrati and Rada Chirkova , "Selecting and Using Views to Compute Aggregate Queries ", Proceedings of the Tenth International Conference on Database Theory (ICDT-2005), Edinburgh, Scotland, January 2005. , vol. , (2005), p. . conference

Book(s) of other one-time publications(s):

Other Specific Products:

Software (or netware)
This software a is a database-management system that includes a
framework for generation and rewriting of derived data. The software
has been built in an open-source database-management system PostgreSQL
(http://www.postgresql.org). The purpose of the software is
experimental validation of research projects in my group, including
the NSF project 0307072.
On the Internet via http://research.csc.ncsu.edu/selftune/

Internet Dissemination:

http://research.csc.ncsu.edu/selftune/

This site has up-to-date information on the progress of the project,
including an overview of the project
(http://research.csc.ncsu.edu/selftune/manifestoFall2003.pdf), NSF
annual reports
(http://research.csc.ncsu.edu/selftune/reportAugust2003.html),
technical reports and software to download.

Contributions:

Contributions within Discipline:

 Findings on the problem of designing views to reduce the evaluation
costs of database queries contribute to the body of knowledge on
designing and using derived data to answer queries in data-intensive
systems. Please see the details under ``Findings' in this report.


Contributions to Education and Human Resources:
 Yes; please see under ``Findings' in this project. Briefly, work on
the project is contributing to education of diverse skilled workforce,
within the context of the mission of NCSU.

Special Requirements for Annual Project Report:

Unobligated Funds: $12120.00
Categories for which nothing is reported:
Participants: Partner organizations
Products: Book or other one-time publication
Contributions to Other Disciplines
Contributions to Resources for Science and Technology
Contributions Beyond Science and Engineering
Special Reporting Requirements
Animal, Human Subjects, Biohazards


FastLane Home Page Take you to the Project System Control Screen We welcome comments on this system

If you have trouble accessing any FastLane page, please contact the FastLane Help Desk at 1-800-673-6188