Partner Organizations:
Other collaborators:
1. Professor Foto Afrati at the National Technical University of Athens, Greece; she has collaborated with the PI on four research papers for this project. Three of the papers have been accepted to conferences (International Conference on Database Theory ICDT-05; 10th International Conference on Database Systems for Advanced Applications DASFAA-2005; Symposium on Abstraction, Reformulation and Approximation SARA-2005). The fourth paper, based on the accepted ICDT-05 paper, is in preparation for submission to a journal. 2. Professor Yahya Fathi in the Operations Research program at at NC State University; he has collaborated with the PI on a paper accepted to a conference (Ninth East-European Conference on Advances in Databases and Information Systems ADBIS-2005). 3. Professor Michael R. Genesereth at Stanford University; he has collaborated with the PI on a paper accepted to a conference (the Logic and Computational Complexity Workshop LCC-05, in conjunction with LICS-05). 4. Professor Anupam Joshi at the University of Maryland Baltimore County (UMBC). Dr. Joshi has collaborated with the PI on a book chapter ``Data Management for Mobile Ad-Hoc Networks' in ``Enabling Technologies for Wireless e-Business Applications' (edited by W. Kou and Y. Yesha, Springer, July 2005). 5. Kyoung-hwa Kim, the PI's M.Sc. student at NC State University; she has collaborated with the PI by working on a PostgreSQL-based (http://www.postgresql.org) implementation for the project. Kyoung-hwa has also contributed to a paper accepted to a conference (International Advanced Database Conference IADC-2005). 6. Professor Manolis Gergatsoulis at the Ionian University of Corfu, Greece; he has collaborated with the PI on a paper accepted to a conference (Symposium on Abstraction, Reformulation and Approximation SARA-2005). 7. Professor Chen Li at the University of California, Irvine; he has collaborated with the PI on a paper accepted to The VLDB Journal in 2005. 8. Jia Li, doctoral student at the University of California, Irvine; she has contributed to a paper accepted to The VLDB Journal in 2005. 9. Jingni Li, M.Sc. student at NC State University; she has contributed to a paper accepted to a conference (Ninth East-European Conference on Advances in Databases and Information Systems ADBIS-2005). 10. Charles Loftis is a graduate student at NC State University; he has contributed to a paper accepted to a conference (10th International Conference on Database Systems for Advanced Applications DASFAA-2005). 11. Vassia Pavlaki, doctoral student at the National Technical University of Athens, Greece; she has contributed to a paper accepted to a conference (Symposium on Abstraction, Reformulation and Approximation SARA-2005). 12. Filip Perich, Cougar Software, Inc; Filip has a Ph.D. from the University of Maryland Baltimore County (UMBC). While he was a student at UMBC, he collaborated with the PI on a book chapter ``Data Management for Mobile Ad-Hoc Networks' in ``Enabling Technologies for Wireless e-Business Applications' (edited by W. Kou and Y. Yesha, Springer, July 2005).
Activities and findings:
Research and Education Activities: Activities: [Note: NCSU stands for ``North Carolina State University.'] A1. The PI has been working with her NCSU M.Sc. students Shalu Gupta (female) and Kyoung-hwa Kim (female) on a PostgreSQL-based (http://www.postgresql.org) implementation for the project. The latest version of the code (as of 04/01/2005) and an accompanying technical report are available for the public at http://research.csc.ncsu.edu/selftune/. The goals of the project are to build a framework for generation and rewriting of derived data in an open-source database-management system PostgreSQL (http://www.postgresql.org), and to use the framework for experimental validation of research projects in my group, including the NSF project 0307072. As part of the coding project, the PI and Kyoung-hwa Kim have been studying the precision/efficiency tradeoff in view-size estimation in self-organizing databases. An outcome of the work is their joint paper (titled ``View-size estimation in self-organizing databases') accepted to the International Advanced Database Conference (IADC), San Diego, CA, June 2005. A2. The PI has been working with Dr. Foto Afrati (NTUA Greece) on three research projects within the scope of this NSF project. In the first joint project they study how views with or without aggregation can be (i) used in answering queries with aggregation, or (ii) designed to efficiently evaluate queries with aggregation. This project has resulted in a publication in the Tenth International Conference on Database Theory (ICDT) in January 2005; Dr. Afrati and the PI are currently working on a journal extension of that publication. The second joint project (Dr. Afrati and the PI) concerns efficient design and use of views with aggregation in star-schema data warehouses. An outcome of this project is a publication, jointly with NCSU graduate students Shalu Gupta and Charles Loftis, in the Tenth International Conference on Database Systems for Advanced Applications (DASFAA) in April 2005. Dr. Afrati, Shalu Gupta, and the PI are currently preparing a journal version of the paper. The goal of the third joint project (Dr. Afrati and the PI) is to study view design for efficiently answering queries without aggregation under bag and bag-set semantics; this is a joint project with Dr. Manolis Gergatsoulis (Ionian University of Corfu, Greece) and with Dr. Afrati's doctoral student Vassia Pavlaki (NTUA Greece). This project has resulted in a publication in the Symposium on Abstraction, Reformulation and Approximation (SARA) in July 2005. The PI is currently working with her collaborators on extending the findings to view design under bag and bag-set semantics for queries with aggregation. The PI is also setting up a fourth joint project with Dr. Afrati, on designing indexes for query-evaluation efficiency. This project will be discussed by Dr. Afrati and the PI during the PI's visit to Stanford University in June 2005. A3. The PI has been working with Dr. Yahya Fathi and doctoral student Zohreh Asgharzadeh Talebi (female), both in Operations Research NCSU, on developing LP and IP models for designing views for queries with aggregation. The importance of this work is that its results would provide globally optimal solutions to the view-selection problem. One outcome of this project so far has been a publication (jointly with Jingni Li, female, M.Sc. student in Operations Research NCSU) in the Ninth East-European Conference on Advances in Databases and Information Systems (ADBIS), in September 2005. Dr. Fathi, Zohreh, and the PI are currently working on a journal extension of this project. It is envisioned that Zohreh will be supported in her Ph.D. work by this NSF grant; the focus of the work will be on globally optimal design of indexes alongside views for aggregate queries. A4. The PI has worked with Dr. Michael R. Genesereth (the PI's Ph.D. advisor, Stanford University) on view design under set semantics and integrity constraints. The findings have been accepted to the Logic and Computational Complexity Workshop (LCC, in conjunction with LICS), in June 2005. Dr. Genesereth and the PI are currently working on an extension of these results to the contexts of (a) bag and bag-set semantics and (b) aggregate queries, and are planning to work on a journal version of their publication. A5. The PI has published a journal paper (in The VLDB Journal) with Dr. Chen Li and doctoral student Jia Li, both at the University of California, Irvine. This work is an extension of their previous results published in PODS-2003; please see last year's yearly report of the PI on this NSF project. A6. The PI has published a book chapter with Filip Perich and Dr. Anupam Joshi at the University of Maryland Baltimore County (UMBC). The book chapter has a title ``Data Management for Mobile Ad-Hoc Networks' and will appear in ``Enabling Technologies for Wireless e-Business Applications' (edited by W. Kou and Y. Yesha, Springer) in July 2005. A7. The PI has worked with NCSU Ph.D. students Gang Gou and Maxim Kormilitsin on extensions of this NSF project to the PI's CAREER project. One extension (with Gang Gou) is a project on efficiently answering queries on XML and RDF data using indexes and on efficiently answering queries on XML and RDF data using views. This work is a necessary preliminary step in investigating view and index design for efficient evaluation of queries in these data models. The second extension (with Maxim Kormilitsin) is developing approximation algorithms for designing views under a storage limit for efficient evaluation of select-project-join queries without aggregation. B. In addition to working with three NCSU Ph.D. students (see A3 and A7 above) Gang Gou, Maxim Kormilitsin, and Zohreh Asgharzadeh Talebi, I have been training an NCSU M.Sc. student Andrew Frick to work on the project (see project on designing indexes for query-evaluation efficiency, under A2 above). I am considering working with all of these students using the NSF project funds. (Some of the students do not need financial support all the time, as they are either partially self-supporting or have TA funds that partially support them.) C. This summer semester 2005, I am working with an NCSU undergraduate student Nathaniel Derbinsky on the project on designing indexes for query-evaluation efficiency, see under A2 above. D. In the spring semester 2005, I taught two courses, NCSU CSC (computer science) 440 (intro database systems for undergraduate students) and 742 (advanced topics on database systems for graduate students). Parts of the courses were devoted (especially in CSC 742) to discussing the research directions of this NSF proposal. To increase novelty and excitement in the curriculum and to give students a research experience that could motivate them to go to graduate school and to academia after they graduate, I was acting as an expert-in-residence on research in the project.
Findings: Findings on the projects A1 through A7, under Activities above (projects B through D are of educational value and thus do not involve research findings): A1. We are using the PostgreSQL-based (http://www.postgresql.org) implementation to validate the findings in the project, including those for the individual projects A2, A7, B, and C. In addition, we have used the implementation to discover and study guidelines on using the precision/efficiency tradeoff in view-size estimation in self-organizing databases. An outcome of the work is a paper ``View-size estimation in self-organizing databases' accepted to the International Advanced Database Conference (IADC), San Diego, CA, June 2005. The latest version of the code (as of 04/01/2005) and an accompanying technical report are available for the public at http://research.csc.ncsu.edu/selftune/. A2. In the context of obtaining equivalent rewritings of aggregate queries using views, we have considered the problem of minimizing the cost of computing a query workload, and looked at query rewriting using existing views and at view selection. In the query-rewriting problem, we have discovered sufficient and necessary conditions for a rewriting to exist. For view selection, we have proved complexity results. We have also developed algorithms for obtaining rewritings and selecting views. See publication ``Selecting and Using Views to Compute Aggregate Queries' in the Tenth International Conference on Database Theory (ICDT) in January 2005. In the context of efficient design and use of views with aggregation in star-schema data warehouses, we have developed an extensible system architecture for Query-Performance Enhancement by Tuning (QPET). QPET combines design and use of derived data in an end-to-end approach to automated query-performance tuning, and selects appropriate data-design algorithms depending on the characteristics of the prevalent queries. Our focus in automated query-performance tuning is on a tradeoff between the amount of system resources spent on designing derived data and the degree of the resulting improvement in query performance. We have developed algorithms and reported experimental results in designing and using materialized views for practically important classes of aggregate queries, including range-aggregate queries on star-schema data warehouses. See publication ``Designing and Using Views to Improve Performance of Aggregate Queries' in the Tenth International Conference on Database Systems for Advanced Applications (DASFAA) in April 2005. In the project on studying view design for efficiently answering queries without aggregation, we have obtained complexity results on selecting views to answer queries in relational databases under set, bag, and bag-set semantics. The results can be used under each of the three assumptions, to find sound and complete algorithms for designing views and rewriting queries efficiently. See publication ``Designing Views to Efficiently Answer Real SQL Queries' in the Symposium on Abstraction, Reformulation and Approximation (SARA) in July 2005. A3. In the project on obtaining optimal solutions for view design, we have developed an integer-programming model to obtain optimal solutions to the problem of view selection for aggregate queries on data warehouses. We have also performed post-optimality analysis to determine/observe the impact of changing certain input characteristics on the optimal solution. The importance of this work is that its results would provide globally optimal solutions to the view-selection problem. See publication ``A Formal Model for the Problem of View Selection for Aggregate Queries' in the Ninth East-European Conference on Advances in Databases and Information Systems (ADBIS), in September 2005. A4. In the project on view design in presence of integrity constraints, we have obtained complexity results and algorithms for database reformulation for conjunctive queries and for several types of constraints, including functional and inclusion dependencies. To obtain better complexity results, we have introduced an unchase technique, which reduces the problem of query equivalence under constraints to equivalence in the absence of constraints without increasing query size. See publication ``Database Reformulation with Integrity Constraints' in the Logic and Computational Complexity Workshop (LCC, in conjunction with the Logic in Computer Science Conference - LICS), in June 2005. A5 through A7: Work in progress
Training and Development: Research and teaching skills and experience the project helped provide: I have been working with four female students at NCSU in the context of the project: M.Sc. students Shalu Gupta and Kyoung-hwa Kim (see Activities A1 and A2), and M.Sc. student Jingni Li and Ph.D. student Zohreh Asgharzadeh Talebi (see Activity A3). The project has contributed to education of female NCSU students and of other NCSU students (Ph.D. students Gang Gou and Maxim Kormilitsin, M.Sc. student Andrew Frick, graduate student Charles Loftis, undergraduate student Nathaniel Derbinsky, see Activities A7, B, and C), by giving them unique research and implementation skills in order to give them a competitive edge in the workforce. One outcome of this project so far are M.Sc. theses for NCSU CSC students Kyoung-hwa Kim (graduated with a M.Sc. degree in January 2005) and Shalu Gupta (thesis defense planned for July 2005). In addition, this grant provides financial support for NCSU doctoral student Zohreh Asgharzadeh Talebi. In addition, the project has contributed to increasing novelty and excitement in the curriculum and to give students a research experience that could motivate them to go to graduate school and to academia after they graduate, see Activity D.
Outreach Activities: I have made all project reports and an implementation version and technical report available to the general public on the Internet, at http://research.csc.ncsu.edu/selftune/
Journal Publications:
Other Specific Products:
This software a is a database-management system that includes a framework for generation and rewriting of derived data. The software has been built in an open-source database-management system PostgreSQL (http://www.postgresql.org). The purpose of the software is experimental validation of research projects in my group, including the NSF project 0307072.
On the Internet via http://research.csc.ncsu.edu/selftune/
http://research.csc.ncsu.edu/selftune/
This site has up-to-date information on the progress of the project, including an overview of the project (http://research.csc.ncsu.edu/selftune/manifestoFall2003.pdf), NSF annual reports (http://research.csc.ncsu.edu/selftune/reportAugust2003.html), technical reports and software to download.
Contributions:
Contributions within Discipline:
Findings on the problem of designing views to reduce the evaluation costs of database queries contribute to the body of knowledge on designing and using derived data to answer queries in data-intensive systems. Please see the details under ``Findings' in this report.
Yes; please see under ``Findings' in this project. Briefly, work on the project is contributing to education of diverse skilled workforce, within the context of the mission of NCSU.
Special Requirements for Annual Project Report:
Unobligated Funds: $12120.00 Categories for which nothing is reported: Participants: Partner organizations Products: Book or other one-time publication Contributions to Other Disciplines Contributions to Resources for Science and TechnologyContributions Beyond Science and Engineering Special Reporting Requirements Animal, Human Subjects, Biohazards