EDBT/ICDT 2009 Joint Conference

Electronic Conference Proceedings

Data Integration Flows for Business Intelligence

Abstract

Business Intelligence (BI) refers to technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making. Today’s BI architecture typically consists of a data warehouse (or one or more data marts), which consolidates data from several operational databases, and serves a variety of front-end querying, reporting, and analytic tools. The back-end of the architecture is a data integration pipeline for populating the data warehouse by extracting data from distributed and usually heterogeneous operational sources; cleansing, integrating and transforming the data; and loading it into the data warehouse. Since BI systems have been used primarily for off-line, strategic decision making, the traditional data integration pipeline is a one-way, batch process, usually implemented by extract-transform-load (ETL) tools. The design and implementation of the ETL pipeline is largely a labor-intensive activity, and typically consumes a large fraction of the effort in data warehousing projects. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making. This imposes additional requirements and tradeoffs, resulting in even more complexity in the design of data integration flows. These include reducing the latency so that near real-time data can be delivered to the data warehouse, extracting information from a wider variety of data sources, extending the rigidly serial ETL pipeline to more general data flows, and considering alternative physical implementations. We describe the requirements for data integration flows in this next generation of operational BI systems, the limitations of current technologies, the research challenges in meeting these requirements, and a framework for addressing these challenges. The goal is to facilitate the design and implementation of optimal flows to meet business requirements.

About the Speaker

Umeshwar Dayal (Hewlett-Packard Labs, USA)

Umeshwar Dayal is an HP Fellow in the Intelligent Information Management Lab at Hewlett-Packard Laboratories, Palo Alto, California. In this role, he has initiated research programs in enterprise-scale data warehousing, scalable analytics, and information visualization.

Umesh has nearly 30 years of research experience in data management and has made fundamental contributions in the field, including developing some of the basic techniques for managing federated databases, defining mechanisms for triggering transactions and database rules, and investigating query optimization strategies, especially in heterogeneous systems. In addition, he has done important work in long-duration transactions, business process management, and database design. He has published over 160 research papers and holds over 25 patents in the areas of database systems, transaction management, business process management, business intelligence and information visualization. In 2001, he received (with two co-authors) the prestigious 10-year best paper award from the International Conference on Very Large Data Bases for his paper on a transactional model for long-running activities.

Prior to joining HP Labs, Umesh was a senior researcher at DEC's Cambridge Research Lab, Chief Scientist at Xerox Advanced Information Technology and Computer Corporation of America, and on the faculty at the University of Texas-Austin. He received his PhD from Harvard University.

Umesh has served on the Editorial Board of four international journals, and he has chaired and served on the Program Committees of numerous conferences. Most recently, he was General Program Chair of VLDB 2006. He has served as a member of the Board of the VLDB Endowment, the Executive Committee of the IEEE Technical Committee on Electronic Commerce, and the Steering Committee of the SIAM International Conference on Data Mining; and as a founding member of the Board of the International Foundation for Cooperative Information Systems.

Session

EDBT Invited Talk: Data Integration Flows for Business Intelligence (Tuesday, March 24, 09:00-10:30)